Detecting PII in a SharePoint List using AWS Comprehend

Personally Identifiable Information (PII) is information that, when used alone or with other relevant data, can identify an individual. Sensitive PII can include a full name, Social Security number (SSN), driver’s license, financial information, and medical records. Because PII can be used to identify an individual, it poses a major risk to companies. If breached, this information can lead to lawsuits and damage a company’s trustworthiness.

Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning to uncover valuable insights and connections in text and documents. We can now use Amazon Comprehend to protect and control who has access to sensitive data by identifying and redacting Personally Identifiable Information (PII) from text and documents.
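As a rough illustration of what "identifying and redacting" means in practice, the sketch below masks spans in a string using begin/end offsets in the shape that Amazon Comprehend reports for each PII entity. The function name `Hide-PiiSpan`, the sample text, and the sample offsets are made up for this example, and `EndOffset` is assumed to be exclusive.

```powershell
# Hypothetical helper: mask every character that falls inside a detected PII span.
function Hide-PiiSpan {
    param(
        [Parameter(Mandatory)][string]$Text,
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Entities,
        [char]$MaskCharacter = '*'
    )
    $chars = $Text.ToCharArray()
    foreach ($entity in $Entities) {
        # Assumption: BeginOffset is inclusive and EndOffset is exclusive
        for ($i = $entity.BeginOffset; $i -lt $entity.EndOffset -and $i -lt $chars.Length; $i++) {
            $chars[$i] = $MaskCharacter
        }
    }
    return (-join $chars)
}

# Made-up sample data, not real service output
$sampleText = 'Call John Doe at 555-0100.'
$sampleEntities = @(
    @{ Type = 'NAME';  BeginOffset = 5;  EndOffset = 13 }
    @{ Type = 'PHONE'; BeginOffset = 17; EndOffset = 25 }
)

Hide-PiiSpan -Text $sampleText -Entities $sampleEntities
# Call ******** at ********.
```

The helper only needs the offsets, so it works the same whether the entities are hashtables or objects.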

We can also redact documents stored in an Amazon S3 bucket by running an Amazon Comprehend asynchronous analysis job. The job offers two redaction modes: replace each PII entity with its entity type, or mask the characters in PII entities with a symbol (!, #, $, %, &, *, or @). Asynchronous PII redaction batch processing would be a great fit for SharePoint Document Libraries.
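The redaction settings for such a job can be sketched as a small JSON document. The snippet below is a minimal sketch, not a tested job submission: `New-PiiRedactionConfig` is a hypothetical helper, and the file name, bucket placeholders, and role placeholder in the commented command are assumptions you would replace with your own values.

```powershell
# Hypothetical helper: build the redaction settings as JSON.
function New-PiiRedactionConfig {
    param(
        [ValidateSet('MASK', 'REPLACE_WITH_PII_ENTITY_TYPE')]
        [string]$MaskMode = 'MASK',
        [ValidateSet('!', '#', '$', '%', '&', '*', '@')]
        [string]$MaskCharacter = '*',
        [string[]]$PiiEntityTypes = @('ALL')
    )
    $config = [ordered]@{
        PiiEntityTypes = $PiiEntityTypes
        MaskMode       = $MaskMode
    }
    # A mask character is only meaningful when masking
    if ($MaskMode -eq 'MASK') { $config.MaskCharacter = $MaskCharacter }
    return ($config | ConvertTo-Json -Compress)
}

$redactionJson = New-PiiRedactionConfig -MaskCharacter '#'
$redactionJson

# To submit a job, the JSON could be saved to a file and passed to the AWS CLI.
# Bucket names and the role ARN below are placeholders:
# Set-Content -Path redaction-config.json -Value $redactionJson
# aws comprehend start-pii-entities-detection-job `
#   --input-data-config "S3Uri=s3://<input-bucket>/" `
#   --output-data-config "S3Uri=s3://<output-bucket>/" `
#   --mode ONLY_REDACTION `
#   --redaction-config file://redaction-config.json `
#   --data-access-role-arn "<role-arn>" `
#   --language-code en
```

Passing the settings through a file avoids quoting problems when handing JSON from PowerShell to a native executable.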

In this post, I show how to use Amazon Comprehend to detect PII entities in a specific SharePoint List column and record the results in another list and in a CSV report, using the AWS CLI and CLI for Microsoft 365. Take a look at this script, give it a try, and please send me your feedback via the contact page.

Thank you, Sriharsha M S, for your valuable article on this topic.

Prerequisites

  • CLI for Microsoft 365
  • AWS CLI
  • SharePoint Online Lists (as shown in screenshots)
    • PII List: Items containing dummy/real PII data
    • PII Audit: To store the PII audit details
  • Necessary permission to access SharePoint Lists, Amazon Comprehend, and to write CSV files in local machine via
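Before running the script, it can help to confirm that both CLIs are reachable from the PowerShell session. The check below is a minimal sketch; `Test-CliAvailable` is a hypothetical helper name, and it only verifies that the commands can be found, not that you are signed in.

```powershell
# Hypothetical helper: returns $true when a command with this name can be resolved.
function Test-CliAvailable {
    param([Parameter(Mandatory)][string]$Name)
    return [bool](Get-Command -Name $Name -ErrorAction SilentlyContinue)
}

foreach ($cli in 'm365', 'aws') {
    if (Test-CliAvailable -Name $cli) {
        Write-Host "$cli found" -ForegroundColor Green
    }
    else {
        Write-Host "$cli not found on PATH" -ForegroundColor Yellow
    }
}

# Once both are present, 'm365 status' and 'aws sts get-caller-identity'
# show which accounts the CLIs are currently signed in to.
```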

Source & Target Lists

pii-source-list.png

pii-audit-report-list-schema.png

PowerShell Script

$spolHostName = "https://spridermvp.sharepoint.com"
$spolSiteRelativeUrl = "/sites/dev"
$spolListToAuditTitle = "PII List"
$spolListToSaveAuditResponse = "PII Audit"
$spolListFields = "ID,Title,Content"
$resultDir = "Output"
$executionDir = $PSScriptRoot
$outputDir = "$executionDir/$resultDir"

# Create the output folder for the CSV reports if it does not exist yet
if (-not (Test-Path -Path "$outputDir" -PathType Container)) {
    Write-Host "Creating $outputDir folder..." -ForegroundColor Yellow
    New-Item -ItemType Directory -Path "$outputDir" | Out-Null
}

$spolSiteUrl = "${spolHostName}${spolSiteRelativeUrl}"
$spolListItems = m365 spo listitem list --title $spolListToAuditTitle --webUrl $spolSiteUrl  --fields $spolListFields -o json | ConvertFrom-Json -AsHashtable

if ($spolListItems.Count -gt 0) {
    ForEach ($spolListItem in $spolListItems) {
        $spolListItemId = $spolListItem.Id
        $spolListItemContent = $spolListItem.Content
        Write-Host "Auditing Item Id: ${spolListItemId} in ${spolListToAuditTitle}" -ForegroundColor Green
        # Send the column text to Amazon Comprehend and parse the JSON response
        $response = aws comprehend detect-pii-entities --language-code en --text $spolListItemContent
        $auditResponse = $response | ConvertFrom-Json -AsHashtable
        if ($auditResponse.Entities) {
            $auditEntitiesCount = $auditResponse.Entities.Count
            if ($auditEntitiesCount -gt 0) {
                Write-Host "Findings Count: ${auditEntitiesCount}" -ForegroundColor Magenta
                # Project each detected entity into a flat object for the CSV report
                $piiFindings = ForEach ($piiEntity in $auditResponse.Entities) {
                    [PSCustomObject]@{
                        Score       = $piiEntity.Score
                        Type        = $piiEntity.Type
                        BeginOffset = $piiEntity.BeginOffset
                        EndOffset   = $piiEntity.EndOffset
                    }
                }
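                # Include the item ID so items audited within the same second do not overwrite each other's report
                $outputFilePath = "${outputDir}/$(Get-Date -f yyyyMMdd-HHmmss)-Item${spolListItemId}-PIIFindings.csv"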
                $piiFindings | Export-Csv -Path $outputFilePath -NoTypeInformation
                $auditEntry = m365 spo listitem add --contentType Item --listTitle $spolListToSaveAuditResponse --webUrl $spolSiteUrl --Title $spolListToAuditTitle --ItemID $spolListItemId --AuditCount $auditResponse.Entities.Count --AuditResult $outputFilePath -o json | ConvertFrom-Json -AsHashtable
                $auditEntryId = $auditEntry.Id
                Write-Host "Audit added for source Item Id: ${spolListItemId} with Item Id ${auditEntryId} in target list ${spolListToSaveAuditResponse}" -ForegroundColor Green
            }
            else {
                Write-Host "There are no findings in this item" -ForegroundColor Yellow
            }
        }
        else {
            Write-Host "There are no findings in this item" -ForegroundColor Yellow
        }
    }
}
else {
    Write-Host "No items in this list" -ForegroundColor Yellow
}
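Each finding carries a confidence score, so one possible refinement is to record only findings above a chosen threshold. The sketch below runs on made-up data in the same shape as the parsed `Entities` array; `Select-ConfidentPiiFinding` and the 0.8 cut-off are assumptions for illustration, not part of the script above.

```powershell
# Made-up findings in the same shape as the Entities array parsed in the script
$sampleEntities = @(
    @{ Score = 0.99; Type = 'NAME';  BeginOffset = 5;  EndOffset = 13 }
    @{ Score = 0.45; Type = 'EMAIL'; BeginOffset = 20; EndOffset = 36 }
    @{ Score = 0.97; Type = 'SSN';   BeginOffset = 45; EndOffset = 56 }
)

# Hypothetical helper: keep only entities whose score meets the threshold.
function Select-ConfidentPiiFinding {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Entities,
        [double]$MinScore = 0.8   # hypothetical cut-off, tune for your data
    )
    $Entities | Where-Object { $_.Score -ge $MinScore }
}

$confident = @(Select-ConfidentPiiFinding -Entities $sampleEntities)
Write-Host "Kept $($confident.Count) of $($sampleEntities.Count) findings"
$confident | ForEach-Object { $_.Type }
```

In the script, the filtered result could replace `$auditResponse.Entities` before the CSV export, at the cost of dropping low-confidence findings that may still be worth a manual look.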

Output

pii-audit-report-list.png

pii-audit-report-data.png

Learn more
Home | Joseph Velliah