Azure Form Recognizer Python SDK

image from https://www.pexels.com/@yasinaydin/

In this post, we look at the Azure Form Recognizer Python SDK by creating a simple medical prescription form and examining the results that Azure Form Recognizer returns for it.

First, we need to create the Azure Cognitive Services resource and get its access key and endpoint.


Python Implementation

We use the service's Python client library, azure-ai-formrecognizer. The code reads the endpoint and access key from environment variables:

import json
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

endpoint = os.environ["AZURE_FORM_REG_ENDPOINT"]
access_key = os.environ["AZURE_FORM_REG_KEY"]
client = DocumentAnalysisClient(endpoint, AzureKeyCredential(access_key))

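# Read the PDF into memory; the file can be closed before calling the service.
with open("prescription_fake.pdf", "rb") as fd:
    document = fd.read()

# Analysis is a long-running operation, so we wait on the poller for the result.
poller = client.begin_analyze_document("prebuilt-layout", document)
fr_result = poller.result()
results = {
    "pages": [],
    "tables": [],
}

for page in fr_result.pages:  # there is only one page in this case
    results["pages"].append({
        "words": [{"word": w.content, "confidence": w.confidence} for w in page.words]
    })

for table in fr_result.tables:  # there is only one table in this case
    # Index of the last row with any content; trailing empty rows are dropped.
    # default=-1 gives an empty table instead of an error when no cell has content.
    last_row = max((cell.row_index for cell in table.cells if cell.content), default=-1)
    rows = [[] for _ in range(last_row + 1)]

    # This assumes the cells of each row are listed in column order.
    for cell in table.cells:
        if cell.row_index <= last_row:
            rows[cell.row_index].append(cell.content)

    results["tables"].append(rows)

print(json.dumps(results, indent=4))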

Essentially, we open a PDF document (a fake prescription that I made up) and send it to the prebuilt-layout model.

From the Form Recognizer result, we extract two sets of output:

  1. the words on each page, with their confidence scores
  2. the cell content of each table

Parsing the result gives this:
{
    "pages": [
        {
            "words": [
                {
                    "word": "Medication",
                    "confidence": 1.0
                },
                {
                    "word": "Prescription",
                    "confidence": 1.0
                },
                {
                    "word": "Form",
                    "confidence": 1.0
                },
                {
                    "word": "Name:",
                    "confidence": 1.0
                },
                {
                    "word": "_",
                    "confidence": 1.0
                },
                {
                    "word": "Dennis",
                    "confidence": 1.0
                },
                {
                    "word": "Seah",
                    "confidence": 1.0
                },
                {
                    "word": "___________________",
                    "confidence": 1.0
                },
                {
                    "word": "Emergency",
                    "confidence": 1.0
                },
                {
                    "word": "Contact",
                    "confidence": 1.0
                },
                {
                    "word": "Name/Phone:",
                    "confidence": 1.0
                },
                {
                    "word": "(669)",
                    "confidence": 1.0
                },
                {
                    "word": "7654-321",
                    "confidence": 1.0
                },
                {
                    "word": "___________",
                    "confidence": 1.0
                },
                {
                    "word": "Date",
                    "confidence": 1.0
                },
                {
                    "word": "Last",
                    "confidence": 1.0
                },
                {
                    "word": "Updated:",
                    "confidence": 1.0
                },
                {
                    "word": "__",
                    "confidence": 1.0
                },
                {
                    "word": "Nov",
                    "confidence": 1.0
                },
                {
                    "word": "24,",
                    "confidence": 1.0
                },
                {
                    "word": "2022",
                    "confidence": 1.0
                },
                {
                    "word": "_______",
                    "confidence": 1.0
                },
                {
                    "word": "_________________",
                    "confidence": 1.0
                },
                {
                    "word": "_________________",
                    "confidence": 1.0
                },
                {
                    "word": "Prescription",
                    "confidence": 1.0
                },
                {
                    "word": "Medications:",
                    "confidence": 1.0
                },
                {
                    "word": "Name",
                    "confidence": 1.0
                },
                {
                    "word": "of",
                    "confidence": 1.0
                },
                {
                    "word": "Medication",
                    "confidence": 1.0
                },
                {
                    "word": "Strength",
                    "confidence": 1.0
                },
                {
                    "word": "and",
                    "confidence": 1.0
                },
                {
                    "word": "Frequency",
                    "confidence": 1.0
                },
                {
                    "word": "Condition",
                    "confidence": 1.0
                },
                {
                    "word": "Medication",
                    "confidence": 1.0
                },
                {
                    "word": "Taken",
                    "confidence": 1.0
                },
                {
                    "word": "For",
                    "confidence": 1.0
                },
                {
                    "word": "Physician",
                    "confidence": 1.0
                },
                {
                    "word": "who",
                    "confidence": 1.0
                },
                {
                    "word": "Prescribed",
                    "confidence": 1.0
                },
                {
                    "word": "Med",
                    "confidence": 1.0
                },
                {
                    "word": "Notes",
                    "confidence": 1.0
                },
                {
                    "word": "ibuprofen",
                    "confidence": 1.0
                },
                {
                    "word": "200mg",
                    "confidence": 1.0
                },
                {
                    "word": "per",
                    "confidence": 1.0
                },
                {
                    "word": "day",
                    "confidence": 1.0
                },
                {
                    "word": "Enalapril",
                    "confidence": 1.0
                },
                {
                    "word": "10mg",
                    "confidence": 1.0
                },
                {
                    "word": "2",
                    "confidence": 1.0
                },
                {
                    "word": "times",
                    "confidence": 1.0
                },
                {
                    "word": "a",
                    "confidence": 1.0
                },
                {
                    "word": "day",
                    "confidence": 1.0
                },
                {
                    "word": "Allergies",
                    "confidence": 1.0
                },
                {
                    "word": "Pharmacy/Prescription",
                    "confidence": 1.0
                },
                {
                    "word": "Drug",
                    "confidence": 1.0
                },
                {
                    "word": "Plan",
                    "confidence": 1.0
                },
                {
                    "word": "CVS",
                    "confidence": 1.0
                }
            ]
        }
    ],
    "tables": [
        [
            [
                "Name of Medication",
                "Strength and Frequency",
                "Condition Medication Taken For",
                "Physician who Prescribed Med",
                "Notes"
            ],
            [
                "ibuprofen",
                "200mg per day",
                "",
                "",
                ""
            ],
            [
                "Enalapril",
                "10mg 2 times a day",
                "",
                "",
                ""
            ]
        ]
    ]
}
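
The parsed output is plain Python data, so it is easy to reshape further. The following is a minimal sketch, not part of the SDK: `table_to_records` and `meaningful_words` are hypothetical helper names, and the 0.9 confidence threshold is an arbitrary assumption. It treats the first table row as the header and drops the underscore-only tokens that come from the blank lines on the form. The `sample` data is a short excerpt of the output above.

```python
import re

# Hypothetical helpers (not part of azure-ai-formrecognizer); they work on the
# plain `results` dictionary built by the script above.

def table_to_records(rows):
    """Turn [[header...], [row...], ...] into a list of dicts keyed by header."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def meaningful_words(words, min_confidence=0.9):
    """Drop underscore-only filler tokens and words below the threshold."""
    filler = re.compile(r"_+")
    return [
        w["word"]
        for w in words
        if w["confidence"] >= min_confidence and not filler.fullmatch(w["word"])
    ]


# A short excerpt of the output shown above.
sample = {
    "pages": [{"words": [
        {"word": "Name:", "confidence": 1.0},
        {"word": "_", "confidence": 1.0},
        {"word": "Dennis", "confidence": 1.0},
        {"word": "Seah", "confidence": 1.0},
        {"word": "___________________", "confidence": 1.0},
    ]}],
    "tables": [[
        ["Name of Medication", "Strength and Frequency",
         "Condition Medication Taken For", "Physician who Prescribed Med", "Notes"],
        ["ibuprofen", "200mg per day", "", "", ""],
        ["Enalapril", "10mg 2 times a day", "", "", ""],
    ]],
}

records = table_to_records(sample["tables"][0])
for record in records:
    print(record["Name of Medication"], "-", record["Strength and Frequency"])

print(meaningful_words(sample["pages"][0]["words"]))
```

Keying each row by its header makes the empty columns (condition, physician, notes) explicit as empty strings, which may be easier to validate than positional lists.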

Summary

The Python SDK is very easy to use. I have yet to test it with multi-page PDF documents or to check the response time.
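
One possible way to check the response time is to wrap the call in a small timing helper. This is only a sketch: `timed` is a hypothetical helper, and `fake_analyze` is a stand-in so the example runs without an Azure resource. No measurements of the real service are implied.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for the real service call (an assumption for illustration only).
def fake_analyze(document):
    time.sleep(0.05)
    return {"bytes": len(document)}


result, seconds = timed(fake_analyze, b"%PDF")
print(f"analysis took {seconds:.2f}s")
```

With the real client from the script above, the submit and wait steps could be timed together, for example `timed(lambda: client.begin_analyze_document("prebuilt-layout", document).result())`.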