Image from https://www.pexels.com/@yasinaydin/
In this blog, we look at the Azure Form Recognizer Python SDK by creating a simple medical prescription and examining the results that Azure Form Recognizer returns for it.
First and foremost, we need to create the Azure Cognitive Services resource and get its access key and endpoint.
Python Implementation
We use its Python client library, azure-ai-formrecognizer. The code is:
    import json
    import os

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient

    endpoint = os.environ["AZURE_FORM_REG_ENDPOINT"]
    access_key = os.environ["AZURE_FORM_REG_KEY"]

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(access_key))

    with open("prescription_fake.pdf", "rb") as fd:
        document = fd.read()

    poller = client.begin_analyze_document("prebuilt-layout", document)
    fr_result = poller.result()

    results = {
        "pages": [],
        "tables": [],
    }

    for page in fr_result.pages:  # there is only one page in this case
        results["pages"].append({
            "words": [{"word": w.content, "confidence": w.confidence} for w in page.words]
        })

    for table in fr_result.tables:  # there is only one table in this case
        rows = []
        row_index = max([cell.row_index for cell in table.cells if cell.content])
        for r in range(row_index + 1):
            rows.append([])
        for cell in table.cells:
            if cell.row_index <= row_index:
                rows[cell.row_index].append(cell.content)
        results["tables"].append(rows)

    print(json.dumps(results, indent=4))
Essentially, we open a PDF document (a fake prescription that I made up) and send it to the prebuilt-layout model.
From the Form Recognizer result, we read two sets of output:
- words in each page
- cell content in each table
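For the table part, one alternative to appending cells in the order they arrive is to place each cell by both its row and its column index. The sketch below is only an illustration: `Cell` is a hypothetical stand-in for the SDK's table cell objects (assuming they expose `row_index`, `column_index` and `content`), and `cells_to_grid` is a made-up helper name, not part of the SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    """Hypothetical stand-in for an SDK table cell."""
    row_index: int
    column_index: int
    content: str


def cells_to_grid(cells: List[Cell]) -> List[List[str]]:
    """Place each cell at [row_index][column_index], dropping trailing empty rows."""
    filled = [c for c in cells if c.content]
    if not filled:
        return []
    # Last row that has any content, same idea as in the original loop.
    last_row = max(c.row_index for c in filled)
    n_cols = max(c.column_index for c in cells) + 1
    grid = [[""] * n_cols for _ in range(last_row + 1)]
    for c in cells:
        if c.row_index <= last_row:
            grid[c.row_index][c.column_index] = c.content
    return grid


# Cells deliberately listed out of column order, plus one empty trailing row.
cells = [
    Cell(0, 0, "Name of Medication"), Cell(0, 1, "Strength and Frequency"),
    Cell(1, 1, "200mg per day"), Cell(1, 0, "ibuprofen"),
    Cell(2, 0, ""), Cell(2, 1, ""),
]
grid = cells_to_grid(cells)
print(grid)
```

Indexing by column keeps every value under its header even if the cells are not listed in reading order.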
We parse the result and get this:
    {
        "pages": [
            {
                "words": [
                    { "word": "Medication", "confidence": 1.0 },
                    { "word": "Prescription", "confidence": 1.0 },
                    { "word": "Form", "confidence": 1.0 },
                    { "word": "Name:", "confidence": 1.0 },
                    { "word": "_", "confidence": 1.0 },
                    { "word": "Dennis", "confidence": 1.0 },
                    { "word": "Seah", "confidence": 1.0 },
                    { "word": "___________________", "confidence": 1.0 },
                    { "word": "Emergency", "confidence": 1.0 },
                    { "word": "Contact", "confidence": 1.0 },
                    { "word": "Name/Phone:", "confidence": 1.0 },
                    { "word": "(669)", "confidence": 1.0 },
                    { "word": "7654-321", "confidence": 1.0 },
                    { "word": "___________", "confidence": 1.0 },
                    { "word": "Date", "confidence": 1.0 },
                    { "word": "Last", "confidence": 1.0 },
                    { "word": "Updated:", "confidence": 1.0 },
                    { "word": "__", "confidence": 1.0 },
                    { "word": "Nov", "confidence": 1.0 },
                    { "word": "24,", "confidence": 1.0 },
                    { "word": "2022", "confidence": 1.0 },
                    { "word": "_______", "confidence": 1.0 },
                    { "word": "_________________", "confidence": 1.0 },
                    { "word": "_________________", "confidence": 1.0 },
                    { "word": "Prescription", "confidence": 1.0 },
                    { "word": "Medications:", "confidence": 1.0 },
                    { "word": "Name", "confidence": 1.0 },
                    { "word": "of", "confidence": 1.0 },
                    { "word": "Medication", "confidence": 1.0 },
                    { "word": "Strength", "confidence": 1.0 },
                    { "word": "and", "confidence": 1.0 },
                    { "word": "Frequency", "confidence": 1.0 },
                    { "word": "Condition", "confidence": 1.0 },
                    { "word": "Medication", "confidence": 1.0 },
                    { "word": "Taken", "confidence": 1.0 },
                    { "word": "For", "confidence": 1.0 },
                    { "word": "Physician", "confidence": 1.0 },
                    { "word": "who", "confidence": 1.0 },
                    { "word": "Prescribed", "confidence": 1.0 },
                    { "word": "Med", "confidence": 1.0 },
                    { "word": "Notes", "confidence": 1.0 },
                    { "word": "ibuprofen", "confidence": 1.0 },
                    { "word": "200mg", "confidence": 1.0 },
                    { "word": "per", "confidence": 1.0 },
                    { "word": "day", "confidence": 1.0 },
                    { "word": "Enalapril", "confidence": 1.0 },
                    { "word": "10mg", "confidence": 1.0 },
                    { "word": "2", "confidence": 1.0 },
                    { "word": "times", "confidence": 1.0 },
                    { "word": "a", "confidence": 1.0 },
                    { "word": "day", "confidence": 1.0 },
                    { "word": "Allergies", "confidence": 1.0 },
                    { "word": "Pharmacy/Prescription", "confidence": 1.0 },
                    { "word": "Drug", "confidence": 1.0 },
                    { "word": "Plan", "confidence": 1.0 },
                    { "word": "CVS", "confidence": 1.0 }
                ]
            }
        ],
        "tables": [
            [
                [
                    "Name of Medication",
                    "Strength and Frequency",
                    "Condition Medication Taken For",
                    "Physician who Prescribed Med",
                    "Notes"
                ],
                [ "ibuprofen", "200mg per day", "", "", "" ],
                [ "Enalapril", "10mg 2 times a day", "", "", "" ]
            ]
        ]
    }
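Since the first row of the extracted table holds the column headers, the remaining rows can be turned into one dictionary per medication. This is a minimal sketch that assumes the first row is always the header; `rows_to_records` is a hypothetical helper name, and the rows are copied from the output above.

```python
from typing import Dict, List


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each body row with the header row (assumed to be rows[0])."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


rows = [
    ["Name of Medication", "Strength and Frequency",
     "Condition Medication Taken For", "Physician who Prescribed Med", "Notes"],
    ["ibuprofen", "200mg per day", "", "", ""],
    ["Enalapril", "10mg 2 times a day", "", "", ""],
]

records = rows_to_records(rows)
for record in records:
    print(record["Name of Medication"], "->", record["Strength and Frequency"])
```

Keyed records are easier to work with downstream than positional lists, for example when only the medication name and dosage are needed.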
Summary
The Python SDK is very easy to use. I have yet to test it with multi-page PDF documents or to measure the response time.
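For the response-time check, one simple approach is to wrap the call in a timer. The sketch below is an assumption about how that could be done, not a measurement: `timed` is a hypothetical helper, and `fake_analyze` is a placeholder that only sleeps, standing in for the real analysis call so the example runs without an Azure resource.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_analyze(document: bytes) -> int:
    """Hypothetical placeholder for the real analysis call; it only sleeps."""
    time.sleep(0.05)
    return len(document)


result, seconds = timed(fake_analyze, b"%PDF-")
print(f"result={result}, elapsed={seconds:.3f}s")
```

With the real client, the timed function would be something like `lambda: client.begin_analyze_document("prebuilt-layout", document).result()`, so that the elapsed time covers both submitting the document and waiting for the result.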