Text Analytics for health

Picture from Andrea pixabay (https://www.pexels.com/@pixabay/)

I was introduced to Text Analytics for Health which is part of Azure Cognitive Service for Language. And, I wanted to see how it works. So I create a small Python Program.

1. Initialize Setup

First and foremost, I need to do the following

Create a resource group
Create an Azure Cognitive Services Text Analytics in this resource group.
Copy the key and endpoint values as shown below.

For simplicity, I have two values as environment parameters, AZURE_LANG_KEY and AZURE_LANG_ENDPOINT.

2. Code

FYI, I am using MacOSX.

2.1 Install library

pip3 install azure-ai-textanalytics==5.2.0

2.2 Python Code

The Python code is very concise and self-explanatory.

import json
import sys
import os

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential


def execute(text):
    # export the key and endpoint in environment parameters,
    # AZURE_LANG_KEY and AZURE_LANG_ENDPOINT respectively
    key = os.environ.get("AZURE_LANG_KEY")
    endpoint = os.environ.get("AZURE_LANG_ENDPOINT")

    # get Text Analytics Client with the endpoint and key
    text_analytics_client = TextAnalyticsClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    def parse_result(entity):
        return {
            "text": entity.text,
            "normalized_text": entity.normalized_text,
            "category": entity.category,
            "subcategory": entity.subcategory,
            "confidence_score": entity.confidence_score,
        }

    def parse_results(results):
        return list(map(parse_result, results.entities))

    # send the text for analysis
    poller = text_analytics_client.begin_analyze_healthcare_entities([text])
    result = poller.result()

    return list(map(parse_results, [doc for doc in result if not doc.is_error]))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Missing text.")
    else:
        print(json.dumps(execute(sys.argv[1]), indent=2))

2.3 Test the code

2.3.1 Test1

python3 main.py "Patient needs to take 50 mg of ibuprofen."

outputs

[
  [
    {
      "text": "50 mg",
      "normalized_text": null,
      "category": "Dosage",
      "subcategory": null,
      "confidence_score": 1.0
    },
    {
      "text": "ibuprofen",
      "normalized_text": "ibuprofen",
      "category": "MedicationName",
      "subcategory": null,
      "confidence_score": 1.0
    }
  ]
]

The dosage and drug name are identified.

2.3.2 Test2

python3 main.py "Take 100 mg of Statins Atorvastatin a day."

outputs

[
  [
    {
      "text": "100 mg",
      "normalized_text": null,
      "category": "Dosage",
      "subcategory": null,
      "confidence_score": 0.99
    },
    {
      "text": "Statins",
      "normalized_text": "Hydroxymethylglutaryl-CoA Reductase Inhibitors",
      "category": "MedicationClass",
      "subcategory": null,
      "confidence_score": 0.98
    },
    {
      "text": "Atorvastatin",
      "normalized_text": "atorvastatin",
      "category": "MedicationName",
      "subcategory": null,
      "confidence_score": 0.98
    },
    {
      "text": "a day",
      "normalized_text": null,
      "category": "Frequency",
      "subcategory": null,
      "confidence_score": 0.97
    }
  ]
]

this time, Frequency is identified too.

4. Conclusion

As we can see that Python code is very concise. It can be in an Azure Function App or Azure Kubernetes Service (for more complex use cases). The results from this service appear to be good (disclaimer, I do not do extensive testing).

Dennis Seah

Search This Blog