Azure Text Analytics Service


Image from https://www.pexels.com/photo/pile-of-covered-books-159751/

In this blog post, we use the entity recognition feature of the Azure Text Analytics Service. On top of that, we tie each recognized entity back to the sentence it came from in the original text.

Here is the git repo, where you will find:
  1. README.md, to set up your Python development environment
  2. code written with dependency injection in mind; if you use a DI library like lagom, you can easily port the service to your code base
  3. infra/README.md, to deploy the Azure Text Analytics Service (for your convenience)
  4. a data folder containing some sample text data
  5. a main module: run python -m main to see some results. You can change it to read from the data folder too.
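
To illustrate item 2, here is a minimal sketch of what coding against an interface with dependency injection can look like. Almost everything in it is an assumption made for illustration: the tiny `Container` class is a toy stand-in for a DI library such as lagom (it does not reproduce lagom's API), and `ExtractionResult` and `StubTextExtractionService` are hypothetical names. Only `ITextExtractionService`, `recognize_entities`, and `container.resolve` appear in main.py. The stub makes no Azure call.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ExtractionResult:
    # Hypothetical result shape: one record per input document.
    id: str
    entities: list = field(default_factory=list)


class ITextExtractionService(ABC):
    """Interface the application code depends on."""

    @abstractmethod
    async def recognize_entities(self, documents: list) -> list:
        ...


class StubTextExtractionService(ITextExtractionService):
    """Stand-in implementation: returns empty entity lists, no Azure call."""

    async def recognize_entities(self, documents: list) -> list:
        return [ExtractionResult(id=str(i)) for i, _ in enumerate(documents)]


class Container:
    """Toy stand-in for a DI container (an assumption, not lagom's API)."""

    def __init__(self):
        self._bindings = {}

    def register(self, interface, factory):
        # Map an interface to a callable that builds its implementation.
        self._bindings[interface] = factory

    def resolve(self, interface):
        # Build and return the implementation registered for the interface.
        return self._bindings[interface]()


container = Container()
container.register(ITextExtractionService, StubTextExtractionService)


async def demo():
    # Calling code only knows about the interface, not the implementation.
    svc = container.resolve(ITextExtractionService)
    return await svc.recognize_entities(["first text", "second text"])


results = asyncio.run(demo())
```

Because the caller depends only on the interface, swapping the stub for a real Azure-backed implementation would be a change to the registration line, not to the calling code.
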
The core part, main.py, looks like this:
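import json

# container and ITextExtractionService are defined in the repo (imports omitted here).
async def main():
    # Resolve the service implementation from the DI container.
    svc = container.resolve(ITextExtractionService)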
    results = await svc.recognize_entities(
        [
            "Studies have shown that regular physical activity is "
            "associated with a longer lifespan, reducing the risk "
            "of premature death from all causes.",
            "Climate change refers to significant, long-term changes in "
            "the average temperature, weather patterns, and atmospheric "
            "conditions on Earth. Although climate change is a natural "
            "phenomenon, the current trend of rapid warming is largely "
            "attributed to human activities. Understanding the causes, "
            "effects, and potential solutions for climate change is "
            "critical for mitigating its impacts on the environment "
            "and human society.",
        ]
    )
    # results = await svc.recognize_entities(fetch_content())
    with open("results.json", "w") as file:
        json.dump([result.model_dump() for result in results], file, indent=4)

There are two paragraphs. We want to extract the named entities in them and associate each entity with the sentence it appears in. We use nltk to split the text into sentences.
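
One way to tie an entity back to its sentence is to record where each sentence starts and ends in the text, then look up the entity's offset. The sketch below is not the repo's code: it swaps nltk for a simple regex splitter so that it runs with the standard library only, the function names `sentence_spans` and `sentence_for_offset` are hypothetical, and it assumes the entity offset is a character index into the Python string.

```python
import bisect
import re


def sentence_spans(text):
    """Return a list of (start, end, sentence) tuples for the text."""
    # Stand-in splitter: a sentence runs from a non-space character to the
    # next ., ! or ? that is followed by whitespace or the end of the text.
    pattern = re.compile(r"\S.*?(?:[.!?](?=\s|$)|$)", flags=re.DOTALL)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


def sentence_for_offset(spans, offset):
    """Return the sentence containing the character offset, or None."""
    starts = [start for start, _, _ in spans]
    # Find the last sentence that starts at or before the offset.
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return None
    _, end, sentence = spans[i]
    return sentence if offset < end else None


text = (
    "Climate change refers to significant, long-term changes in "
    "the average temperature, weather patterns, and atmospheric "
    "conditions on Earth. Although climate change is a natural "
    "phenomenon, the current trend of rapid warming is largely "
    "attributed to human activities."
)

spans = sentence_spans(text)
# Offsets 132 ("Earth") and 148 ("climate change") come from the results in this post.
earth_sentence = sentence_for_offset(spans, 132)
climate_sentence = sentence_for_offset(spans, 148)
```

A regex splitter like this will mis-split on abbreviations such as "e.g."; that is the kind of case a trained tokenizer like nltk's is meant to handle better.
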

Here are the results:
[
    {
        "id": "0",
        "entities": [
            {
                "text": "Studies",
                "category": "Event",
                "subcategory": null,
                "length": 7,
                "offset": 0,
                "confidence_score": 0.51,
                "sentence": "Studies have shown that regular physical activity is associated with a longer lifespan, reducing the risk of premature death from all causes."
            },
            {
                "text": "physical activity",
                "category": "Skill",
                "subcategory": null,
                "length": 17,
                "offset": 32,
                "confidence_score": 0.73,
                "sentence": "Studies have shown that regular physical activity is associated with a longer lifespan, reducing the risk of premature death from all causes."
            },
            {
                "text": "premature",
                "category": "Event",
                "subcategory": null,
                "length": 9,
                "offset": 109,
                "confidence_score": 0.55,
                "sentence": "Studies have shown that regular physical activity is associated with a longer lifespan, reducing the risk of premature death from all causes."
            },
            {
                "text": "death",
                "category": "Event",
                "subcategory": null,
                "length": 5,
                "offset": 119,
                "confidence_score": 0.48,
                "sentence": "Studies have shown that regular physical activity is associated with a longer lifespan, reducing the risk of premature death from all causes."
            }
        ]
    },
    {
        "id": "1",
        "entities": [
            {
                "text": "Earth",
                "category": "Location",
                "subcategory": null,
                "length": 5,
                "offset": 132,
                "confidence_score": 0.98,
                "sentence": "Climate change refers to significant, long-term changes in the average temperature, weather patterns, and atmospheric conditions on Earth."
            },
            {
                "text": "climate change",
                "category": "Event",
                "subcategory": null,
                "length": 14,
                "offset": 148,
                "confidence_score": 0.96,
                "sentence": "Although climate change is a natural phenomenon, the current trend of rapid warming is largely attributed to human activities."
            },
            {
                "text": "rapid warming",
                "category": "Event",
                "subcategory": null,
                "length": 13,
                "offset": 209,
                "confidence_score": 0.94,
                "sentence": "Although climate change is a natural phenomenon, the current trend of rapid warming is largely attributed to human activities."
            },
            {
                "text": "human",
                "category": "Skill",
                "subcategory": null,
                "length": 5,
                "offset": 248,
                "confidence_score": 0.65,
                "sentence": "Although climate change is a natural phenomenon, the current trend of rapid warming is largely attributed to human activities."
            },
            {
                "text": "climate change",
                "category": "Event",
                "subcategory": null,
                "length": 14,
                "offset": 329,
                "confidence_score": 0.98,
                "sentence": "Understanding the causes, effects, and potential solutions for climate change is critical for mitigating its impacts on the environment and human society."
            },
            {
                "text": "human",
                "category": "PersonType",
                "subcategory": null,
                "length": 5,
                "offset": 406,
                "confidence_score": 0.59,
                "sentence": "Understanding the causes, effects, and potential solutions for climate change is critical for mitigating its impacts on the environment and human society."
            }
        ]
    }
]

This may help us mine information from a corpus. Combined with the Azure Speech to Text service, we can mine audio data too.
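
As a rough illustration of such mining, the sketch below groups sentences by the entity they mention and drops low-confidence hits. The function name `sentences_by_entity` and the 0.9 threshold are assumptions chosen for this example; the sample records are a subset of the results shown in this post, and in practice the list could be loaded from results.json with `json.load`.

```python
from collections import defaultdict


def sentences_by_entity(results, min_confidence=0.9):
    """Map (entity text, category) to the distinct sentences mentioning it."""
    index = defaultdict(list)
    for doc in results:
        for entity in doc["entities"]:
            # Skip entities the service was not confident about.
            if entity["confidence_score"] < min_confidence:
                continue
            key = (entity["text"].lower(), entity["category"])
            if entity["sentence"] not in index[key]:
                index[key].append(entity["sentence"])
    return dict(index)


SENTENCE_1 = (
    "Climate change refers to significant, long-term changes in the average "
    "temperature, weather patterns, and atmospheric conditions on Earth."
)
SENTENCE_2 = (
    "Although climate change is a natural phenomenon, the current trend of "
    "rapid warming is largely attributed to human activities."
)
SENTENCE_3 = (
    "Understanding the causes, effects, and potential solutions for climate "
    "change is critical for mitigating its impacts on the environment and "
    "human society."
)

# Subset of the results above, keeping only the fields the function reads.
sample_results = [
    {
        "id": "1",
        "entities": [
            {"text": "Earth", "category": "Location",
             "confidence_score": 0.98, "sentence": SENTENCE_1},
            {"text": "climate change", "category": "Event",
             "confidence_score": 0.96, "sentence": SENTENCE_2},
            {"text": "human", "category": "Skill",
             "confidence_score": 0.65, "sentence": SENTENCE_2},
            {"text": "climate change", "category": "Event",
             "confidence_score": 0.98, "sentence": SENTENCE_3},
        ],
    }
]

index = sentences_by_entity(sample_results)
```

The threshold is a trade-off: a high value drops doubtful hits such as "human" tagged as a Skill, but it may also discard correct entities that the service scored lower.
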





