Image from https://www.pexels.com/photo/pile-of-covered-books-159751/
In this post, we use the entity recognition feature of the Azure Text Analytics Service. On top of that, we tie each entity back to the sentence it came from in the original text.
Here is the git repo, where you will find:
- README.md, to set up your Python development environment
- the code, which is written with dependency injection in mind. If you are using a DI library like lagom, you can easily port the service to your code base.
- infra/README.md, to deploy the Azure Text Analytics Service (for your convenience)
- the data folder, which contains some sample text data
- run python -m main to see some results. You can change it to read from the data folder too.
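Reading from the data folder could look like the sketch below. The name fetch_content matches the commented-out call in main.py, but this body is only an assumption about what such a helper might do: the folder name "data", the .txt extension, and the UTF-8 encoding are all guesses, and the repo's real implementation may differ.

```python
from pathlib import Path


def fetch_content(data_dir: str = "data") -> list[str]:
    """Hypothetical helper: return the text of every .txt file in data_dir."""
    folder = Path(data_dir)
    # Sort by file name so the documents are sent in the same order on every run.
    return [
        path.read_text(encoding="utf-8").strip()
        for path in sorted(folder.glob("*.txt"))
    ]
```

The returned list of strings has the same shape as the hard-coded list passed to recognize_entities, so it can be swapped in without other changes.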
The core part of main.py looks like this:

    import json

    # container and ITextExtractionService are defined elsewhere in the repo.


    async def main():
        svc = container.resolve(ITextExtractionService)
        results = await svc.recognize_entities(
            [
                "Studies have shown that regular physical activity is "
                "associated with a longer lifespan, reducing the risk "
                "of premature death from all causes.",
                "Climate change refers to significant, long-term changes in "
                "the average temperature, weather patterns, and atmospheric "
                "conditions on Earth. Although climate change is a natural "
                "phenomenon, the current trend of rapid warming is largely "
                "attributed to human activities. Understanding the causes, "
                "effects, and potential solutions for climate change is "
                "critical for mitigating its impacts on the environment "
                "and human society.",
            ]
        )
        # results = await svc.recognize_entities(fetch_content())
        with open("results.json", "w") as file:
            json.dump([result.model_dump() for result in results], file, indent=4)
There are two paragraphs, and we want to extract the named entities in them and associate each entity with the sentence it appears in. We use nltk to split the text into sentences.

Here are the results:
    [
        {
            "id": "0",
            "entities": [
                {
                    "text": "Studies",
                    "category": "Event",
                    "subcategory": null,
                    "length": 7,
                    "offset": 0,
                    "confidence_score": 0.51,
                    "sentence": "Studies have shown that regular physical activity is associated with a longer lifespan, reducing the risk of premature death from all causes."
                },
                {
                    "text": "physical activity",
                    "category": "Skill",
                    "subcategory": null,
                    "length": 17,
                    "offset": 32,
                    "confidence_score": 0.73,
                    "sentence": "Studies have shown that regular physical activity is associated with a longer lifespan, reducing the risk of premature death from all causes."
                },
                {
                    "text": "premature",
                    "category": "Event",
                    "subcategory": null,
                    "length": 9,
                    "offset": 109,
                    "confidence_score": 0.55,
                    "sentence": "Studies have shown that regular physical activity is associated with a longer lifespan, reducing the risk of premature death from all causes."
                },
                {
                    "text": "death",
                    "category": "Event",
                    "subcategory": null,
                    "length": 5,
                    "offset": 119,
                    "confidence_score": 0.48,
                    "sentence": "Studies have shown that regular physical activity is associated with a longer lifespan, reducing the risk of premature death from all causes."
                }
            ]
        },
        {
            "id": "1",
            "entities": [
                {
                    "text": "Earth",
                    "category": "Location",
                    "subcategory": null,
                    "length": 5,
                    "offset": 132,
                    "confidence_score": 0.98,
                    "sentence": "Climate change refers to significant, long-term changes in the average temperature, weather patterns, and atmospheric conditions on Earth."
                },
                {
                    "text": "climate change",
                    "category": "Event",
                    "subcategory": null,
                    "length": 14,
                    "offset": 148,
                    "confidence_score": 0.96,
                    "sentence": "Although climate change is a natural phenomenon, the current trend of rapid warming is largely attributed to human activities."
                },
                {
                    "text": "rapid warming",
                    "category": "Event",
                    "subcategory": null,
                    "length": 13,
                    "offset": 209,
                    "confidence_score": 0.94,
                    "sentence": "Although climate change is a natural phenomenon, the current trend of rapid warming is largely attributed to human activities."
                },
                {
                    "text": "human",
                    "category": "Skill",
                    "subcategory": null,
                    "length": 5,
                    "offset": 248,
                    "confidence_score": 0.65,
                    "sentence": "Although climate change is a natural phenomenon, the current trend of rapid warming is largely attributed to human activities."
                },
                {
                    "text": "climate change",
                    "category": "Event",
                    "subcategory": null,
                    "length": 14,
                    "offset": 329,
                    "confidence_score": 0.98,
                    "sentence": "Understanding the causes, effects, and potential solutions for climate change is critical for mitigating its impacts on the environment and human society."
                },
                {
                    "text": "human",
                    "category": "PersonType",
                    "subcategory": null,
                    "length": 5,
                    "offset": 406,
                    "confidence_score": 0.59,
                    "sentence": "Understanding the causes, effects, and potential solutions for climate change is critical for mitigating its impacts on the environment and human society."
                }
            ]
        }
    ]
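In the results for the second paragraph, each offset matches a character position in the input text, so tying an entity back to its sentence comes down to finding the sentence span that contains that offset. Below is a minimal sketch of that idea, not the repo's actual code: it swaps nltk for a naive regex splitter to stay dependency-free, and the names sentence_spans and sentence_at are hypothetical.

```python
import re
from bisect import bisect_right


def sentence_spans(text: str) -> list[tuple[int, str]]:
    """Split text into (start_offset, sentence) pairs with a naive regex."""
    spans = []
    start = 0
    # Break on whitespace that follows sentence-ending punctuation.
    for match in re.finditer(r"(?<=[.!?])\s+", text):
        spans.append((start, text[start:match.start()]))
        start = match.end()
    if start < len(text):
        spans.append((start, text[start:]))
    return spans


def sentence_at(spans: list[tuple[int, str]], offset: int) -> str:
    """Return the sentence whose span contains offset (spans must be non-empty)."""
    starts = [start for start, _ in spans]
    # The last sentence starting at or before the offset is the one we want.
    index = bisect_right(starts, offset) - 1
    return spans[max(index, 0)][1]


# The second paragraph from main.py.
TEXT = (
    "Climate change refers to significant, long-term changes in "
    "the average temperature, weather patterns, and atmospheric "
    "conditions on Earth. Although climate change is a natural "
    "phenomenon, the current trend of rapid warming is largely "
    "attributed to human activities. Understanding the causes, "
    "effects, and potential solutions for climate change is "
    "critical for mitigating its impacts on the environment "
    "and human society."
)

spans = sentence_spans(TEXT)
# Offsets taken from the results above: "Earth" and two "climate change" hits.
for offset in (132, 148, 329):
    print(offset, "->", sentence_at(spans, offset))
```

A regex splitter like this breaks on abbreviations such as "Dr." or "e.g.", which is one reason to prefer a trained tokenizer like the one in nltk for real text.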
This may help us mine information from a corpus. Combined with the Azure Speech to Text service, we can mine audio data too.
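As one illustration of what mining could mean here, the sketch below groups the higher-confidence entities by the sentence they came from. It assumes the result shape written to results.json; the function name entities_by_sentence and the 0.9 threshold are arbitrary choices for this example, not something from the repo.

```python
from collections import defaultdict


def entities_by_sentence(results: list[dict], min_confidence: float = 0.9) -> dict:
    """Map each sentence to the (text, category) pairs found in it."""
    grouped = defaultdict(list)
    for document in results:
        for entity in document["entities"]:
            # Skip low-confidence hits; the cutoff is an assumption to tune.
            if entity["confidence_score"] >= min_confidence:
                grouped[entity["sentence"]].append(
                    (entity["text"], entity["category"])
                )
    return dict(grouped)


# Three entities copied from the results shown earlier, trimmed to the fields used.
SENTENCE_1 = (
    "Climate change refers to significant, long-term changes in the average "
    "temperature, weather patterns, and atmospheric conditions on Earth."
)
SENTENCE_2 = (
    "Although climate change is a natural phenomenon, the current trend of "
    "rapid warming is largely attributed to human activities."
)
sample = [
    {
        "id": "1",
        "entities": [
            {"text": "Earth", "category": "Location",
             "confidence_score": 0.98, "sentence": SENTENCE_1},
            {"text": "rapid warming", "category": "Event",
             "confidence_score": 0.94, "sentence": SENTENCE_2},
            {"text": "human", "category": "Skill",
             "confidence_score": 0.65, "sentence": SENTENCE_2},
        ],
    }
]

grouped = entities_by_sentence(sample)
```

To run it on real output, load results.json with json.load and pass the resulting list in place of sample.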