Image from https://www.pexels.com/@artempodrez/
This post is about using a Named Entity Recognition (NER) model, biomedical-ner-all, to identify entities in a statement. We use the Hugging Face transformers library to load the model and its tokenizer, and run them through a pipeline.
These are the dependencies. I am using Python 3.10 on macOS.
pip3 install torch
pip3 install transformers
Here is a simple code snippet:
import json

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "d4data/biomedical-ner-all"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Running on CPU because my Torch build is not compiled with CUDA enabled;
# with a CUDA-enabled build we could pass device=0 to pipeline().
pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

result = pipe("Patient took ibuprofen for 2 weeks with 200 mg per day.")

# Convert the scores to plain Python floats so json.dumps can serialize them.
for r in result:
    r["score"] = float(r["score"])

print(json.dumps(result, indent=2))
And these are the results:
[
  {
    "entity_group": "Medication",
    "score": 0.9998639822006226,
    "word": "ib",
    "start": 13,
    "end": 15
  },
  {
    "entity_group": "Medication",
    "score": 0.995373010635376,
    "word": "##uprofen",
    "start": 15,
    "end": 22
  },
  {
    "entity_group": "Duration",
    "score": 0.9976975917816162,
    "word": "2 weeks",
    "start": 27,
    "end": 34
  },
  {
    "entity_group": "Dosage",
    "score": 0.8205375671386719,
    "word": "200 mg",
    "start": 40,
    "end": 46
  }
]
I was hoping it would also tag "per day" with entity_group Frequency. Otherwise, it looks good.
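One more thing stands out in the output: "ibuprofen" comes back as two Medication pieces, "ib" and "##uprofen". Below is a minimal sketch of how those pieces could be stitched back together after the fact. The helper name merge_wordpieces is my own (it is not part of transformers), and taking the lower of the two scores is just one conservative choice. It relies only on the start/end offsets shown above. The pipeline also accepts other aggregation_strategy values such as "first", which may handle this directly, but I have not checked that against this model.

```python
def merge_wordpieces(entities, text):
    """Merge adjacent "##" word pieces of the same entity group into one entity.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    merged = []
    for ent in entities:
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and ent["word"].startswith("##")
            and ent["start"] == prev["end"]
            and ent["entity_group"] == prev["entity_group"]
        )
        if is_continuation:
            # Extend the previous span and keep the lower (more cautious) score.
            prev["end"] = ent["end"]
            prev["score"] = min(prev["score"], ent["score"])
        else:
            # Copy so the caller's list of dicts is left untouched.
            merged.append(dict(ent))
    # Recover the surface form from the original text using the offsets.
    for ent in merged:
        ent["word"] = text[ent["start"]:ent["end"]]
    return merged


text = "Patient took ibuprofen for 2 weeks with 200 mg per day."
result = [
    {"entity_group": "Medication", "score": 0.9998639822006226, "word": "ib", "start": 13, "end": 15},
    {"entity_group": "Medication", "score": 0.995373010635376, "word": "##uprofen", "start": 15, "end": 22},
    {"entity_group": "Duration", "score": 0.9976975917816162, "word": "2 weeks", "start": 27, "end": 34},
    {"entity_group": "Dosage", "score": 0.8205375671386719, "word": "200 mg", "start": 40, "end": 46},
]

for ent in merge_wordpieces(result, text):
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])
```

With the results above, this leaves three entities, with the Medication span covering characters 13 to 22 ("ibuprofen").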
There is a paper (with code on GitHub) if you are interested.