Image from https://www.pexels.com/@artempodrez/
This post is about using a Named Entity Recognition (NER) model, biomedical-ner-all, to identify entities in a statement. We use the Hugging Face transformers library to load the model and tokenizer, then run inference through a pipeline.
These are the dependencies. I am using Python 3.10 and macOS.

    pip3 install torch
    pip3 install transformers

Here is a simple code snippet:

    import json

    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    model_name = "d4data/biomedical-ner-all"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)

    # Running on CPU because this Torch build is not compiled with CUDA;
    # with a CUDA-enabled build we could pass device=0 to use the GPU.
    pipe = pipeline(
        "ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple"
    )

    result = pipe("Patient took ibuprofen for 2 weeks with 200 mg per day.")

    # Convert each score to a plain Python float so json.dumps can serialize it.
    for r in result:
        r["score"] = float(r["score"])
    print(json.dumps(result, indent=2))

And here are the results:

    [
      {
        "entity_group": "Medication",
        "score": 0.9998639822006226,
        "word": "ib",
        "start": 13,
        "end": 15
      },
      {
        "entity_group": "Medication",
        "score": 0.995373010635376,
        "word": "##uprofen",
        "start": 15,
        "end": 22
      },
      {
        "entity_group": "Duration",
        "score": 0.9976975917816162,
        "word": "2 weeks",
        "start": 27,
        "end": 34
      },
      {
        "entity_group": "Dosage",
        "score": 0.8205375671386719,
        "word": "200 mg",
        "start": 40,
        "end": 46
      }
    ]

I was hoping it would also tag "per day" as entity_group: Frequency. Otherwise, it looks good.
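One thing to note in the output is that "ibuprofen" comes back as two word pieces, "ib" and "##uprofen", even though both are tagged Medication and their character offsets touch. Below is a minimal sketch of a post-processing step that joins such pieces using the start and end offsets. The helper name `merge_word_pieces` is hypothetical and not part of transformers, and keeping the lower of the two scores is my own assumption, not something the model or library prescribes.

```python
from typing import Dict, List


def merge_word_pieces(text: str, entities: List[Dict]) -> List[Dict]:
    """Join contiguous '##' word pieces that share an entity group.

    Hypothetical helper for illustration; not part of transformers.
    """
    merged: List[Dict] = []
    for ent in entities:
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and ent["word"].startswith("##")
            and ent["entity_group"] == prev["entity_group"]
            and ent["start"] == prev["end"]
        )
        if is_continuation:
            # Extend the previous span and re-read the word from the source text.
            prev["end"] = ent["end"]
            prev["word"] = text[prev["start"]:prev["end"]]
            # Assumption: keep the lower score as a conservative estimate.
            prev["score"] = min(prev["score"], ent["score"])
        else:
            # Copy so the original pipeline output is left untouched.
            merged.append(dict(ent))
    return merged


text = "Patient took ibuprofen for 2 weeks with 200 mg per day."
result = [
    {"entity_group": "Medication", "score": 0.9998639822006226,
     "word": "ib", "start": 13, "end": 15},
    {"entity_group": "Medication", "score": 0.995373010635376,
     "word": "##uprofen", "start": 15, "end": 22},
    {"entity_group": "Duration", "score": 0.9976975917816162,
     "word": "2 weeks", "start": 27, "end": 34},
    {"entity_group": "Dosage", "score": 0.8205375671386719,
     "word": "200 mg", "start": 40, "end": 46},
]

merged = merge_word_pieces(text, result)
print([(e["entity_group"], e["word"]) for e in merged])
# [('Medication', 'ibuprofen'), ('Duration', '2 weeks'), ('Dosage', '200 mg')]
```

The pipeline also accepts other aggregation_strategy values ("first", "average", "max") that group at the word level and may make this step unnecessary; I have not tried them with this model.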
There is a paper (with code on GitHub) if you are interested.
