Image from https://www.pexels.com/@artempodrez/
This post shows how to use a Named Entity Recognition (NER) model, biomedical-ner-all, to identify entities in a sentence. We use the Hugging Face transformers library to load the model and run it through a pipeline.
These are the dependencies. I am using Python 3.10 on macOS.
pip3 install torch
pip3 install transformers
Here is a simple code snippet:
import json
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForTokenClassification
# Keep the model id in its own variable so it is not shadowed by the loaded model
model_name = "d4data/biomedical-ner-all"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
pipe = pipeline(
    "ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple"
)
# Not using a GPU here because this Torch build was not compiled with CUDA
# enabled; otherwise we could pass device=0 to pipeline().
result = pipe("Patient took ibuprofen for 2 weeks with 200 mg per day.")
# Convert the scores to plain Python floats so json.dumps can serialize them
for r in result:
    r["score"] = float(r["score"])
print(json.dumps(result, indent=2))
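As a side note on the GPU comment in the snippet, here is a minimal sketch of picking the device at runtime instead of hard-coding it. The helper name pick_device is made up for this example. It assumes the usual pipeline convention that device=0 means the first CUDA GPU and device=-1 means CPU.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if Torch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Torch is not installed at all, so fall back to the CPU value
        return -1
    return 0 if torch.cuda.is_available() else -1


# Hypothetical usage: pass this value as device=... when building the pipeline
device = pick_device()
print(device)
```

The value could then be passed as pipeline("ner", ..., device=device). The rest of this post sticks with the CPU-only snippet as written.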
and these are the results of the NER snippet:
[
  {
    "entity_group": "Medication",
    "score": 0.9998639822006226,
    "word": "ib",
    "start": 13,
    "end": 15
  },
  {
    "entity_group": "Medication",
    "score": 0.995373010635376,
    "word": "##uprofen",
    "start": 15,
    "end": 22
  },
  {
    "entity_group": "Duration",
    "score": 0.9976975917816162,
    "word": "2 weeks",
    "start": 27,
    "end": 34
  },
  {
    "entity_group": "Dosage",
    "score": 0.8205375671386719,
    "word": "200 mg",
    "start": 40,
    "end": 46
  }
]
I was hoping it would also pick up "per day" as entity_group: Frequency. Otherwise, the results look good.
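One thing visible in the output is that "ibuprofen" comes back as two word pieces, "ib" and "##uprofen". Here is a rough sketch of stitching such pieces back together after the fact, using the start and end offsets the pipeline returns. The function name merge_subword_entities is made up for this example, and keeping the lower of the two scores is my own assumption, not something the pipeline does.

```python
def merge_subword_entities(entities, text):
    """Merge consecutive entities that share a label and touch in the text."""
    merged = []
    for ent in entities:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["entity_group"] == ent["entity_group"]
            and prev["end"] == ent["start"]
        ):
            # Extend the previous span and re-read the word from the source text
            prev["end"] = ent["end"]
            prev["word"] = text[prev["start"]:prev["end"]]
            # Assumption: keep the lower (more conservative) of the two scores
            prev["score"] = min(prev["score"], ent["score"])
        else:
            # Copy so the original pipeline output is left untouched
            merged.append(dict(ent))
    return merged


text = "Patient took ibuprofen for 2 weeks with 200 mg per day."
# Same entities as in the output above
result = [
    {"entity_group": "Medication", "score": 0.9998639822006226,
     "word": "ib", "start": 13, "end": 15},
    {"entity_group": "Medication", "score": 0.995373010635376,
     "word": "##uprofen", "start": 15, "end": 22},
    {"entity_group": "Duration", "score": 0.9976975917816162,
     "word": "2 weeks", "start": 27, "end": 34},
    {"entity_group": "Dosage", "score": 0.8205375671386719,
     "word": "200 mg", "start": 40, "end": 46},
]

merged = merge_subword_entities(result, text)
for ent in merged:
    print(ent["entity_group"], repr(ent["word"]))
# Medication 'ibuprofen'
# Duration '2 weeks'
# Dosage '200 mg'
```

The pipeline also accepts other aggregation_strategy values ("first", "average", "max") that group tokens by whole word. I have not checked what they return for this model, so treat that as something to try rather than a confirmed fix.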
There is a paper (with code on GitHub) if you are interested,