Biomedical Named Entity Recognition Model

 

Image from https://www.pexels.com/@artempodrez/

This post is about using a Named Entity Recognition model, biomedical-ner-all, to identify entities in a sentence. We use the Hugging Face transformers library to load the model and tokenizer locally and run them through a pipeline.

These are the dependencies. I am using Python 3.10 on macOS.

pip3 install torch
pip3 install transformers

Here is a simple code snippet:

import json

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "d4data/biomedical-ner-all"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Running on CPU because my Torch build is not compiled with CUDA enabled.
# On a CUDA-enabled machine, pass device=0 to pipeline() to use the first GPU.
pipe = pipeline(
    "ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple"
)

result = pipe("Patient took ibuprofen for 2 weeks with 200 mg per day.")

# Scores come back as NumPy floats, which json.dumps cannot serialize,
# so convert them to plain Python floats first.
for r in result:
    r["score"] = float(r["score"])

print(json.dumps(result, indent=2))

and these are the results:

[
  {
    "entity_group": "Medication",
    "score": 0.9998639822006226,
    "word": "ib",
    "start": 13,
    "end": 15
  },
  {
    "entity_group": "Medication",
    "score": 0.995373010635376,
    "word": "##uprofen",
    "start": 15,
    "end": 22
  },
  {
    "entity_group": "Duration",
    "score": 0.9976975917816162,
    "word": "2 weeks",
    "start": 27,
    "end": 34
  },
  {
    "entity_group": "Dosage",
    "score": 0.8205375671386719,
    "word": "200 mg",
    "start": 40,
    "end": 46
  }
]

I was hoping it would also tag "per day" as entity_group: Frequency. Otherwise, it looks good.
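
One detail in the output above: "ibuprofen" comes back as two Medication entities, "ib" and "##uprofen", because the tokenizer splits it into word pieces. Below is a minimal post-processing sketch that glues such pieces back together. The helper name merge_wordpieces is my own (hypothetical, not part of transformers), and it assumes that a continuation piece starts with "##", has the same entity_group as the previous entity, and begins exactly where the previous one ends. The pipeline also accepts other aggregation_strategy values such as "first", which may avoid the split in the first place, but I have not verified that with this model.

```python
def merge_wordpieces(entities):
    """Merge '##' continuation pieces into the entity that precedes them.

    Hypothetical helper, not part of transformers. Assumes each entity is a
    dict with the keys shown in the output above.
    """
    merged = []
    for ent in entities:
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and ent["word"].startswith("##")
            and ent["entity_group"] == prev["entity_group"]
            and ent["start"] == prev["end"]
        )
        if is_continuation:
            prev["word"] += ent["word"][2:]  # drop the leading "##"
            prev["end"] = ent["end"]
            # Keep the lower score as a conservative confidence estimate.
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append(dict(ent))  # copy so the input list is untouched
    return merged


sentence = "Patient took ibuprofen for 2 weeks with 200 mg per day."

# The entities printed above, copied by hand.
sample = [
    {"entity_group": "Medication", "score": 0.9998639822006226,
     "word": "ib", "start": 13, "end": 15},
    {"entity_group": "Medication", "score": 0.995373010635376,
     "word": "##uprofen", "start": 15, "end": 22},
    {"entity_group": "Duration", "score": 0.9976975917816162,
     "word": "2 weeks", "start": 27, "end": 34},
    {"entity_group": "Dosage", "score": 0.8205375671386719,
     "word": "200 mg", "start": 40, "end": 46},
]

merged = merge_wordpieces(sample)

for ent in merged:
    # start/end are character offsets into the original sentence.
    print(ent["entity_group"], sentence[ent["start"]:ent["end"]])
```

With the sample above this prints three lines: Medication ibuprofen, Duration 2 weeks, and Dosage 200 mg. Slicing the sentence with start and end is a handy way to get the original text back, since the word field can be lowercased or re-spaced by the tokenizer.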

There is a paper (with code on GitHub) if you are interested.


