Image from https://www.pexels.com/@pavel-danilyuk/
One of the cool features of the Azure OpenAI API version 2023-07-01-preview is function calling (custom functions). This feature matters to me because I am working on a project where we need OpenAI to generate JSON responses. We want the generated JSON to follow the same schema every time; otherwise, we need post-processing to coerce it into the correct schema.
In this post, we compare the completions with and without this feature.
Development Setup
python -m venv .venv
poetry init
poetry shell
poetry add openai
poetry add python-dotenv
poetry add pydantic
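I am using Python 3.11. The code in this post relies on the pre-1.0 interface of the openai package (openai.ChatCompletion, which was removed in openai 1.0) and on pydantic v2 (model_validate).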
Create .env file
OPENAI_API_TYPE="azure"
OPENAI_API_BASE="<OpenAI endpoint>"
OPENAI_API_KEY="<OpenAI secret>"
OPENAI_API_VERSION="2023-07-01-preview"
OPENAI_DEPLOYMENT_ID="<deployment identifier>"
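A missing or empty entry in the .env file tends to surface later as a confusing API error. As a minimal sketch, the helper below reports which of the five settings listed above are unset; the function name missing_settings is hypothetical and not part of any library.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Setting names taken from the .env file above.
REQUIRED_SETTINGS = (
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "OPENAI_DEPLOYMENT_ID",
)


def missing_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required setting names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

Calling missing_settings() right after load_dotenv() and stopping when the list is non-empty keeps the failure close to its cause.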
Without the OpenAI functions feature
We use two sample texts for experimentation. We want to extract the patient's name, the prescribed drugs, the doctor's name, and the prescription dates (if any).
Michael Doe was prescribed with Propofol and Alprazolam by Dr. Conrad Murray on June 25, 2009 and July 26, 2010
and
Anne Smith was prescribed with Propofol by Dr. Who
Here is the code and its output.
import os
from dotenv import load_dotenv
text1 = """Michael Doe was prescribed with Propofol and Alprazolam by Dr.
 Conrad Murray on June 25, 2009 and July 26, 2010."""
text2 = "Anne Smith was prescribed with Propofol by Dr. Who"
def generate(text: str) -> str:
    prompt = f"""
    Please extract the following information from the given text and return it as a JSON object:
    patient_name
    drug_names
    doctor_name
    dates
    This is the body of text to extract the information from:
    {text}
    """  # noqa E501
    openai_response = openai.ChatCompletion.create(
        deployment_id=os.getenv("OPENAI_DEPLOYMENT_ID"),
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return openai_response["choices"][0]["message"]["content"]
if __name__ == "__main__":
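    load_dotenv()
    # Import openai only after load_dotenv(): the pre-1.0 package reads
    # the OPENAI_* environment variables when it is imported.
    import openai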
    print(generate(text1))
    print(generate(text2))
Output
{
  "patient_name": "Michael Doe",
  "drug_names": [
    "Propofol",
    "Alprazolam"
  ],
  "doctor_name": "Dr. Conrad Murray",
  "dates": [
    "June 25, 2009",
    "July 26, 2010"
  ]
}
{
  "patient_name": "Anne Smith",
  "drug_names": "Propofol",
  "doctor_name": "Dr. Who",
  "dates": ""
}
A few observations
- drug_names can be list[str] or str
- dates can be list[str] or str
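Without the functions feature, this inconsistency has to be repaired after the fact. The following is a minimal sketch of that kind of post-processing, assuming the only variations are the ones observed above (a bare string, an empty string, or a list); the helper names as_str_list and normalize are hypothetical.

```python
import json


def as_str_list(value) -> list[str]:
    """Coerce a string, a list, or a missing value into a list of strings."""
    if value is None or value == "":
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


def normalize(raw_json: str) -> dict:
    """Parse a completion and force the list-valued fields to be lists."""
    data = json.loads(raw_json)
    for key in ("drug_names", "dates"):
        data[key] = as_str_list(data.get(key))
    return data


raw = '{"patient_name": "Anne Smith", "drug_names": "Propofol", "doctor_name": "Dr. Who", "dates": ""}'
print(normalize(raw)["drug_names"])  # ['Propofol']
print(normalize(raw)["dates"])       # []
```

Every new variation the model produces would need another branch here, which is the maintenance burden a fixed schema is meant to avoid.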
I want every response to validate against a pydantic model like this:
class PatientInfo(BaseModel):
    patient_name: str
    doctor_name: str
    drug_names: list[str] | None = None
    date: list[str] | None = None
Let's use the OpenAI functions feature to address this.
With the OpenAI functions feature
import os
import json
from dotenv import load_dotenv
from pydantic import BaseModel
text1 = """Michael Doe was prescribed with Propofol and Alprazolam by Dr.
 Conrad Murray on June 25, 2009 and July 26, 2010."""
text2 = "Anne Smith was prescribed with Propofol by Dr. Who"
class PatientInfo(BaseModel):
    patient_name: str
    doctor_name: str
    drug_names: list[str] | None = None
    date: list[str] | None = None
patient_function = [
    {
        "name": "extract_patient_info",
        "description": "Get the drug prescription information from the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_name": {
                    "type": "string",
                    "description": "Name of the patient",
                },
                "drug_names": {
                    "type": "array",
                    "items": {
                        "type": "string",
                    },
                    "description": "The drug names.",
                },
                "doctor_name": {"type": "string", "description": "doctor name."},
                "dates": {
                    "type": "array",
                    "items": {
                        "type": "string",
                    },
                    "description": "one of the date of prescription.",
                },
            },
        },
    }
]
def generate(text: str) -> str:
    prompt = f"""
    Please extract the following information from the given text and return it as a JSON object:
    patient_name
    drug_names
    doctor_name
    dates
    This is the body of text to extract the information from:
    {text}
    """  # noqa E501
    openai_response = openai.ChatCompletion.create(
        deployment_id=os.getenv("OPENAI_DEPLOYMENT_ID"),
        messages=[{"role": "user", "content": prompt}],
        functions=patient_function,
        function_call="auto",
    )
    return openai_response["choices"][0]["message"]["function_call"]["arguments"]
if __name__ == "__main__":
    load_dotenv()
    import openai
    print(PatientInfo.model_validate(json.loads(generate(text1))))
    print(PatientInfo.model_validate(json.loads(generate(text2))))
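The generate function above indexes straight into message["function_call"]["arguments"]. With function_call="auto" the model is allowed to answer in plain text instead, in which case that key is absent and the lookup raises a KeyError. Passing function_call={"name": "extract_patient_info"} asks the API to call that specific function. As a defensive sketch, assuming a response shaped like the dictionary indexed above, a hypothetical helper could look like this:

```python
import json
from typing import Optional


def extract_arguments(response: dict) -> Optional[dict]:
    """Return the parsed function-call arguments, or None for a plain-text reply."""
    message = response["choices"][0]["message"]
    call = message.get("function_call")
    if call is None:
        # The model chose to answer with text rather than call the function.
        return None
    # The arguments arrive as a JSON string, not as a dict.
    return json.loads(call["arguments"])
```

The arguments string is generated by the model, so json.loads can still raise json.JSONDecodeError on malformed output; this sketch lets that error propagate rather than hiding it.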
The prompt is identical. We only added the functions and function_call arguments to the create call; patient_function defines the JSON schema for the arguments the model should return. The output is:
patient_name='Michael Doe' doctor_name='Dr. Conrad Murray' drug_names=['Propofol', 'Alprazolam'] date=None
patient_name='Anne Smith' doctor_name='Dr. Who' drug_names=['Propofol'] date=None
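For both sample texts, OpenAI generated JSON that validates against the pydantic model, and drug_names is now a list in both cases. Note that date is None in both results: the model field is named date while the function schema names the property dates, and pydantic ignores unmatched keys by default, so the field needs to be renamed to dates to capture the prescription dates.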
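Name drift between a hand-written function schema and the model that validates its output is easy to miss, because validation still succeeds. A minimal sketch of a pre-flight check follows; the helper name name_mismatches is hypothetical, and the two name lists are copied by hand from the code above.

```python
from collections.abc import Iterable


def name_mismatches(schema_properties: Iterable[str], model_fields: Iterable[str]) -> set[str]:
    """Return the names that appear on only one side (symmetric difference)."""
    return set(schema_properties) ^ set(model_fields)


# Property names from patient_function and field names from PatientInfo,
# copied by hand from the script above.
schema_properties = ["patient_name", "drug_names", "doctor_name", "dates"]
model_fields = ["patient_name", "doctor_name", "drug_names", "date"]

print(sorted(name_mismatches(schema_properties, model_fields)))  # ['date', 'dates']
```

In the script itself, the two arguments could be patient_function[0]["parameters"]["properties"] and, with pydantic v2, PatientInfo.model_fields, since iterating either dictionary yields its names.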
This is a wonderful feature that can save us a lot of work.