Image from https://www.pexels.com/@pavel-danilyuk/
One of the cool features of the Azure OpenAI API version 2023-07-01-preview is function calling (custom functions). This feature matters to me because I am working on a project where we need OpenAI to generate JSON responses. We always want the generated JSON to follow a consistent schema; otherwise, we need post-processing to get the JSON object into the correct shape.
In this blog post, we compare the completions with and without this new feature.
Development Setup
python -m venv .venv
poetry init
poetry shell
poetry add openai
poetry add python-dotenv
poetry add pydantic
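I am using Python version 3.11. The code below calls openai.ChatCompletion, which exists only in versions of the openai package before 1.0, and model_validate, which requires pydantic 2.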
Create a .env file
OPENAI_API_TYPE="azure"
OPENAI_API_BASE="<OpenAI endpoint>"
OPENAI_API_KEY="<OpenAI secret>"
OPENAI_API_VERSION="2023-07-01-preview"
OPENAI_DEPLOYMENT_ID="<deployment identifier>"
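Before calling the API, it can help to confirm that these settings are actually visible to the process. The following is a minimal sketch using only the standard library. The helper name missing_settings is hypothetical and not part of the original setup; the variable names are the ones from the .env file above.

```python
import os

# Names taken from the .env file above.
REQUIRED_SETTINGS = (
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "OPENAI_DEPLOYMENT_ID",
)


def missing_settings(env=None) -> list[str]:
    """Return the required settings that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All settings are set.")
```

In a script like the ones in this post, the check would run right after load_dotenv(), since that is the point where the values from .env reach the environment.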
Without the OpenAI function feature
We use two sample texts for experimentation. We want to extract the name of the patient, the prescribed drugs, the name of the doctor, and the prescription dates (if any).
Michael Doe was prescribed with Propofol and Alprazolam by Dr. Conrad Murray on June 25, 2009 and July 26, 2010
and
Anne Smith was prescribed with Propofol by Dr. Who
Here is the code and its output.
import os

from dotenv import load_dotenv

text1 = """Michael Doe was prescribed with Propofol and Alprazolam by Dr. Conrad Murray on June 25, 2009 and July 26, 2010."""
text2 = "Anne Smith was prescribed with Propofol by Dr. Who"


def generate(text: str) -> str:
    prompt = f"""
    Please extract the following information from the given text and return it as a JSON object:

    patient_name
    drug_names
    doctor_name
    dates

    This is the body of text to extract the information from:
    {text}
    """  # noqa E501

    openai_response = openai.ChatCompletion.create(
        deployment_id=os.getenv("OPENAI_DEPLOYMENT_ID"),
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # The reply is free text; nothing forces it to match a schema.
    return openai_response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    load_dotenv()
    # Imported after load_dotenv() because the pre-1.0 openai package reads
    # the OPENAI_* environment variables at import time.
    import openai

    print(generate(text1))
    print(generate(text2))
Output
{
  "patient_name": "Michael Doe",
  "drug_names": [
    "Propofol",
    "Alprazolam"
  ],
  "doctor_name": "Dr. Conrad Murray",
  "dates": [
    "June 25, 2009",
    "July 26, 2010"
  ]
}
{
  "patient_name": "Anne Smith",
  "drug_names": "Propofol",
  "doctor_name": "Dr. Who",
  "dates": ""
}
A few observations
drug_names can be list[str] or str
dates can be list[str] or str
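Without the function feature, this inconsistency has to be repaired after the fact. The following is a minimal sketch of such a post-processing step, assuming the model returns one of the two shapes shown in the output above. The helper names as_list and normalize are hypothetical, and real responses may vary in ways this does not cover.

```python
import json


def as_list(value):
    """Coerce a string, list, or empty value into a list of strings or None."""
    if value is None or value == "" or value == []:
        return None
    if isinstance(value, str):
        return [value]
    return list(value)


def normalize(raw_json: str) -> dict:
    """Parse a completion and force drug_names and dates into list-or-None form."""
    data = json.loads(raw_json)
    for key in ("drug_names", "dates"):
        data[key] = as_list(data.get(key))
    return data


if __name__ == "__main__":
    # The second completion from the output above, as a JSON string.
    second_output = (
        '{"patient_name": "Anne Smith", "drug_names": "Propofol", '
        '"doctor_name": "Dr. Who", "dates": ""}'
    )
    print(normalize(second_output))
```

This kind of cleanup code grows with every new shape the model invents, which is the maintenance cost the function feature is meant to avoid.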
I want the output to fit a pydantic model like this:
class PatientInfo(BaseModel):
    patient_name: str
    doctor_name: str
    drug_names: list[str] | None = None
    date: list[str] | None = None
Let's use the OpenAI function feature to address this.
With the OpenAI function feature
import os
import json

from dotenv import load_dotenv
from pydantic import BaseModel

text1 = """Michael Doe was prescribed with Propofol and Alprazolam by Dr. Conrad Murray on June 25, 2009 and July 26, 2010."""
text2 = "Anne Smith was prescribed with Propofol by Dr. Who"


class PatientInfo(BaseModel):
    patient_name: str
    doctor_name: str
    drug_names: list[str] | None = None
    date: list[str] | None = None


# JSON schema describing the arguments we want the model to produce.
patient_function = [
    {
        "name": "extract_patient_info",
        "description": "Get the drug prescription information from the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_name": {
                    "type": "string",
                    "description": "Name of the patient",
                },
                "drug_names": {
                    "type": "array",
                    "items": {
                        "type": "string",
                    },
                    "description": "The drug names.",
                },
                "doctor_name": {"type": "string", "description": "doctor name."},
                "dates": {
                    "type": "array",
                    "items": {
                        "type": "string",
                    },
                    "description": "one of the date of prescription.",
                },
            },
        },
    }
]


def generate(text: str) -> str:
    prompt = f"""
    Please extract the following information from the given text and return it as a JSON object:

    patient_name
    drug_names
    doctor_name
    dates

    This is the body of text to extract the information from:
    {text}
    """  # noqa E501

    openai_response = openai.ChatCompletion.create(
        deployment_id=os.getenv("OPENAI_DEPLOYMENT_ID"),
        messages=[{"role": "user", "content": prompt}],
        functions=patient_function,
        function_call="auto",
    )
    # The arguments come back as a JSON string, not as a parsed object.
    return openai_response["choices"][0]["message"]["function_call"]["arguments"]


if __name__ == "__main__":
    load_dotenv()
    # Imported after load_dotenv() because the pre-1.0 openai package reads
    # the OPENAI_* environment variables at import time.
    import openai

    print(PatientInfo.model_validate(json.loads(generate(text1))))
    print(PatientInfo.model_validate(json.loads(generate(text2))))
The prompt is identical. We added the functions and function_call arguments to the create call, and patient_function defines the JSON schema of the arguments we expect. The output is
patient_name='Michael Doe' doctor_name='Dr. Conrad Murray' drug_names=['Propofol', 'Alprazolam'] date=None
patient_name='Anne Smith' doctor_name='Dr. Who' drug_names=['Propofol'] date=None
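OpenAI has consistently generated JSON that the pydantic model can validate: drug_names is a list in both cases. Note that date is None even for the first text, which contains two dates. This is most likely because the model field is named date while the function schema names the property dates, so pydantic ignores the extracted value. Renaming the field to dates should let the model capture it.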
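With function_call="auto" the model decides whether to call the function, so a reply may carry plain content instead of a function_call entry. The API also accepts function_call={"name": "extract_patient_info"} to request that specific function. The following is a minimal sketch of reading the arguments defensively, assuming the message is a dict-like object shaped like the responses used in the code above. The helper name function_arguments is hypothetical, and the sample message is hand-written, not a recorded response.

```python
import json


def function_arguments(message: dict, expected_name: str) -> dict:
    """Return the parsed arguments of a function call, or raise ValueError."""
    call = message.get("function_call")
    if call is None:
        # The model answered in plain text instead of calling the function.
        raise ValueError("no function_call in the response message")
    if call.get("name") != expected_name:
        raise ValueError(f"unexpected function: {call.get('name')!r}")
    try:
        return json.loads(call["arguments"])
    except json.JSONDecodeError as exc:
        # The arguments arrive as a JSON string and may be malformed.
        raise ValueError("function arguments are not valid JSON") from exc


if __name__ == "__main__":
    # Hand-written stand-in for openai_response["choices"][0]["message"].
    sample_message = {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": "extract_patient_info",
            "arguments": (
                '{"patient_name": "Anne Smith", "doctor_name": "Dr. Who", '
                '"drug_names": ["Propofol"]}'
            ),
        },
    }
    print(function_arguments(sample_message, "extract_patient_info"))
```

Raising a single exception type keeps the calling code simple: it can retry or fall back in one place, and hand the returned dict to PatientInfo.model_validate only when extraction succeeded.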
This is a wonderful feature that can save us a lot of work.