OpenAI: Functions Feature in 2023-07-01-preview API version

 

Image from https://www.pexels.com/@pavel-danilyuk/

One of the most useful additions in the 2023-07-01-preview API version of OpenAI is the custom Functions feature. It matters to me because I am working on a project that requires OpenAI to generate JSON responses. We want the generated JSON to follow the same schema every time; otherwise, we have to post-process each response to bring the JSON object into the correct shape.

In this post, we compare completions with and without this new feature.

Development Setup

python -m venv .venv
poetry init
poetry shell
poetry add openai
poetry add python-dotenv
poetry add pydantic

I am using Python 3.11.

Create .env file

OPENAI_API_TYPE="azure"
OPENAI_API_BASE="<OpenAI endpoint>"
OPENAI_API_KEY="<OpenAI secret>"
OPENAI_API_VERSION="2023-07-01-preview"
OPENAI_DEPLOYMENT_ID="<deployment identifier>"
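Every call later in this post depends on these five variables, so it can help to fail early when one is missing. Here is a minimal sketch, assuming a hypothetical helper named missing_settings (it is not part of openai or python-dotenv) that inspects a mapping such as os.environ after load_dotenv() has run:

```python
import os
from typing import Mapping

# The variable names used in the .env file above.
REQUIRED_SETTINGS = (
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "OPENAI_DEPLOYMENT_ID",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list:
    """Return the required variable names that are unset or blank in env.

    Hypothetical helper for illustration only.
    """
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling missing_settings() right after load_dotenv() and raising an error when the list is non-empty gives a clearer message than a failed API call.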

Without the OpenAI Functions feature

We use two sample texts for the experiment. We want to extract the patient's name, the prescribed drugs, the doctor's name, and the prescription dates (if any).

Michael Doe was prescribed with Propofol and Alprazolam by Dr.
 Conrad Murray on June 25, 2009 and July 26, 2010

and

Anne Smith was prescribed with Propofol by Dr. Who

Here is the code and its output.

import os
from dotenv import load_dotenv

text1 = """Michael Doe was prescribed with Propofol and Alprazolam by Dr.
 Conrad Murray on June 25, 2009 and July 26, 2010."""
text2 = "Anne Smith was prescribed with Propofol by Dr. Who"


def generate(text: str) -> str:
    prompt = f"""
    Please extract the following information from the given text and return it as a JSON object:

    patient_name
    drug_names
    doctor_name
    dates

    This is the body of text to extract the information from:
    {text}
    """  # noqa E501

    openai_response = openai.ChatCompletion.create(
        deployment_id=os.getenv("OPENAI_DEPLOYMENT_ID"),
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return openai_response["choices"][0]["message"]["content"]


if __name__ == "__main__":
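    load_dotenv()
    # Imported after load_dotenv() because the pre-1.0 openai package
    # reads the OPENAI_* variables at import time.
    import openai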

    print(generate(text1))
    print(generate(text2))

Output

{
  "patient_name": "Michael Doe",
  "drug_names": [
    "Propofol",
    "Alprazolam"
  ],
  "doctor_name": "Dr. Conrad Murray",
  "dates": [
    "June 25, 2009",
    "July 26, 2010"
  ]
}
{
  "patient_name": "Anne Smith",
  "drug_names": "Propofol",
  "doctor_name": "Dr. Who",
  "dates": ""
}

A few observations about the two outputs:

  • drug_names can be list[str] or str
  • dates can be list[str] or str
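Without the Functions feature, this inconsistency has to be repaired by hand. The following is only a sketch of that kind of post-processing, assuming two hypothetical helpers (as_list and normalize) and the four keys requested in the prompt; it is not the code used in my project:

```python
import json
from typing import Optional


def as_list(value) -> Optional[list]:
    """Coerce a str, a list, or an empty value into a list of str (or None)."""
    if value is None or value == "" or value == []:
        return None
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


def normalize(raw_json: str) -> dict:
    """Parse the completion text and force list-typed fields into one shape."""
    data = json.loads(raw_json)
    for key in ("drug_names", "dates"):
        data[key] = as_list(data.get(key))
    return data
```

Applied to the second output above, this turns "Propofol" into a one-element list and the empty dates string into None. Every new field or edge case needs another rule like this, which is the work we would like to avoid.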

I want every response to validate against a pydantic model like this:

class PatientInfo(BaseModel):
    patient_name: str
    doctor_name: str
    drug_names: list[str] | None = None
    date: list[str] | None = None

Let's use the OpenAI function feature to address this.

With the OpenAI Functions feature

import os
import json
from dotenv import load_dotenv
from pydantic import BaseModel


text1 = """Michael Doe was prescribed with Propofol and Alprazolam by Dr.
 Conrad Murray on June 25, 2009 and July 26, 2010."""
text2 = "Anne Smith was prescribed with Propofol by Dr. Who"


class PatientInfo(BaseModel):
    patient_name: str
    doctor_name: str
    drug_names: list[str] | None = None
    date: list[str] | None = None


patient_function = [
    {
        "name": "extract_patient_info",
        "description": "Get the drug prescription information from the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_name": {
                    "type": "string",
                    "description": "Name of the patient",
                },
                "drug_names": {
                    "type": "array",
                    "items": {
                        "type": "string",
                    },
                    "description": "The drug names.",
                },
                "doctor_name": {"type": "string", "description": "doctor name."},
                "dates": {
                    "type": "array",
                    "items": {
                        "type": "string",
                    },
                    "description": "The prescription dates.",
                },
            },
        },
    }
]


def generate(text: str) -> str:
    prompt = f"""
    Please extract the following information from the given text and return it as a JSON object:

    patient_name
    drug_names
    doctor_name
    dates

    This is the body of text to extract the information from:
    {text}
    """  # noqa E501

    openai_response = openai.ChatCompletion.create(
        deployment_id=os.getenv("OPENAI_DEPLOYMENT_ID"),
        messages=[{"role": "user", "content": prompt}],
        functions=patient_function,
        function_call="auto",
    )
    return openai_response["choices"][0]["message"]["function_call"]["arguments"]


if __name__ == "__main__":
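    load_dotenv()
    # Imported after load_dotenv() because the pre-1.0 openai package
    # reads the OPENAI_* variables at import time.
    import openai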

    print(PatientInfo.model_validate(json.loads(generate(text1))))
    print(PatientInfo.model_validate(json.loads(generate(text2))))

The prompt is identical. We added the functions and function_call arguments to the create call, and patient_function defines the JSON schema of the arguments we expect back. The output is:

patient_name='Michael Doe' doctor_name='Dr. Conrad Murray' drug_names=['Propofol', 'Alprazolam'] date=None
patient_name='Anne Smith' doctor_name='Dr. Who' drug_names=['Propofol'] date=None

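For both texts, OpenAI generated JSON that the pydantic model validates, and drug_names is a list in both cases. Note that date is None even for the first text: the model field is named date while the schema property is named dates, so any dates value the model returns is ignored by pydantic. Renaming the field to dates makes the two line up.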
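One caveat: with function_call="auto" the model is allowed to reply with plain text instead of calling the function, and the arguments string is not guaranteed to be valid JSON. Passing function_call={"name": "extract_patient_info"} forces the call; independently of that, the response can be read defensively. A minimal sketch, assuming a hypothetical helper extract_arguments and a dict-like response shaped like the one indexed in generate above:

```python
import json
from typing import Optional


def extract_arguments(response: dict, expected_name: str) -> Optional[dict]:
    """Return the parsed function-call arguments, or None if unusable.

    Hypothetical helper: returns None when the model answered in plain
    text, called a different function, or produced malformed JSON.
    """
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    call = message.get("function_call")
    if not call or call.get("name") != expected_name:
        return None
    try:
        return json.loads(call.get("arguments") or "")
    except json.JSONDecodeError:
        return None
```

A None result can then be handled explicitly (retry, log, or skip) before the data reaches PatientInfo.model_validate.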

This is a wonderful feature that can save us a lot of work.

