Bing Search as a tool for OpenAI

 

Image from https://www.pexels.com/@skitterphoto/

We often need to search the web for information when working with OpenAI models, because their training data is never fully up to date. For instance, OpenAI's GPT-4 Turbo has knowledge of events up to April 2023.

Here is a Python implementation using Pydantic models.

Dependencies

python = "^3.10"
pydantic-settings = "^2.1.0"
pydantic = "^2.5.2"
azure-cognitiveservices-search-websearch = "^2.0.0"

Environment

These are the environment variables needed. Put the following in a .env file.

azure_bing_search_endpoint="https://api.bing.microsoft.com"
azure_bing_search_key="<key>"

Both values can be found under "Keys and Endpoint" for the Bing Search resource in the Azure portal.

Pydantic Setting Model

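from pydantic_settings import BaseSettings, SettingsConfigDict


class BingSearchSettings(BaseSettings):
    # Pydantic v2 style configuration; replaces the deprecated inner Config class.
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    azure_bing_search_endpoint: str
    azure_bing_search_key: str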


Pydantic Models

I use Pydantic most of the time, so I created a set of Pydantic models for the search results. This is optional, because we only care about the snippets in the search results.

from pydantic import BaseModel


class CognitiveSearchResponseQueryContext(BaseModel):
    original_query: str
    ask_user_for_location: bool | None = None


class CognitiveSearchWebImageObject(BaseModel):
    thumbnail_url: str
    width: int
    height: int


class CognitiveSearchWebPage(BaseModel):
    name: str
    url: str
    id: str | None = None
    thumbnail_url: str | None = None
    display_url: str | None = None
    snippet: str | None = None
    date_last_crawled: str | None = None
    deep_links: list["CognitiveSearchWebPage"] = []
    primary_image_of_page: CognitiveSearchWebImageObject | None = None


class CognitiveSearchImageThumbnail(BaseModel):
    width: int
    height: int


class CognitiveSearchImageObject(BaseModel):
    web_search_url: str
    name: str
    thumbnail_url: str
    content_url: str
    host_page_url: str
    width: int
    height: int
    thumbnail: CognitiveSearchImageThumbnail


class CognitiveSearchImages(BaseModel):
    id: str
    web_search_url: str
    is_family_friendly: bool
    value: list[CognitiveSearchImageObject]


class CognitiveSearchWebAnswer(BaseModel):
    web_search_url: str
    total_estimated_matches: int
    value: list[CognitiveSearchWebPage]


class CognitiveSearchRelatedSearchAnswer(BaseModel):
    text: str
    display_text: str
    web_search_url: str


class CognitiveSearchRelatedSearchAnswers(BaseModel):
    id: str
    value: list[CognitiveSearchRelatedSearchAnswer]


class CognitiveSearchVideo(BaseModel):
    web_search_url: str
    name: str
    description: str
    thumbnail_url: str
    content_url: str
    host_page_url: str
    width: int
    height: int
    motion_thumbnail_url: str
    embed_html: str
    allow_https_embed: bool
    view_count: int
    thumbnail: CognitiveSearchImageThumbnail
    allow_mobile_embed: bool
    is_superfresh: bool


class CognitiveSearchVideos(BaseModel):
    id: str
    web_search_url: str
    is_family_friendly: bool
    value: list[CognitiveSearchVideo]


class CognitiveSearchRankingResponseMainLineItemValue(BaseModel):
    id: str


class CognitiveSearchRankingResponseMainLineItem(BaseModel):
    answer_type: str
    result_index: int | None = None
    value: CognitiveSearchRankingResponseMainLineItemValue


class CognitiveSearchRankingResponseMainLine(BaseModel):
    items: list[CognitiveSearchRankingResponseMainLineItem]


class CognitiveSearchRankingResponseSidebarItemValue(BaseModel):
    id: str


class CognitiveSearchRankingResponseSidebarItem(BaseModel):
    answer_type: str
    result_index: int | None = None
    value: CognitiveSearchRankingResponseSidebarItemValue | None = None


class CognitiveSearchRankingResponseSidebar(BaseModel):
    items: list[CognitiveSearchRankingResponseSidebarItem]


class CognitiveSearchRankingResponse(BaseModel):
    mainline: CognitiveSearchRankingResponseMainLine
    sidebar: CognitiveSearchRankingResponseSidebar


class CognitiveSearchResponse(BaseModel):
    # Answers may be missing from a response, so they are optional here.
    query_context: CognitiveSearchResponseQueryContext
    web_pages: CognitiveSearchWebAnswer | None = None
    related_searches: CognitiveSearchRelatedSearchAnswers | None = None
    images: CognitiveSearchImages | None = None
    videos: CognitiveSearchVideos | None = None
    ranking_response: CognitiveSearchRankingResponse | None = None
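
If you only need the snippets, the models above can be skipped entirely. The following is a minimal sketch, assuming the response has already been converted to a plain dict that uses the same snake_case keys as the models above (web_pages, value, snippet). The helper name extract_snippets and the sample data are hypothetical and only serve as an illustration.

```python
from typing import Any


def extract_snippets(response: dict[str, Any]) -> list[str]:
    """Collect non-empty snippets from a search response dict.

    Assumes the dict uses the same snake_case keys as the models above.
    """
    web_pages = response.get("web_pages") or {}
    pages = web_pages.get("value") or []
    return [page["snippet"] for page in pages if page.get("snippet")]


# Hand-written sample data for illustration, not real search output.
sample = {
    "web_pages": {
        "value": [
            {"name": "A", "url": "https://example.com/a", "snippet": "First snippet."},
            {"name": "B", "url": "https://example.com/b"},
            {"name": "C", "url": "https://example.com/c", "snippet": "Second snippet."},
        ]
    }
}
print(extract_snippets(sample))
```

The trade-off is that a plain dict gives no validation or editor completion, which is what the Pydantic models provide.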

Search Tool

Next, we create a search tool.

from azure.cognitiveservices.search.websearch import WebSearchClient
from msrest.authentication import CognitiveServicesCredentials

from common.settings import BingSearchSettings
from models.cognitive_search import CognitiveSearchResponse


def get_client(settings: BingSearchSettings) -> WebSearchClient:
    """Get a Bing Web Search client.

    :param settings: Bing Search settings.
    :return: Configured client.
    """
    client = WebSearchClient(
        endpoint=settings.azure_bing_search_endpoint,
        credentials=CognitiveServicesCredentials(settings.azure_bing_search_key),
    )
    # Workaround for a bug: override the SDK's default base URL
    # so that requests go to {Endpoint}/v7.0.
    client.config.base_url = "{Endpoint}/v7.0"
    return client


def search(settings: BingSearchSettings, query: str) -> str:
    """Search the web with Bing.

    :param settings: Bing Search settings.
    :param query: Query string.
    :return: Snippets of the web results, joined into one string.
    """
    web_data = get_client(settings=settings).web.search(
        query=query, text_decorations=True, text_format="HTML"
    )
    # as_dict() yields snake_case keys that match the model field names
    results = CognitiveSearchResponse(**web_data.as_dict())  # type: ignore
    if not results.web_pages or not results.web_pages.value:
        return ""

    # concatenate snippets into a single string
    return " ".join([v.snippet for v in results.web_pages.value if v.snippet])
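
To let a model call this function, it has to be described as a tool. Here is a minimal sketch, assuming the tools format of the OpenAI Chat Completions API. The tool name bing_search, its description, and the dispatch_tool_call helper are hypothetical choices for illustration, and a stub stands in for the real search function so the snippet runs without credentials.

```python
import json
from typing import Callable

# Tool description in the shape the Chat Completions `tools` parameter expects.
BING_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "bing_search",  # hypothetical name
        "description": "Search the web with Bing and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."}
            },
            "required": ["query"],
        },
    },
}


def dispatch_tool_call(name: str, arguments: str, search_fn: Callable[[str], str]) -> str:
    """Run the tool the model asked for and return its output as text.

    `arguments` is the JSON string the model produced for the call.
    """
    if name != BING_SEARCH_TOOL["function"]["name"]:
        raise ValueError(f"Unknown tool: {name}")
    query = json.loads(arguments)["query"]
    return search_fn(query)


def fake_search(query: str) -> str:
    """Stub standing in for search(settings=settings, query=query)."""
    return f"results for {query}"


print(dispatch_tool_call("bing_search", '{"query": "Python"}', fake_search))
```

In real use, search_fn would wrap the search function above, for example lambda q: search(settings=settings, query=q), and the returned string would be sent back to the model as the tool result.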

Test

To test it out, we can run:

settings = BingSearchSettings.model_validate({})
results = search(settings=settings, query="Python")
print(results)

and we got:

<b>Python</b> is a versatile and powerful language that lets you work quickly
and integrate systems more effectively. Learn how to get started, download
the latest version, access documentation, find jobs, and discover success
stories and events related to<b> Python.</b> This tutorial introduces the
reader informally to the basic concepts and features of the <b>Python</b>
language and system. It helps to have a <b>Python</b> interpreter handy for
hands-on experience, but all examples are self-contained, so the tutorial can
be read off-line as well. For a description of standard objects and modules,
see The <b>Python</b> Standard ... W3Schools offers free online tutorials,
references and exercises in<b> Python,</b> a popular programming language
that can be used for web applications, data analysis, automation and more.
Learn<b> Python</b> by examples, try it yourself, test your skills with quizzes
and exercises, and get certified by W3Schools. Learn the basics of<b> Python,</b>
a popular and easy-to-use programming language, from installing it to using it
for various purposes. Find out how to access online documentation, tutorials,
books, code samples, and more resources to help you get started with<b> Python.</b>
<b>Python</b> is a high-level, general-purpose programming language. Its
design philosophy emphasizes code readability with the use of significant
indentation. <b>Python</b> is dynamically typed and garbage-collected. It
supports multiple programming paradigms, including structured (particularly
procedural), object-oriented and functional programming. Learn the basics of
<b> Python,</b> a popular programming language that can be used as a calculator,
a text editor, and a calculator. This tutorial covers numbers, text, expressions,
variables, and more with examples and comments. A tutorial for beginners to
learn the basics of<b> Python,</b> a high-level, interpreted, interactive, and
object-oriented programming language. Learn what<b> Python</b> is, why you
should use it, how to download and install it, how to run your interpreter,
and how to handle errors, get help, and code style.

Note that we set text_decorations=True with text_format="HTML" in the call to Bing Search, which is why the matched terms are wrapped in markup such as <b>Python</b> in the result.

    web_data = get_client(settings=settings).web.search(
        query=query, text_decorations=True, text_format="HTML"
    )
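
If the markup is not wanted in the text passed to the model, one option is to set text_decorations=False in the call above. Another is to strip the tags afterwards. Below is a minimal sketch of the second approach using only the standard library. The helper strip_decorations is hypothetical, and its regular expression only targets simple tags such as <b>, not arbitrary HTML.

```python
import html
import re

# Matches simple opening and closing tags such as <b> and </b>.
_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def strip_decorations(text: str) -> str:
    """Remove simple HTML tags, unescape entities, and tidy whitespace."""
    without_tags = _TAG_RE.sub("", text)
    unescaped = html.unescape(without_tags)
    return re.sub(r"\s+", " ", unescaped).strip()


print(strip_decorations("Learn<b> Python,</b> a popular language &amp; more."))
# prints: Learn Python, a popular language & more.
```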



