![]() |
| Image from https://www.pexels.com/@skitterphoto/ |
Many times, we need to search for information in the web when we are working with OpenAI. One of the reasons is that the dataset for training for OpenAI is never up to date. For instance, the OpenAI GPT-4 Turbo has knowledge of events up to April 2023.
Here is the Python implementation with Pydantic model.
Dependencies
python = "^3.10" pydantic-settings = "^2.1.0" pydantic = "^2.5.2" azure-cognitiveservices-search-websearch = "^2.0.0"
Environment
These are the environment parameters needed. Have the following in a .env file.
azure_bing_search_endpoint="https://api.bing.microsoft.com" azure_bing_search_key="<key>"
We look under "Keys and Endpoint" for these values.
Pydantic Setting Model
from pydantic_settings import BaseSettings
class BingSearchSettings(BaseSettings):
azure_bing_search_endpoint: str
azure_bing_search_key: str
class Config:
env_file = ".env"
extra = "ignore"Pydantic Models
I use Pydantic most of the time, so I created a set of Pydantic models for Search results. This is optional because we only care about the snippets of search results.
from pydantic import BaseModel
class CognitiveSearchResponseQueryContext(BaseModel):
original_query: str
ask_user_for_location: bool | None = None
class CognitiveSearchWebImageObject(BaseModel):
thumbnail_url: str
width: int
height: int
class CognitiveSearchWebPage(BaseModel):
name: str
url: str
id: str | None = None
thumbnail_url: str | None = None
display_url: str | None = None
snippet: str | None = None
date_last_crawled: str | None = None
deep_links: list["CognitiveSearchWebPage"] = []
primary_image_of_page: CognitiveSearchWebImageObject | None = None
class CognitiveSearchImageThumbnail(BaseModel):
width: int
height: int
class CognitiveSearchImageObject(BaseModel):
web_search_url: str
name: str
thumbnail_url: str
content_url: str
host_page_url: str
width: int
height: int
thumbnail: CognitiveSearchImageThumbnail
class CognitiveSearchImages(BaseModel):
id: str
web_search_url: str
is_family_friendly: bool
value: list[CognitiveSearchImageObject]
class CognitiveSearchWebAnswer(BaseModel):
web_search_url: str
total_estimated_matches: int
value: list[CognitiveSearchWebPage]
class CognitiveSearchRelatedSearchAnswer(BaseModel):
text: str
display_text: str
web_search_url: str
class CognitiveSearchRelatedSearchAnswers(BaseModel):
id: str
value: list[CognitiveSearchRelatedSearchAnswer]
class CognitiveSearchVideo(BaseModel):
web_search_url: str
name: str
description: str
thumbnail_url: str
content_url: str
host_page_url: str
width: int
height: int
motion_thumbnail_url: str
embed_html: str
allow_https_embed: bool
view_count: int
thumbnail: CognitiveSearchImageThumbnail
allow_mobile_embed: bool
is_superfresh: bool
class CognitiveSearchVideos(BaseModel):
id: str
web_search_url: str
is_family_friendly: bool
value: list[CognitiveSearchVideo]
class CognitiveSearchRankingResponseMainLineItemValue(BaseModel):
id: str
class CognitiveSearchRankingResponseMainLineItem(BaseModel):
answer_type: str
result_index: int | None = None
value: CognitiveSearchRankingResponseMainLineItemValue
class CognitiveSearchRankingResponseMainLine(BaseModel):
items: list[CognitiveSearchRankingResponseMainLineItem]
class CognitiveSearchRankingResponseSidebarItemValue(BaseModel):
id: str
class CognitiveSearchRankingResponseSidebarItem(BaseModel):
answer_type: str
result_index: int | None = None
value: CognitiveSearchRankingResponseSidebarItemValue | None = None
class CognitiveSearchRankingResponseSidebar(BaseModel):
items: list[CognitiveSearchRankingResponseSidebarItem]
class CognitiveSearchRankingResponse(BaseModel):
mainline: CognitiveSearchRankingResponseMainLine
sidebar: CognitiveSearchRankingResponseSidebar
class CognitiveSearchResponse(BaseModel):
query_context: CognitiveSearchResponseQueryContext
web_pages: CognitiveSearchWebAnswer
related_searches: CognitiveSearchRelatedSearchAnswers
images: CognitiveSearchImages | None = None
videos: CognitiveSearchVideos | None = None
ranking_response: CognitiveSearchRankingResponse
Search Tool
Next, we created a search tool.
from azure.cognitiveservices.search.websearch import WebSearchClient
from msrest.authentication import CognitiveServicesCredentials
from common.settings import BingSearchSettings
from models.cognitive_search import BingSearchResponse
def get_client(settings: BingSearchSettings) -> WebSearchClient:
"""Get Azure Cognitive Search client.
:param settings: Azure Cognitive Search settings.
"""
client = WebSearchClient(
endpoint=settings.azure_bing_search_endpoint,
credentials=CognitiveServicesCredentials(settings.azure_bing_search_key),
)
client.config.base_url = "{Endpoint}/v7.0" # workaround for a bug
return client
def search(settings: BingSearchSettings, query: str) -> str:
"""Search Azure Cognitive Search.
:param settings: Azure Cognitive Search settings.
:param query: Query string.
:return: Search results.
"""
web_data = get_client(settings=settings).web.search(
query=query, text_decorations=True, text_format="HTML"
)
results = BingSearchResponse(**web_data.as_dict()) # type: ignore
if not results.web_pages or not results.web_pages.value:
return ""
# concatenate snippets into a single string
return " ".join([v.snippet for v in results.web_pages.value if v.snippet])
Test
To test it out, we can do
settings = BingSearchSettings.model_validate({})
results = search(settings=settings, query="Python")
print(results)and we got
<b>Python</b> is a versatile and powerful language that lets you work quickly and integrate systems more effectively. Learn how to get started, download the latest version, access documentation, find jobs, and discover success stories and events related to<b> Python.</b> This tutorial introduces the reader informally to the basic concepts and features of the <b>Python</b> language and system. It helps to have a <b>Python</b> interpreter handy for hands-on experience, but all examples are self-contained, so the tutorial can be read off-line as well. For a description of standard objects and modules, see The <b>Python</b> Standard ... W3Schools offers free online tutorials, references and exercises in<b> Python,</b> a popular programming language that can be used for web applications, data analysis, automation and more. Learn<b> Python</b> by examples, try it yourself, test your skills with quizzes and exercises, and get certified by W3Schools. Learn the basics of<b> Python,</b> a popular and easy-to-use programming language, from installing it to using it for various purposes. Find out how to access online documentation, tutorials, books, code samples, and more resources to help you get started with<b> Python.</b> <b>Python</b> is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. <b>Python</b> is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. Learn the basics of <b> Python,</b> a popular programming language that can be used as a calculator, a text editor, and a calculator. This tutorial covers numbers, text, expressions, variables, and more with examples and comments. A tutorial for beginners to learn the basics of<b> Python,</b> a high-level, interpreted, interactive, and object-oriented programming language. Learn what<b> Python</b> is, why you should use it, how to download and install it, how to run your interpreter, and how to handle errors, get help, and code style.
Note that we have
text_decoration on in the function call to Bing Search and hence we got Python in the result. web_data = get_client(settings=settings).web.search(
query=query, text_decorations=True, text_format="HTML"
)

Comments
Post a Comment