Image from https://www.pexels.com/@skitterphoto/ |
Many times, we need to search for information in the web when we are working with OpenAI. One of the reasons is that the dataset for training for OpenAI is never up to date. For instance, the OpenAI GPT-4 Turbo has knowledge of events up to April 2023.
Here is the Python implementation with Pydantic model.
Dependencies
python = "^3.10" pydantic-settings = "^2.1.0" pydantic = "^2.5.2" azure-cognitiveservices-search-websearch = "^2.0.0"
Environment
These are the environment parameters needed. Have the following in a .env file.
azure_bing_search_endpoint="https://api.bing.microsoft.com" azure_bing_search_key="<key>"
We look under "Keys and Endpoint" for these values.
Pydantic Setting Model
from pydantic_settings import BaseSettings class BingSearchSettings(BaseSettings): azure_bing_search_endpoint: str azure_bing_search_key: str class Config: env_file = ".env" extra = "ignore"
Pydantic Models
I use Pydantic most of the time, so I created a set of Pydantic models for Search results. This is optional because we only care about the snippets of search results.
from pydantic import BaseModel class CognitiveSearchResponseQueryContext(BaseModel): original_query: str ask_user_for_location: bool | None = None class CognitiveSearchWebImageObject(BaseModel): thumbnail_url: str width: int height: int class CognitiveSearchWebPage(BaseModel): name: str url: str id: str | None = None thumbnail_url: str | None = None display_url: str | None = None snippet: str | None = None date_last_crawled: str | None = None deep_links: list["CognitiveSearchWebPage"] = [] primary_image_of_page: CognitiveSearchWebImageObject | None = None class CognitiveSearchImageThumbnail(BaseModel): width: int height: int class CognitiveSearchImageObject(BaseModel): web_search_url: str name: str thumbnail_url: str content_url: str host_page_url: str width: int height: int thumbnail: CognitiveSearchImageThumbnail class CognitiveSearchImages(BaseModel): id: str web_search_url: str is_family_friendly: bool value: list[CognitiveSearchImageObject] class CognitiveSearchWebAnswer(BaseModel): web_search_url: str total_estimated_matches: int value: list[CognitiveSearchWebPage] class CognitiveSearchRelatedSearchAnswer(BaseModel): text: str display_text: str web_search_url: str class CognitiveSearchRelatedSearchAnswers(BaseModel): id: str value: list[CognitiveSearchRelatedSearchAnswer] class CognitiveSearchVideo(BaseModel): web_search_url: str name: str description: str thumbnail_url: str content_url: str host_page_url: str width: int height: int motion_thumbnail_url: str embed_html: str allow_https_embed: bool view_count: int thumbnail: CognitiveSearchImageThumbnail allow_mobile_embed: bool is_superfresh: bool class CognitiveSearchVideos(BaseModel): id: str web_search_url: str is_family_friendly: bool value: list[CognitiveSearchVideo] class CognitiveSearchRankingResponseMainLineItemValue(BaseModel): id: str class CognitiveSearchRankingResponseMainLineItem(BaseModel): answer_type: str result_index: int | None = None value: CognitiveSearchRankingResponseMainLineItemValue class CognitiveSearchRankingResponseMainLine(BaseModel): items: list[CognitiveSearchRankingResponseMainLineItem] class CognitiveSearchRankingResponseSidebarItemValue(BaseModel): id: str class CognitiveSearchRankingResponseSidebarItem(BaseModel): answer_type: str result_index: int | None = None value: CognitiveSearchRankingResponseSidebarItemValue | None = None class CognitiveSearchRankingResponseSidebar(BaseModel): items: list[CognitiveSearchRankingResponseSidebarItem] class CognitiveSearchRankingResponse(BaseModel): mainline: CognitiveSearchRankingResponseMainLine sidebar: CognitiveSearchRankingResponseSidebar class CognitiveSearchResponse(BaseModel): query_context: CognitiveSearchResponseQueryContext web_pages: CognitiveSearchWebAnswer related_searches: CognitiveSearchRelatedSearchAnswers images: CognitiveSearchImages | None = None videos: CognitiveSearchVideos | None = None ranking_response: CognitiveSearchRankingResponse
Search Tool
Next, we created a search tool.
from azure.cognitiveservices.search.websearch import WebSearchClient from msrest.authentication import CognitiveServicesCredentials from common.settings import BingSearchSettings from models.cognitive_search import BingSearchResponse def get_client(settings: BingSearchSettings) -> WebSearchClient: """Get Azure Cognitive Search client. :param settings: Azure Cognitive Search settings. """ client = WebSearchClient( endpoint=settings.azure_bing_search_endpoint, credentials=CognitiveServicesCredentials(settings.azure_bing_search_key), ) client.config.base_url = "{Endpoint}/v7.0" # workaround for a bug return client def search(settings: BingSearchSettings, query: str) -> str: """Search Azure Cognitive Search. :param settings: Azure Cognitive Search settings. :param query: Query string. :return: Search results. """ web_data = get_client(settings=settings).web.search( query=query, text_decorations=True, text_format="HTML" ) results = BingSearchResponse(**web_data.as_dict()) # type: ignore if not results.web_pages or not results.web_pages.value: return "" # concatenate snippets into a single string return " ".join([v.snippet for v in results.web_pages.value if v.snippet])
Test
To test it out, we can do
settings = BingSearchSettings.model_validate({}) results = search(settings=settings, query="Python") print(results)
and we got
<b>Python</b> is a versatile and powerful language that lets you work quickly and integrate systems more effectively. Learn how to get started, download the latest version, access documentation, find jobs, and discover success stories and events related to<b> Python.</b> This tutorial introduces the reader informally to the basic concepts and features of the <b>Python</b> language and system. It helps to have a <b>Python</b> interpreter handy for hands-on experience, but all examples are self-contained, so the tutorial can be read off-line as well. For a description of standard objects and modules, see The <b>Python</b> Standard ... W3Schools offers free online tutorials, references and exercises in<b> Python,</b> a popular programming language that can be used for web applications, data analysis, automation and more. Learn<b> Python</b> by examples, try it yourself, test your skills with quizzes and exercises, and get certified by W3Schools. Learn the basics of<b> Python,</b> a popular and easy-to-use programming language, from installing it to using it for various purposes. Find out how to access online documentation, tutorials, books, code samples, and more resources to help you get started with<b> Python.</b> <b>Python</b> is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. <b>Python</b> is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. Learn the basics of <b> Python,</b> a popular programming language that can be used as a calculator, a text editor, and a calculator. This tutorial covers numbers, text, expressions, variables, and more with examples and comments. A tutorial for beginners to learn the basics of<b> Python,</b> a high-level, interpreted, interactive, and object-oriented programming language. Learn what<b> Python</b> is, why you should use it, how to download and install it, how to run your interpreter, and how to handle errors, get help, and code style.
Note that we have
text_decoration
on in the function call to Bing Search and hence we got Python
in the result.web_data = get_client(settings=settings).web.search( query=query, text_decorations=True, text_format="HTML" )
Comments
Post a Comment