Azure Cognitive Search - Synonyms
Image from https://www.pexels.com/@agk42/
I was reading about the Synonyms feature in Azure Cognitive Search and decided to test it with its Python API.
These are the dependencies.
python-dotenv==0.21.1
azure-identity==1.12.0
azure-search-documents==11.3.0
azure-search==1.0.0b2
We create two clients: an index client and a search client.
from dotenv import load_dotenv
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents import SearchClient

load_dotenv()

SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
SERVICE_KEY = os.getenv("AZURE_SEARCH_API_KEY")
INDEX_NAME = "test-synonyms"

index_client = SearchIndexClient(SERVICE_ENDPOINT, AzureKeyCredential(SERVICE_KEY))
search_client = SearchClient(
    SERVICE_ENDPOINT, INDEX_NAME, AzureKeyCredential(SERVICE_KEY)
)
Next, we create the synonym map.
SYNONYM_MAP_NAME = "test-syn-map"

from azure.search.documents.indexes.models import SynonymMap

synonyms = [
    "Jenna, Ortega, Wednesday Addams\n",
]
synonym_map = SynonymMap(name=SYNONYM_MAP_NAME, synonyms=synonyms)
index_client.create_synonym_map(synonym_map)
Here, we created a single rule that treats 3 terms as equivalent: "Jenna", "Ortega" and "Wednesday Addams". Note: there is a "\n" at the end of the rule.
Let's create some search documents to test these out.
from azure.search.documents.indexes.models import (
    CorsOptions,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SearchIndex,
)

INDEX_NAME = "test-synonyms"

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(
        name="text",
        type=SearchFieldDataType.String,
        synonym_map_names=[SYNONYM_MAP_NAME],
    ),
]
index = SearchIndex(
    name=INDEX_NAME,
    fields=fields,
    cors_options=CorsOptions(allowed_origins=["*"], max_age_in_seconds=60),
)
index_client.create_index(index)

search_client.upload_documents(
    documents=[
        {"id": "1", "text": "Wednesday Addams in Netflix"},
        {"id": "2", "text": "Wednesday Addams' dance"},
        {"id": "3", "text": "Jenna Ortega's dance went wild in Tik Tok"},
    ]
)
We have created the search index and uploaded 3 documents.
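`upload_documents` returns one result per document, so a quick check can confirm that all three were accepted. A minimal sketch: `failed_keys` is a hypothetical helper, and it assumes each result exposes `key` and `succeeded` attributes, as `IndexingResult` does in `azure-search-documents`.

```python
def failed_keys(results):
    """Return the keys of the documents the service did not accept."""
    return [result.key for result in results if not result.succeeded]


# Usage (assumes `search_client` from earlier and a `documents` list
# like the one passed to upload_documents above):
# results = search_client.upload_documents(documents=documents)
# print(failed_keys(results))
```

An empty list means every document was accepted.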
Now, we can test searching.
import json

search_docs = search_client.search('Jenna', search_fields=["text"])
print(json.dumps(list(search_docs), indent=4))
Searching for Jenna, Ortega, or "Wednesday Addams" (note the double quotes) each gives us all 3 documents.
[
    {
        "id": "1",
        "text": "Wednesday Addams in Netflix",
        "@search.score": 0.5753642,
        "@search.highlights": null
    },
    {
        "id": "3",
        "text": "Jenna Ortega's dance went wild in Tik Tok",
        "@search.score": 0.51623213,
        "@search.highlights": null
    },
    {
        "id": "2",
        "text": "Wednesday Addams' dance",
        "@search.score": 0.5063205,
        "@search.highlights": null
    }
]
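To check all three queries in one go, something like the following could be used. `matching_ids` is a hypothetical helper; it assumes the client's `search` method takes the query text plus a `search_fields` keyword and yields dict-like results, as `SearchClient.search` does. The inner double quotes make `Wednesday Addams` a phrase query rather than two separate terms.

```python
def matching_ids(client, query, fields=("text",)):
    """Run one query and return the ids of the matching documents, sorted."""
    results = client.search(query, search_fields=list(fields))
    return sorted(doc["id"] for doc in results)


# The third query keeps its double quotes so it is sent as a phrase.
QUERIES = ["Jenna", "Ortega", '"Wednesday Addams"']

# Usage (assumes `search_client` from earlier in this post):
# for query in QUERIES:
#     print(query, matching_ids(search_client, query))
```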
Lastly, we try the query rewrite feature by changing the synonym map to an explicit mapping:
synonyms = [
    "Jenna, Ortega => Wednesday Addams\n",
]
Now, when I search for Jenna or Ortega, the query is rewritten to Wednesday Addams.
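For the new rule to take effect, the synonym map stored on the service has to be updated. A minimal sketch of one way to do that: fetch the existing map, swap its rules, and send it back with `create_or_update_synonym_map`. `replace_rules` is a hypothetical helper name, not part of the SDK.

```python
def replace_rules(index_client, map_name, rules):
    """Overwrite the rules of an existing synonym map and return the updated map."""
    synonym_map = index_client.get_synonym_map(map_name)
    synonym_map.synonyms = rules
    return index_client.create_or_update_synonym_map(synonym_map)


# Usage (assumes `index_client`, `SYNONYM_MAP_NAME` and `synonyms` from this post):
# replace_rules(index_client, SYNONYM_MAP_NAME, synonyms)
```

The index itself does not need to be rebuilt, since the field refers to the synonym map by name.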
search_docs = search_client.search('Jenna', search_fields=["text"])
print(json.dumps(list(search_docs), indent=4))
Here, the search for Jenna is rewritten into a search for Wednesday Addams, so the search results are:
[
    {
        "text": "Wednesday Addams in Netflix",
        "id": "1",
        "@search.score": 0.5753642,
        "@search.highlights": null
    },
    {
        "text": "Wednesday Addams' dance",
        "id": "2",
        "@search.score": 0.5063205,
        "@search.highlights": null
    }
]
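To make the difference between the two rule formats concrete, here is a toy model of the expansion step. It only illustrates the behaviour observed in this post and is not how the service is implemented; `parse_rule` and `expand` are hypothetical helpers, and matching is simplified to whole, case-insensitive terms.

```python
def parse_rule(rule):
    """Split a rule into (terms that trigger it, terms that are searched for)."""
    rule = rule.strip()
    if "=>" in rule:
        # Explicit mapping: the left-hand terms are replaced by the right-hand terms.
        left, right = rule.split("=>", 1)
        sources = [t.strip() for t in left.split(",") if t.strip()]
        targets = [t.strip() for t in right.split(",") if t.strip()]
        return sources, targets
    # Equivalence: any term in the list expands to every term in the list.
    terms = [t.strip() for t in rule.split(",") if t.strip()]
    return terms, terms


def expand(term, rules):
    """Return the terms actually searched for when the query is `term`."""
    for rule in rules:
        sources, targets = parse_rule(rule)
        if term.lower() in [s.lower() for s in sources]:
            return targets
    return [term]


EQUIVALENCE = ["Jenna, Ortega, Wednesday Addams\n"]
EXPLICIT = ["Jenna, Ortega => Wednesday Addams\n"]

print(expand("Jenna", EQUIVALENCE))  # ['Jenna', 'Ortega', 'Wednesday Addams']
print(expand("Jenna", EXPLICIT))     # ['Wednesday Addams']
```

With the equivalence rule, `Jenna` expands to all three terms. With the explicit mapping, it is replaced by `Wednesday Addams` alone, so a document that contains only the left-hand terms is not matched.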
"Jenna Ortega's dance went wild in Tik Tok" is no longer in the search result.