Azure Cognitive Search - Synonyms

Image from https://www.pexels.com/@agk42/

I was reading about the Synonyms feature in Azure Cognitive Search and decided to test it with its Python API.

These are the dependencies.

python-dotenv==0.21.1

azure-identity==1.12.0
azure-search-documents==11.3.0
azure-search==1.0.0b2


First, we create two clients: an index client and a search client.

from dotenv import load_dotenv
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents import SearchClient

load_dotenv()
SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
SERVICE_KEY = os.getenv("AZURE_SEARCH_API_KEY")
INDEX_NAME = "test-synonyms"

index_client = SearchIndexClient(SERVICE_ENDPOINT, AzureKeyCredential(SERVICE_KEY))
search_client = SearchClient(
    SERVICE_ENDPOINT, INDEX_NAME, AzureKeyCredential(SERVICE_KEY)
)


Next, we create the synonym map.

SYNONYM_MAP_NAME = "test-syn-map"

from azure.search.documents.indexes.models import SynonymMap

synonyms = [
    "Jenna, Ortega, Wednesday Addams\n",
]
synonym_map = SynonymMap(name=SYNONYM_MAP_NAME, synonyms=synonyms)
index_client.create_synonym_map(synonym_map)

Here, we created one rule with 3 equivalent terms: "Jenna", "Ortega" and "Wednesday Addams". A search for any one of them also matches the other two. Note: there is a "\n" at the end of the rule.
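
The rule strings use the Solr synonym format: comma-separated terms are treated as equivalent, while "=>" maps the terms on the left to the terms on the right. As a minimal sketch, the helpers below build such strings so the list passed to SynonymMap stays readable. The names equivalent_rule and explicit_rule are hypothetical and not part of the SDK. The trailing "\n" is left out here on the assumption that the SDK joins the list entries with newlines itself; append it if your version needs it.

```python
from typing import Sequence


def equivalent_rule(*terms: str) -> str:
    """Build an equivalence rule: each term expands to all the others."""
    return ", ".join(term.strip() for term in terms)


def explicit_rule(sources: Sequence[str], targets: Sequence[str]) -> str:
    """Build an explicit mapping: source terms are rewritten to the targets."""
    left = ", ".join(term.strip() for term in sources)
    right = ", ".join(term.strip() for term in targets)
    return f"{left} => {right}"


# Same rule as in the synonym map above, built from its parts.
synonyms = [equivalent_rule("Jenna", "Ortega", "Wednesday Addams")]
print(synonyms)  # ['Jenna, Ortega, Wednesday Addams']
```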

Let's create an index and some search documents to test these out.

from azure.search.documents.indexes.models import (
    CorsOptions,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SearchIndex,
)

INDEX_NAME = "test-synonyms"

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(
        name="text",
        type=SearchFieldDataType.String,
        synonym_map_names=[SYNONYM_MAP_NAME],
    ),
]

index = SearchIndex(
    name=INDEX_NAME,
    fields=fields,
    cors_options=CorsOptions(allowed_origins=["*"], max_age_in_seconds=60),
)
index_client.create_index(index)

search_client.upload_documents(
    documents=[
        {"id": "1", "text": "Wednesday Addams in Netflix"},
        {"id": "2", "text": "Wednesday Addams' dance"},
        {"id": "3", "text": "Jenna Ortega's dance went wild in Tik Tok"},
    ]
)

We have created the search index and uploaded 3 documents.

Now, we can test searching.

import json

search_docs = search_client.search('Jenna', search_fields=["text"])
print(json.dumps(list(search_docs), indent=4))

Searching for Jenna, Ortega, or "Wednesday Addams" (note the double quotes) returns all 3 documents in each case.

[
    {
        "id": "1",
        "text": "Wednesday Addams in Netflix",
        "@search.score": 0.5753642,
        "@search.highlights": null
    },
    {
        "id": "3",
        "text": "Jenna Ortega's dance went wild in Tik Tok",
        "@search.score": 0.51623213,
        "@search.highlights": null
    },
    {
        "id": "2",
        "text": "Wednesday Addams' dance",
        "@search.score": 0.5063205,
        "@search.highlights": null
    }
]
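
The double quotes matter for the multi-word term, presumably because an unquoted query is split into separate words before synonyms are applied. A small sketch of a helper that quotes any term containing a space before it is sent to the service; the name as_query is hypothetical:

```python
def as_query(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are sent as a phrase."""
    term = term.strip()
    already_quoted = term.startswith('"') and term.endswith('"')
    if " " in term and not already_quoted:
        return f'"{term}"'
    return term


queries = [as_query(t) for t in ("Jenna", "Ortega", "Wednesday Addams")]
print(queries)

# Each entry could then be sent with the search client from earlier:
#   search_client.search(query, search_fields=["text"])
```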

Lastly, we try the query rewrite feature by changing the synonym map to an explicit mapping:

synonyms = [
    "Jenna, Ortega => Wednesday Addams\n",
]
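
The snippet above only shows the new rule, not how it was applied to the service. One possible way, sketched here on the assumption that the existing map is fetched, modified, and sent back through get_synonym_map and create_or_update_synonym_map on the index client (the helper name replace_synonym_rules is hypothetical):

```python
def replace_synonym_rules(index_client, map_name, rules):
    """Fetch an existing synonym map, swap its rules, and push it back."""
    synonym_map = index_client.get_synonym_map(map_name)
    synonym_map.synonyms = list(rules)
    return index_client.create_or_update_synonym_map(synonym_map)


# Usage with the clients defined earlier (requires a live search service):
# replace_synonym_rules(
#     index_client, SYNONYM_MAP_NAME, ["Jenna, Ortega => Wednesday Addams\n"]
# )
```

Updating the map in place keeps the index definition untouched, since the "text" field refers to the map by name.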

Now, when I search for Jenna or Ortega, the query is rewritten to Wednesday Addams:

search_docs = search_client.search('Jenna', search_fields=["text"])
print(json.dumps(list(search_docs), indent=4))

The search for Jenna is rewritten into a search for Wednesday Addams, so the results are:

[
    {
        "text": "Wednesday Addams in Netflix",
        "id": "1",
        "@search.score": 0.5753642,
        "@search.highlights": null
    },
    {
        "text": "Wednesday Addams' dance",
        "id": "2",
        "@search.score": 0.5063205,
        "@search.highlights": null
    }
]

"Jenna Ortega's dance went wild in Tik Tok" is no longer in the search results, because the rewritten query no longer contains the term Jenna.

 

