Azure Search Python SDK

Image from https://www.pexels.com/@gratisography/

I am trying to familiarize myself with the Azure Search API. This post is the first step of that learning journey.

Create the Azure Search Service

Create an Azure resource group, then create an Azure Search Service inside it.


Once the service is created, note the API endpoint and the access key as shown below.



Python Implementation

Azure Search Python Library

pip install azure-search-documents

I used Faker to generate dummy data to load into Azure Search.

pip install faker


Source Code


import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    CorsOptions,
    ComplexField,
    SearchIndex,
    ScoringProfile,
    SearchFieldDataType,
    SimpleField,
    SearchableField
)


endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
key = os.environ["AZURE_SEARCH_API_KEY"]

The endpoint and access key are read from environment variables.
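
If either variable is unset, os.environ[...] fails with a bare KeyError. A minimal sketch of a friendlier lookup follows; the helper name read_search_settings is my own and is not part of the Azure SDK.

```python
import os


def read_search_settings(environ=os.environ):
    """Return (endpoint, key), raising a readable error if either is unset."""
    names = ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_API_KEY")
    # Collect every missing or empty variable so the error lists them all.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ[names[0]], environ[names[1]]
```

It could be used as endpoint, key = read_search_settings() in place of the two direct lookups.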

# index name - for organizing the search fields, indices and the documents.
index_name = "profiles"

# Python client for managing index in Azure Search
search_index_client = SearchIndexClient(endpoint, AzureKeyCredential(key))

# Python client for performing search
search_client = SearchClient(endpoint, index_name, AzureKeyCredential(key))

# Search Fields
fields = [
    SimpleField(name="username",
                type=SearchFieldDataType.String, key=True),
    SearchableField(name="name", type=SearchFieldDataType.String),
    SearchableField(name="mail", type=SearchFieldDataType.String),
    ComplexField(name="address", fields=[
        SearchableField(name="address1", type=SearchFieldDataType.String),
        SearchableField(name="address2", type=SearchFieldDataType.String),
    ])
]

# Search index object
index = SearchIndex(
    name=index_name,
    fields=fields,
    scoring_profiles=[],
    cors_options=CorsOptions(allowed_origins=["*"], max_age_in_seconds=60))

There are two clients that we need to create:

  1. SearchIndexClient, which is responsible for creating the search index.
  2. SearchClient, which is used for uploading the dummy search documents and performing the search.

For the fields, there are three helpers:

  1. SimpleField, which is not searchable.
  2. SearchableField, which is self-explanatory. I looked up the source code, and SimpleField is basically SearchableField with the property searchable = False.
  3. ComplexField, which groups a collection of sub-fields.

To create a search index, we just do search_index_client.create_index(index)

and to delete it, we do search_index_client.delete_index(index)
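
Creating an index that already exists raises an error, so when rerunning the script it can help to check first. The following is a rough sketch rather than the way the gist does it; ensure_index is a hypothetical helper of mine, and it only assumes the client offers list_index_names() and create_index(), as SearchIndexClient does.

```python
def ensure_index(index_client, index):
    """Create `index` only if no index with that name exists yet.

    `index_client` is expected to behave like SearchIndexClient and
    `index` like SearchIndex. Returns True when the index was created.
    """
    # list_index_names() yields the names of all indexes on the service.
    existing = set(index_client.list_index_names())
    if index.name in existing:
        return False
    index_client.create_index(index)
    return True
```

With the objects defined earlier, this would be called as ensure_index(search_index_client, index).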

I created a function to load dummy documents into Azure Search.

def add_documents(count: int) -> list[str]:
    """Add documents to the search index

    Args:
        count (int): number of documents to add. Faker generates this
        many profiles.

    Returns:
        list[str]: list of user names.
    """
    import faker
    faker_inst = faker.Faker()

    def create_profile(profile):
        del profile["sex"]
        del profile["birthdate"]
        addresses = profile["address"].split("\n")
        profile["address"] = {
            "address1": addresses[0],
            "address2": addresses[1],
        }
        return profile

    profiles = [create_profile(faker_inst.simple_profile())
                for _ in range(count)]

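    search_client.upload_documents(documents=profiles)

    # Return the keys of the uploaded documents, as the signature promises.
    return [profile["username"] for profile in profiles]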
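
For a large count it is probably safer to upload in batches. The sketch below assumes a limit of 1000 documents per indexing request, which is my understanding of the service limits and worth checking for your tier; chunked is a hypothetical helper, not an SDK function.

```python
def chunked(documents, size=1000):
    """Yield consecutive slices of `documents`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(documents), size):
        yield documents[start:start + size]
```

Inside add_documents, the single upload call could then become a loop: for batch in chunked(profiles): search_client.upload_documents(documents=batch).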

Lastly, the search can be done with results = list(search_client.search("<some text>"))
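
The search call also accepts keyword arguments such as top and select to limit how many results and which fields come back. Here is a small sketch under that assumption; search_names is a hypothetical wrapper of mine, and the client argument is expected to behave like SearchClient.

```python
def search_names(client, text, limit=5):
    """Return (name, score) pairs for the best `limit` matches of `text`."""
    # top caps the number of hits; select restricts the returned fields.
    results = client.search(search_text=text, top=limit,
                            select=["username", "name"])
    return [(doc["name"], doc["@search.score"]) for doc in results]
```

With the client created earlier, search_names(search_client, "Matthew Lee") would return the five best-scoring names.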

The source code is a gist.

Here is a sample result set from a search for "Matthew Lee":

[
    {
        "name": "Matthew Lee",
        "username": "whitney26",
        "mail": "rebeccaanderson@gmail.com",
        "address": {
            "address1": "441 Sean Stream Apt. 913",
            "address2": "New Michael, VA 56105"
        },
        "@search.score": 3.215229,
        "@search.highlights": null
    },
    {
        "name": "Vanessa Lee",
        "username": "lauren96",
        "mail": "nathan72@hotmail.com",
        "address": {
            "address1": "3072 David Glens Apt. 688",
            "address2": "West Patricia, HI 45538"
        },
        "@search.score": 2.0040388,
        "@search.highlights": null
    },
    {
        "name": "Lee Bass",
        "username": "nataliemiller",
        "mail": "richardjohnson@hotmail.com",
        "address": {
            "address1": "968 Leslie Flats Suite 292",
            "address2": "New Joseville, AS 66785"
        },
        "@search.score": 1.9746565,
        "@search.highlights": null
    },
    {
        "name": "Mr. Daniel Lin",
        "username": "charles08",
        "mail": "shepherdjennifer@hotmail.com",
        "address": {
            "address1": "93232 Matthew Canyon",
            "address2": "Sarahborough, TN 80405"
        },
        "@search.score": 1.8142501,
        "@search.highlights": null
    },
    {
        "name": "Carrie Allen",
        "username": "raymond62",
        "mail": "donald36@gmail.com",
        "address": {
            "address1": "28966 Matthew Park",
            "address2": "Rothfurt, PA 69847"
        },
        "@search.score": 1.7441907,
        "@search.highlights": null
    }
]

Each entry has a search score associated with it (see documentation).
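
Since the scores are only meaningful relative to other hits for the same query, one simple way to trim weak matches is to keep the hits that score close to the best one. This is just an illustrative heuristic of mine, not something the service does; strong_matches and the 0.75 ratio are assumptions, and the sample uses three of the scores from the result set above.

```python
def strong_matches(results, ratio=0.75):
    """Keep results whose score is at least `ratio` times the best score."""
    if not results:
        return []
    best = max(doc["@search.score"] for doc in results)
    return [doc for doc in results if doc["@search.score"] >= ratio * best]


sample = [
    {"name": "Matthew Lee", "@search.score": 3.215229},
    {"name": "Vanessa Lee", "@search.score": 2.0040388},
    {"name": "Lee Bass", "@search.score": 1.9746565},
]
print([doc["name"] for doc in strong_matches(sample)])  # prints ['Matthew Lee']
```

Only the exact match survives the default cutoff, because the partial matches on "Lee" score well below 75% of the top score.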

