Azure Search Python SDK
Image from https://www.pexels.com/@gratisography/
I am trying to familiarize myself with the Azure Search API. This post is the beginning of that learning journey.
Install Azure Search Service
Create an Azure resource group and create an Azure Search Service.
Once the service is created, note its API endpoint and access key.
Python Implementation
Azure Search Python Library
pip install azure-search-documents
I used Faker to generate dummy data and load it into Azure Search.
pip install faker
Source Code
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    CorsOptions,
    ComplexField,
    SearchIndex,
    ScoringProfile,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
)
endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
key = os.environ["AZURE_SEARCH_API_KEY"]
The endpoint and access key are read from environment variables.
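Reading `os.environ[...]` directly raises a bare `KeyError` when a variable is not set. Here is a minimal sketch of a friendlier loader. The names `load_search_settings` and `SearchSettings` are hypothetical helpers for illustration and are not part of the SDK.

```python
import os
from typing import Mapping, NamedTuple


class SearchSettings(NamedTuple):
    endpoint: str
    key: str


def load_search_settings(env: Mapping[str, str] = os.environ) -> SearchSettings:
    """Read the endpoint and key, naming any variable that is missing or empty."""
    names = ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return SearchSettings(endpoint=env[names[0]], key=env[names[1]])
```

Passing the mapping as a parameter keeps the function easy to exercise with a plain dict instead of the real environment.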
# index name - for organizing the search fields, indices and the documents.
index_name = "profiles"
# Python client for managing index in Azure Search
search_index_client = SearchIndexClient(endpoint, AzureKeyCredential(key))
# Python client for performing search
search_client = SearchClient(endpoint, index_name, AzureKeyCredential(key))
# Search Fields
fields = [
    SimpleField(name="username",
                type=SearchFieldDataType.String, key=True),
    SearchableField(name="name", type=SearchFieldDataType.String),
    SearchableField(name="mail", type=SearchFieldDataType.String),
    ComplexField(name="address", fields=[
        SearchableField(name="address1", type=SearchFieldDataType.String),
        SearchableField(name="address2", type=SearchFieldDataType.String),
    ])
]
# Search index object
index = SearchIndex(
    name=index_name,
    fields=fields,
    scoring_profiles=[],
    cors_options=CorsOptions(allowed_origins=["*"], max_age_in_seconds=60))
There are two clients that we need to create:
- SearchIndexClient, which is responsible for creating the search index.
- SearchClient, which is used for uploading the dummy search documents and performing the search.
For the fields, there are:
- SimpleField, which is not searchable.
- SearchableField, which is self-explanatory. I looked up the source code, and SimpleField is basically SearchableField with the property searchable = False.
- ComplexField, which is a collection of fields.
To create a search index, we just do search_index_client.create_index(index)
and to delete it, we do search_index_client.delete_index(index)
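When a script is run more than once, it helps to create the index only if it is not there yet. The sketch below assumes the service rejects a second `create_index` call for the same name; `ensure_index` is a hypothetical helper, while `list_index_names` and `create_index` are methods of SearchIndexClient.

```python
def ensure_index(index_client, index) -> bool:
    """Create `index` unless an index with the same name already exists.

    Returns True if the index was created, False if it was already there.
    """
    existing = set(index_client.list_index_names())
    if index.name in existing:
        return False
    index_client.create_index(index)
    return True


# Usage against the live service (not run here):
# created = ensure_index(search_index_client, index)
```

The helper only relies on the two methods it calls, so it can be tried out with a stand-in client before pointing it at a real service.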
I have created a function to load dummy documents into Azure Search:
def add_documents(count: int) -> list[str]:
    """Add documents to the search index.

    Args:
        count (int): number of documents to add. Faker generates
            this many profiles.

    Returns:
        list[str]: list of user names.
    """
    import faker

    faker_inst = faker.Faker()

    def create_profile(profile):
        # Drop the fields that are not part of the index schema.
        del profile["sex"]
        del profile["birthdate"]
        # The generated address has two lines separated by a newline.
        addresses = profile["address"].split("\n")
        profile["address"] = {
            "address1": addresses[0],
            "address2": addresses[1],
        }
        return profile

    profiles = [create_profile(faker_inst.simple_profile())
                for _ in range(count)]
    search_client.upload_documents(documents=profiles)
    # Return the user names, as the signature and docstring promise.
    return [profile["username"] for profile in profiles]
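The upload call returns one result per document, so it is possible to check which documents were not indexed. This is a minimal sketch that relies on each result exposing `key` and `succeeded`; `failed_keys` is a hypothetical helper name.

```python
def failed_keys(results) -> list[str]:
    """Return the keys of the documents that the service did not index."""
    return [result.key for result in results if not result.succeeded]


# Usage against the live service (not run here):
# results = search_client.upload_documents(documents=profiles)
# for key in failed_keys(results):
#     print("not indexed:", key)
```

Here the key is the value of the field declared with key=True, which is username in this index.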
Lastly, the search can be done with results = list(search_client.search("<some text>"))
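The search call also accepts options that limit what comes back. The sketch below uses the `search_text`, `top` and `select` keyword arguments of `SearchClient.search`; `search_names` is a hypothetical wrapper, and it assumes each hit behaves like a dict that includes "@search.score".

```python
def search_names(client, text: str, top: int = 5) -> list[tuple[str, float]]:
    """Return (name, score) pairs for at most `top` matches."""
    results = client.search(
        search_text=text,
        top=top,                      # cap the number of hits returned
        select=["name", "username"],  # only fetch the fields we need
    )
    return [(doc["name"], doc["@search.score"]) for doc in results]


# Usage against the live service (not run here):
# for name, score in search_names(search_client, "Matthew Lee"):
#     print(f"{score:.3f}  {name}")
```

Selecting only a few fields keeps the response small when the documents grow beyond this toy schema.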
The source code is available as a gist.
Here is a sample result set when I search for Matthew Lee:
[
    {
        "name": "Matthew Lee",
        "username": "whitney26",
        "mail": "rebeccaanderson@gmail.com",
        "address": {
            "address1": "441 Sean Stream Apt. 913",
            "address2": "New Michael, VA 56105"
        },
        "@search.score": 3.215229,
        "@search.highlights": null
    },
    {
        "name": "Vanessa Lee",
        "username": "lauren96",
        "mail": "nathan72@hotmail.com",
        "address": {
            "address1": "3072 David Glens Apt. 688",
            "address2": "West Patricia, HI 45538"
        },
        "@search.score": 2.0040388,
        "@search.highlights": null
    },
    {
        "name": "Lee Bass",
        "username": "nataliemiller",
        "mail": "richardjohnson@hotmail.com",
        "address": {
            "address1": "968 Leslie Flats Suite 292",
            "address2": "New Joseville, AS 66785"
        },
        "@search.score": 1.9746565,
        "@search.highlights": null
    },
    {
        "name": "Mr. Daniel Lin",
        "username": "charles08",
        "mail": "shepherdjennifer@hotmail.com",
        "address": {
            "address1": "93232 Matthew Canyon",
            "address2": "Sarahborough, TN 80405"
        },
        "@search.score": 1.8142501,
        "@search.highlights": null
    },
    {
        "name": "Carrie Allen",
        "username": "raymond62",
        "mail": "donald36@gmail.com",
        "address": {
            "address1": "28966 Matthew Park",
            "address2": "Rothfurt, PA 69847"
        },
        "@search.score": 1.7441907,
        "@search.highlights": null
    }
]
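Each entry carries a search score in @search.score (see documentation), and the results are ordered from the highest score to the lowest. Note that the last two hits match "Matthew" in the address rather than in the name.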
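One way to use the score is to drop hits that fall far behind the best one. The sketch below is my own illustration and not something the SDK provides: `keep_close_matches` and the 0.6 ratio are arbitrary assumptions, and the sample data is copied from the result set above.

```python
def keep_close_matches(results, ratio: float = 0.6) -> list[str]:
    """Keep the names whose score is at least `ratio` times the top score."""
    docs = list(results)
    if not docs:
        return []
    top_score = max(doc["@search.score"] for doc in docs)
    return [doc["name"] for doc in docs
            if doc["@search.score"] >= ratio * top_score]


sample = [
    {"name": "Matthew Lee", "@search.score": 3.215229},
    {"name": "Vanessa Lee", "@search.score": 2.0040388},
    {"name": "Lee Bass", "@search.score": 1.9746565},
    {"name": "Mr. Daniel Lin", "@search.score": 1.8142501},
    {"name": "Carrie Allen", "@search.score": 1.7441907},
]
print(keep_close_matches(sample))  # ['Matthew Lee', 'Vanessa Lee', 'Lee Bass']
```

With this ratio, the two hits that only matched on the address are filtered out. The comparison is made within a single result set, since I would not assume that scores from different queries are comparable.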