Fuzzy Search made easy

Image from https://www.pexels.com/@apasaric/

I was looking for a Python library to do fuzzy search: given a blob of text, find the part of it that approximately matches a search string. The fuzzysearch Python library is a perfect fit. Here are some code snippets showing how to use its find_near_matches function.

Take a famous typo from Karen Harper's book The Queen's Governess, where "wanton" was misspelled as "wonton" (a Chinese dumpling):

"I tugged on the gown and sleeves I’d discarded like a wonton last night to fall into John’s arms"

If I search for "a wanton last night" with an exact string match, I will not get any results. Below is an example of using fuzzy search instead.


from fuzzysearch import find_near_matches

blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
        "wonton last night to fall into John’s arms")

fuzzy_result = find_near_matches("a wanton last night", blob_txt, max_l_dist=3)
print(fuzzy_result)

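max_l_dist=3 tells the function to return only matches whose Levenshtein distance from the search string is at most 3. The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.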
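To make the distance concrete, here is a minimal pure-Python sketch of the Levenshtein distance. The function name levenshtein is my own for illustration; it is not part of fuzzysearch and is not necessarily how the library computes distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein distance between a and b (illustrative helper)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# One substitution ("a" -> "o") separates the search string from the typo.
print(levenshtein("a wanton last night", "a wonton last night"))  # 1
```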

The result is [Match(start=52, end=71, dist=1, matched='a wonton last night')]
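As a quick sanity check, the start and end values in that result can be read as offsets into blob_txt, with end being exclusive. The sketch below only uses plain string slicing with the numbers printed above, so it runs without fuzzysearch installed.

```python
blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
            "wonton last night to fall into John’s arms")

# Offsets copied from the printed result above (end is exclusive).
start, end = 52, 71

matched = blob_txt[start:end]
print(matched)
```

The slice is the same text the result reports in its matched field.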

Let's change the blob_txt a little (removing "last") so that we do not get any matches.

from fuzzysearch import find_near_matches

blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
        "wonton night to fall into John’s arms")

fuzzy_result = find_near_matches("a wanton last night", blob_txt, max_l_dist=3)
print(fuzzy_result)

Now, the Levenshtein distance between the search string and "a wonton night" is 6 (1 for "wonton" -> "wanton", and 5 for the missing "last" plus its trailing space). Since we have max_l_dist=3, we will not get any results from this function; it returns an empty list.
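To see why nothing comes back, we can compute the smallest Levenshtein distance between the search string and any substring of the text, then compare it with max_l_dist. The sketch below uses a classic dynamic-programming approach to approximate substring matching (often attributed to Sellers). The helper name min_substring_distance is hypothetical, and this is an illustration only, not a claim about how fuzzysearch works internally.

```python
def min_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # prev[i]: best distance between pattern[:i] and a substring of the text
    # ending at the previous position. Before any text, pattern[:i] costs i.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        curr = [0]  # a match may start anywhere, so the empty prefix costs 0
        for i, pch in enumerate(pattern, start=1):
            cost = 0 if pch == ch else 1
            curr.append(min(prev[i] + 1,          # extra character in the text
                            curr[i - 1] + 1,      # pattern character missing from the text
                            prev[i - 1] + cost))  # substitute or keep
        best = min(best, curr[-1])
        prev = curr
    return best


original = ("I tugged on the gown and sleeves I’d discarded like a "
            "wonton last night to fall into John’s arms")
without_last = ("I tugged on the gown and sleeves I’d discarded like a "
                "wonton night to fall into John’s arms")

max_l_dist = 3
for text in (original, without_last):
    best = min_substring_distance("a wanton last night", text)
    print(best, "match possible" if best <= max_l_dist else "no match")
```

For the original sentence the best distance is 1, which is within the limit. For the edited sentence it is above 3, so no match is possible.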

It is so nice that this function is available and it is easy to use.


