Fuzzy Search made easy

Image from https://www.pexels.com/@apasaric/

I was looking for a Python library to do fuzzy search: given a blob of text, find a string that approximately matches a search phrase. The fuzzysearch Python library is perfect for this. Here are some code snippets showing how to use its find_near_matches function.

Take a famous typo from a book, Karen Harper's The Queen's Governess, where "wanton" was misspelled as "wonton" (a Chinese dumpling):

"I tugged on the gown and sleeves I’d discarded like a wonton last night to fall into John’s arms"

If I search for "a wanton last night" with an exact string match, I will not get any results. Below is an example of using fuzzy search instead.


from fuzzysearch import find_near_matches

blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
        "wonton last night to fall into John’s arms")

fuzzy_result = find_near_matches("a wanton last night", blob_txt, max_l_dist=3)
print(fuzzy_result)

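max_l_dist=3 tells the function to skip any match whose Levenshtein distance from the search string is greater than 3. The Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into the other.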

The result is [Match(start=52, end=71, dist=1, matched='a wonton last night')]
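To see where the fields of that Match come from, here is a small sketch that uses only the standard library. The levenshtein helper is my own illustrative function, not part of fuzzysearch and not necessarily how the library computes distances. The offsets 52 and 71 are copied from the result above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between a and b."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
            "wonton last night to fall into John’s arms")

# start and end are offsets into blob_txt, with end exclusive.
matched = blob_txt[52:71]
print(matched)                                      # a wonton last night
print(levenshtein("a wanton last night", matched))  # 1
```

The single substitution of "o" for "a" in "wanton" accounts for dist=1.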

Let's change the blob_txt a little (removing "last") so that we do not get any matches.

from fuzzysearch import find_near_matches

blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
        "wonton night to fall into John’s arms")

fuzzy_result = find_near_matches("a wanton last night", blob_txt, max_l_dist=3)
print(fuzzy_result)

Now the Levenshtein distance of the closest candidate is 6: 1 for "wonton" -> "wanton", and 5 for the missing "last" together with its space. Since that is greater than max_l_dist=3, the function returns an empty list.
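The distance to the closest substring can be checked without the library. The sketch below uses a classic dynamic-programming approach to approximate substring matching, in which a match may start and end anywhere in the text. The function best_substring_distance and the variable names are hypothetical names of my own; this is a plain-Python illustration and not a description of how fuzzysearch works internally.

```python
def best_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # Row 0 is all zeros: a match may start at any position in text for free.
    previous = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        current = [i]  # i pattern characters against empty text cost i
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            current.append(min(
                previous[j] + 1,         # pattern character missing from text
                current[j - 1] + 1,      # extra character in text
                previous[j - 1] + cost,  # substitution, or exact match
            ))
        previous = current
    # A match may end anywhere, so take the minimum over the last row.
    return min(previous)


with_last = ("I tugged on the gown and sleeves I’d discarded like a "
             "wonton last night to fall into John’s arms")
without_last = ("I tugged on the gown and sleeves I’d discarded like a "
                "wonton night to fall into John’s arms")

print(best_substring_distance("a wanton last night", with_last))     # 1
print(best_substring_distance("a wanton last night", without_last))  # 6
```

With a limit of 3, the first text has a candidate within range and the second does not, which matches what find_near_matches returned in the two examples.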

It is so nice that find_near_matches is available and so easy to use.


