Image from https://www.pexels.com/@apasaric/
I was looking for a Python library to do fuzzy search: given a blob of text, we want to find a string that may not match exactly. The fuzzysearch Python library is perfect for this. Here are some code snippets showing how to use its find_near_matches function.
Take a famous typo from Karen Harper's book The Queen's Governess, where "wanton" was misspelled as "wonton" (Chinese dumplings):
"I tugged on the gown and sleeves I’d discarded like a wonton last night to fall into John’s arms"
If I search for "a wanton last night" with an exact string match, I will not get any results. Below is an example of using fuzzy search instead.
from fuzzysearch import find_near_matches

blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
            "wonton last night to fall into John’s arms")
fuzzy_result = find_near_matches("a wanton last night", blob_txt, max_l_dist=3)
print(fuzzy_result)
max_l_dist=3 tells the function to return only matches whose Levenshtein distance from the search string is at most 3.
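For readers unfamiliar with the Levenshtein distance, here is a minimal sketch of the classic dynamic-programming version, using only the standard library. The helper name levenshtein is my own and is not part of fuzzysearch, and this is not necessarily how the library computes distances internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    # prev[j] is the distance between the part of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


# The search string and the misspelled phrase differ by one substitution.
typo_distance = levenshtein("a wanton last night", "a wonton last night")
print(typo_distance)  # 1
```

A distance of 1 is well within max_l_dist=3, which is why the misspelled phrase still counts as a match.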
The result is [Match(start=52, end=71, dist=1, matched='a wonton last night')]
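The start and end values in the Match are offsets into blob_txt, so the matched text can be recovered with an ordinary slice. The small sketch below hardcodes the offsets printed above instead of calling fuzzysearch, so it runs with the standard library alone. It also shows that an exact search for the correctly spelled phrase finds nothing.

```python
blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
            "wonton last night to fall into John’s arms")

# Offsets copied from the Match object printed above.
start, end = 52, 71
matched_text = blob_txt[start:end]
print(matched_text)  # a wonton last night

# An exact search for the correctly spelled phrase fails: str.find returns -1.
exact_index = blob_txt.find("a wanton last night")
print(exact_index)  # -1
```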
Let's change the blob_txt a little (removing "last") so that we do not get any matches.
from fuzzysearch import find_near_matches

blob_txt = ("I tugged on the gown and sleeves I’d discarded like a "
            "wonton night to fall into John’s arms")
fuzzy_result = find_near_matches("a wanton last night", blob_txt, max_l_dist=3)
print(fuzzy_result)
Now the Levenshtein distance is 6 (1 for "wonton" -> "wanton", and 5 for the missing "last" and the space after it). Since we have max_l_dist=3, the function returns an empty list.
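To double-check the distance without the library, the sketch below computes the smallest Levenshtein distance between the search string and any substring of the blob. It is a textbook dynamic-programming approach in which a match may start and end anywhere in the text. The function name best_substring_distance is my own, and I am not claiming this is how fuzzysearch works internally.

```python
def best_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # Row for the empty pattern: a match may start anywhere in text at no cost.
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, start=1):
        curr = [i]  # pattern[:i] against an empty substring costs i
        for j, ct in enumerate(text, start=1):
            cost = 0 if cp == ct else 1
            curr.append(min(prev[j] + 1,          # pattern char missing from text
                            curr[j - 1] + 1,      # extra char in text
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    # A match may end anywhere, so take the minimum over the last row.
    return min(prev)


original_blob = ("I tugged on the gown and sleeves I’d discarded like a "
                 "wonton last night to fall into John’s arms")
shortened_blob = ("I tugged on the gown and sleeves I’d discarded like a "
                  "wonton night to fall into John’s arms")

original_distance = best_substring_distance("a wanton last night", original_blob)
shortened_distance = best_substring_distance("a wanton last night", shortened_blob)
print(original_distance)   # 1
print(shortened_distance)  # 6
```

With the shortened blob the best possible distance is 6, so any max_l_dist below 6 rules the phrase out.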
It is so nice that this function is available and it is easy to use.