Picture from Pixabay (https://www.pexels.com/@pixabay/)
I was trying out three Natural Language Processing (NLP) Python libraries, mainly to see whether they can extract nouns from sentences and how long each one takes to run.
nltk==3.7
spacy==3.4.1
textblob==0.17.1
import ssl
import time

import nltk
import spacy
from spacy.cli.download import download
from textblob import TextBlob

# Work around SSL certificate errors (common on some macOS Python installs).
# This has to run before nltk.download(...) for the NLTK data downloads to use it.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# One-off downloads: NLTK tokenizer and tagger data, plus the small English spaCy model.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
download(model="en_core_web_sm")
spacy_nlp = spacy.load("en_core_web_sm")

txt = """The Natural Language Processing group focuses on developing efficient algorithms to process text and to make their information accessible to computer applications."""

def time_taken(func):
    """Print how long a no-argument function takes to run."""
    def wrapper():
        t = time.time()
        res = func()
        print("Function took " + str(time.time() - t) + " seconds to run")
        return res
    return wrapper

@time_taken
def run_nltk():
    # Penn Treebank noun tags all start with "N" (NN, NNS, NNP, NNPS).
    print([word for (word, pos) in nltk.pos_tag(nltk.word_tokenize(txt)) if pos[0] == "N"])

@time_taken
def run_spacy():
    # Iterating a spaCy Doc yields tokens; pos_ is the coarse part-of-speech label.
    print([token.text for token in spacy_nlp(txt) if token.pos_ == "NOUN"])

@time_taken
def run_textblob():
    print([word for (word, pos) in TextBlob(txt).pos_tags if pos[0] == "N"])

run_nltk()
run_spacy()
run_textblob()
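The time_taken decorator in the script only wraps functions that take no arguments. A slightly more general sketch (an alternative, not what produced the numbers below) forwards any arguments, uses functools.wraps so the wrapper keeps the original function's name, and measures with time.perf_counter(), which is meant for timing short durations. The count_capitalised function is a hypothetical stand-in workload.

```python
import functools
import time


def time_taken(func):
    """Print how long each call to func takes and pass its result through."""

    @functools.wraps(func)  # keep func.__name__ and the docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} seconds to run")
        return result

    return wrapper


@time_taken
def count_capitalised(text):
    # Hypothetical workload: count words that start with an uppercase letter.
    return sum(1 for word in text.split() if word[:1].isupper())


n = count_capitalised("The Natural Language Processing group")
```

Because it still works for functions without arguments, it could replace the original decorator on run_nltk, run_spacy and run_textblob unchanged.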
Here is what I got, running on a MacBook Pro (15-inch, 2018):
NLTK: Natural Language Toolkit
['Natural', 'Language', 'Processing', 'group', 'algorithms', 'text', 'information', 'accessible', 'computer', 'applications']
Function took 0.14807605743408203 seconds to run
spaCy
['Processing', 'group', 'algorithms', 'text', 'information', 'computer', 'applications']
Function took 0.011222124099731445 seconds to run
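Note that spaCy is faster than NLTK here and extracts fewer nouns: it leaves out "Natural", "Language" and "accessible" (which is an adjective in this sentence).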
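Part of the difference comes from the filters rather than the taggers. NLTK returns Penn Treebank tags, so the pos[0] == "N" check keeps NN, NNS, NNP and NNPS, i.e. common and proper nouns. spaCy's pos_ uses Universal POS tags, where proper nouns are PROPN, so pos_ == "NOUN" drops them. The sketch below shows this with hand-written tags for a made-up sentence (not real tagger output); TAGGED, PTB_TO_UPOS and both helper functions are hypothetical names for illustration.

```python
# Hand-written (word, Penn Treebank tag) pairs for an illustrative sentence;
# these are NOT output from NLTK, spaCy or TextBlob.
TAGGED = [("Alice", "NNP"), ("reads", "VBZ"), ("books", "NNS"),
          ("in", "IN"), ("Paris", "NNP")]

# Penn Treebank noun tags mapped to their Universal POS equivalents.
PTB_TO_UPOS = {"NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN", "NNPS": "PROPN"}


def nouns_by_prefix(tagged):
    """Mimic the NLTK/TextBlob filter: keep any tag starting with 'N'."""
    return [word for word, tag in tagged if tag[0] == "N"]


def nouns_by_upos(tagged, keep=("NOUN",)):
    """Mimic a spaCy-style pos_ filter after mapping tags to Universal POS."""
    return [word for word, tag in tagged if PTB_TO_UPOS.get(tag) in keep]


print(nouns_by_prefix(TAGGED))                        # ['Alice', 'books', 'Paris']
print(nouns_by_upos(TAGGED))                          # ['books']
print(nouns_by_upos(TAGGED, keep=("NOUN", "PROPN")))  # ['Alice', 'books', 'Paris']
```

With spaCy itself, the corresponding change would be to test token.pos_ in ("NOUN", "PROPN") instead of token.pos_ == "NOUN" if proper nouns should count as nouns.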
TextBlob
['Natural', 'Language', 'Processing', 'group', 'algorithms', 'text', 'information', 'accessible', 'computer', 'applications']
Function took 0.0018911361694335938 seconds to run
TextBlob was the fastest of the three in this run, and it extracts the same nouns as NLTK, so it appears to be the winner.
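Since each library returns a plain list, set operations make the comparison explicit. This sketch copies the three noun lists printed above verbatim and computes which words differ:

```python
# Noun lists exactly as printed in the runs above.
nltk_nouns = ['Natural', 'Language', 'Processing', 'group', 'algorithms',
              'text', 'information', 'accessible', 'computer', 'applications']
spacy_nouns = ['Processing', 'group', 'algorithms', 'text', 'information',
               'computer', 'applications']
textblob_nouns = ['Natural', 'Language', 'Processing', 'group', 'algorithms',
                  'text', 'information', 'accessible', 'computer', 'applications']

# Words found by one library but not the other (sorted for stable output).
only_nltk = sorted(set(nltk_nouns) - set(spacy_nouns))
only_spacy = sorted(set(spacy_nouns) - set(nltk_nouns))

print("NLTK only:", only_nltk)    # NLTK only: ['Language', 'Natural', 'accessible']
print("spaCy only:", only_spacy)  # spaCy only: []
print("NLTK == TextBlob:", nltk_nouns == textblob_nouns)  # NLTK == TextBlob: True
```

spaCy's nouns are a strict subset of NLTK's here, and the NLTK and TextBlob lists are identical, including their order.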
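Disclaimer: this is not a rigorous benchmark. Each function was timed only once, so the numbers can include one-off costs such as loading a model on first use.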
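A slightly fairer timing setup, sketched here with only the standard library, makes one untimed warm-up call (so one-off costs such as lazy model loading are excluded) and then reports the best and median time per call over several repeats using timeit.repeat. The benchmark helper and the fake_tagger workload are hypothetical; in practice you would pass a function that does the tagging without printing.

```python
import statistics
import timeit


def benchmark(func, repeat=5, number=10):
    """Time func after one warm-up call.

    Returns (best, median) seconds per call over `repeat` rounds,
    where each round calls func `number` times.
    """
    func()  # warm-up call, not timed
    rounds = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in rounds]
    return min(per_call), statistics.median(per_call)


def fake_tagger():
    # Hypothetical placeholder workload standing in for an NLP call.
    text = "The Natural Language Processing group focuses on algorithms"
    return [w for w in text.split() if w[:1].isupper()]


best, median = benchmark(fake_tagger)
print(f"best {best:.2e} s/call, median {median:.2e} s/call")
```

The minimum is usually the least noisy estimate of a function's cost, while the median gives a sense of typical run-to-run variation.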
I also created a small web application for this, where I enter some text and TextBlob extracts the nouns and highlights them.
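The app itself isn't shown here, but the highlighting step can be sketched independently of any web framework: find each whole-word occurrence of an extracted noun, wrap it in a mark element, and HTML-escape everything so user input can't inject markup. highlight_nouns is a hypothetical helper, not the app's actual code; the noun list would come from a filter like the TextBlob one above.

```python
import html
import re


def highlight_nouns(text, nouns):
    """Return HTML where each whole-word occurrence of a noun is wrapped in <mark>."""
    # Longest nouns first so a longer match is preferred over a shorter prefix.
    unique = sorted({n for n in nouns if n}, key=lambda n: (-len(n), n))
    if not unique:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, unique)) + r")\b")

    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))  # text between matches
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))  # trailing text after the last match
    return "".join(parts)


print(highlight_nouns("The group processes text & data.", ["group", "text"]))
# The <mark>group</mark> processes <mark>text</mark> &amp; data.
```

Escaping each piece separately, rather than escaping the whole text first, keeps the matching on the original words while still producing safe HTML.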