Picture from Pixabay (https://www.pexels.com/@pixabay/)
I tried out three Natural Language Processing (NLP) Python libraries, mainly to see whether they can extract nouns from sentences and how long they take to run.
```
nltk==3.7
spacy==3.4.1
textblob==0.17.1
```
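```python
import ssl
import time
from functools import wraps

import nltk
import spacy
from spacy.cli.download import download
from textblob import TextBlob

# Work around SSL certificate errors (common on macOS) when downloading models.
# This disables certificate verification globally, and it must run BEFORE the
# downloads below, otherwise it has no effect on them.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download("punkt")  # tokenizer data used by nltk.word_tokenize
nltk.download("averaged_perceptron_tagger")  # tagger data used by nltk.pos_tag

download(model="en_core_web_sm")  # small English spaCy pipeline
spacy_nlp = spacy.load("en_core_web_sm")

txt = """The Natural Language Processing group focuses on developing efficient
algorithms to process text and to make their information accessible to computer
applications."""


def time_taken(func):
    """Print how long each call to `func` took (including its own printing)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        t = time.perf_counter()
        res = func(*args, **kwargs)
        print(f"Function took {time.perf_counter() - t} seconds to run")
        return res
    return wrapper


@time_taken
def run_nltk():
    # Penn Treebank noun tags all start with "N": NN, NNS, NNP, NNPS
    print([word for (word, pos) in nltk.pos_tag(nltk.word_tokenize(txt)) if pos[0] == "N"])


@time_taken
def run_spacy():
    # spaCy's coarse Universal POS tag; "NOUN" excludes proper nouns ("PROPN")
    print([token.text for token in spacy_nlp(txt) if token.pos_ == "NOUN"])


@time_taken
def run_textblob():
    print([word for (word, pos) in TextBlob(txt).pos_tags if pos[0] == "N"])


run_nltk()
run_spacy()
run_textblob()
```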
Here is what I got, running on a MacBook Pro (15-inch, 2018):
NLTK: Natural Language Toolkit
```
['Natural', 'Language', 'Processing', 'group', 'algorithms', 'text', 'information', 'accessible', 'computer', 'applications']
Function took 0.14807605743408203 seconds to run
```
spaCy
```
['Processing', 'group', 'algorithms', 'text', 'information', 'computer', 'applications']
Function took 0.011222124099731445 seconds to run
```
Note that spaCy is faster than NLTK and extracts fewer nouns.
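Part of this gap comes from the filters rather than the taggers. The NLTK (and TextBlob) filter `pos[0] == "N"` keeps every Penn Treebank noun tag, including the proper-noun tags `NNP` and `NNPS`, while spaCy's `token.pos_ == "NOUN"` uses Universal POS tags, where proper nouns get a separate `PROPN` tag. Whether that accounts for 'Natural' and 'Language' depends on the tags spaCy actually assigned, which the output above does not show; 'accessible', on the other hand, is an adjective in this sentence, so leaving it out is arguably correct. The sketch below diffs the two reported lists and puts the two kinds of filter side by side; `ptb_is_noun` and `upos_is_noun` (with its `include_proper` flag) are hypothetical helpers, not library functions.

```python
# Noun lists copied from the NLTK and spaCy outputs above.
nltk_nouns = ['Natural', 'Language', 'Processing', 'group', 'algorithms', 'text',
              'information', 'accessible', 'computer', 'applications']
spacy_nouns = ['Processing', 'group', 'algorithms', 'text', 'information',
               'computer', 'applications']

# Words NLTK kept but spaCy did not, in NLTK's order.
spacy_set = set(spacy_nouns)
missing = [w for w in nltk_nouns if w not in spacy_set]
print(missing)  # ['Natural', 'Language', 'accessible']


def ptb_is_noun(tag: str) -> bool:
    """Hypothetical helper: Penn Treebank noun tags (NN, NNS, NNP, NNPS) start with 'N'."""
    return tag.startswith("N")


def upos_is_noun(pos: str, include_proper: bool = False) -> bool:
    """Hypothetical helper for spaCy's token.pos_: NOUN, plus PROPN if requested."""
    return pos == "NOUN" or (include_proper and pos == "PROPN")


# Illustrative tags only, not real tagger output.
print([ptb_is_noun(t) for t in ["NN", "NNS", "NNP", "NNPS", "JJ"]])
# [True, True, True, True, False]
print(upos_is_noun("PROPN"), upos_is_noun("PROPN", include_proper=True))
# False True
```

To make the spaCy filter closer to the NLTK one, the list comprehension could test `token.pos_ in ("NOUN", "PROPN")` instead.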
TextBlob
```
['Natural', 'Language', 'Processing', 'group', 'algorithms', 'text', 'information', 'accessible', 'computer', 'applications']
Function took 0.0018911361694335938 seconds to run
```
TextBlob is even faster and extracts the same nouns as NLTK. It appears to be the winner.
Disclaimer: I did not do much benchmarking; each function was timed on a single run.
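Single-run timings like the ones above can include one-time costs (loading models, warming caches), and each one also includes the `print` call inside the timed function; `run_nltk` also runs first, so it may absorb more of those costs. A slightly fairer comparison, sketched here with only the standard library's `timeit`, calls each function once as a warm-up and then keeps the best of several repeats. `benchmark` is a hypothetical helper; to use it with the script above you would pass functions that return the noun list instead of printing it.

```python
import timeit
from typing import Callable, Dict


def benchmark(funcs: Dict[str, Callable[[], object]],
              repeat: int = 5, number: int = 10) -> Dict[str, float]:
    """Return the best average seconds per call for each named function.

    Each function is called once first (untimed warm-up) so that one-time costs
    are excluded; timeit.repeat then runs it `number` times per repetition,
    `repeat` times, and the fastest repetition is kept.
    """
    results = {}
    for name, func in funcs.items():
        func()  # warm-up call, not timed
        times = timeit.repeat(func, repeat=repeat, number=number)
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    # Stand-in workloads; swap in noun-extraction functions that return lists.
    def small():
        return sum(range(100))

    def large():
        return sum(range(10_000))

    for name, secs in benchmark({"small": small, "large": large}).items():
        print(f"{name}: {secs:.6f} s per call")
```

Taking the minimum rather than the mean follows the advice in the `timeit` documentation: the lowest value is a lower bound on how fast the code can run, and higher values are usually caused by other processes interfering with the measurement.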
I also created a small web application where I can enter some text and have TextBlob extract and highlight the nouns.
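The web application itself is not shown here. As a rough sketch of the highlighting step only, assuming the nouns have already been extracted (for example with TextBlob as above), the hypothetical `highlight_nouns` helper below escapes the text for HTML and wraps each whole-word occurrence of a noun in a `<mark>` tag.

```python
import html
import re
from typing import Iterable


def highlight_nouns(text: str, nouns: Iterable[str]) -> str:
    """Hypothetical helper: return HTML with each whole-word noun wrapped in <mark>.

    Matching is case-sensitive and purely lexical: a word is highlighted
    wherever it appears, regardless of how it was tagged at that position.
    """
    words = sorted(set(nouns), key=len, reverse=True)  # try longer words first
    if not words:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    parts = []
    last = 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[last:m.start()]))  # text between matches
        parts.append("<mark>" + html.escape(m.group(0)) + "</mark>")
        last = m.end()
    parts.append(html.escape(text[last:]))  # trailing text after the last match
    return "".join(parts)


print(highlight_nouns("The group makes text accessible.", ["group", "text"]))
# The <mark>group</mark> makes <mark>text</mark> accessible.
```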
