Natural Language Processing Libraries (in Python)

Picture from Pixabay (https://www.pexels.com/@pixabay/)

I tried out three Natural Language Processing (NLP) libraries for Python, mainly to see whether they can extract the nouns from a sentence and how long they take to do it.

nltk==3.7
spacy==3.4.1
textblob==0.17.1


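import ssl
import time

import nltk
import spacy
from spacy.cli.download import download
from textblob import TextBlob

# Work around SSL certificate errors that nltk.download() can run into.
# This disables certificate verification, so it has to run before the downloads.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# Tokenizer and part-of-speech tagger data for NLTK.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

# Download and load spaCy's small English pipeline.
download(model="en_core_web_sm")
spacy_nlp = spacy.load("en_core_web_sm")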


txt = """The Natural Language Processing group focuses on developing efficient
algorithms to process text and to make their information accessible to computer
applications."""


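def time_taken(func):
    """Print how long a single call to func takes (including its print output)."""

    def wrapper():
        # perf_counter() is a monotonic, high-resolution clock meant for timing.
        t = time.perf_counter()
        res = func()
        print("Function took " + str(time.perf_counter() - t) + " seconds to run")
        return res

    return wrapper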


@time_taken
def run_nltk():
    print([word for (word, pos) in nltk.pos_tag(nltk.word_tokenize(txt)) if pos[0] == "N"])


@time_taken
def run_spacy():
    print([token.text for token in spacy_nlp(txt) if token.pos_ == "NOUN"])


@time_taken
def run_textblob():
    print([word for (word, pos) in TextBlob(txt).pos_tags if pos[0] == "N"])


run_nltk()
run_spacy()
run_textblob()


Here is what I got, running on a MacBook Pro (15-inch, 2018):


NLTK: Natural Language Toolkit

['Natural', 'Language', 'Processing', 'group', 'algorithms', 'text',
 'information', 'accessible', 'computer', 'applications']
Function took 0.14807605743408203 seconds to run


spaCy

['Processing', 'group', 'algorithms', 'text', 'information', 'computer', 'applications']
Function took 0.011222124099731445 seconds to run

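Note that spaCy is faster than NLTK here and extracts fewer nouns: 'Natural', 'Language' and 'accessible' are missing from its list. ('accessible' is used as an adjective in this sentence, so leaving it out is arguably correct.)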
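
One possible reason for the shorter list is how the two filters define a noun. The NLTK filter keeps every Penn Treebank tag that starts with "N", which covers common and proper nouns (NN, NNS, NNP, NNPS), while the spaCy filter keeps only the coarse tag NOUN and so drops anything tagged PROPN. The sketch below illustrates the difference with hand-tagged tokens, not real tagger output. The helper name `nouns_by_tag` is made up for illustration, and the tags have not been checked against what either library actually assigns to this sentence.

```python
# Penn Treebank noun tags (the tag set used by NLTK's default English tagger)
# and Universal POS noun tags (exposed by spaCy as token.pos_).
PENN_NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
UPOS_COMMON_ONLY = {"NOUN"}
UPOS_WITH_PROPER = {"NOUN", "PROPN"}


def nouns_by_tag(tagged_tokens, noun_tags):
    """Keep the words whose tag is in noun_tags (hypothetical helper)."""
    return [word for word, tag in tagged_tokens if tag in noun_tags]


# Hand-assigned tags for illustration only, NOT output from NLTK or spaCy.
penn_tagged = [("Language", "NNP"), ("group", "NN"),
               ("focuses", "VBZ"), ("algorithms", "NNS")]
upos_tagged = [("Language", "PROPN"), ("group", "NOUN"),
               ("focuses", "VERB"), ("algorithms", "NOUN")]

print(nouns_by_tag(penn_tagged, PENN_NOUN_TAGS))    # ['Language', 'group', 'algorithms']
print(nouns_by_tag(upos_tagged, UPOS_COMMON_ONLY))  # ['group', 'algorithms']
print(nouns_by_tag(upos_tagged, UPOS_WITH_PROPER))  # ['Language', 'group', 'algorithms']
```

If proper nouns should count, the spaCy condition could be widened to `token.pos_ in {"NOUN", "PROPN"}`.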


TextBlob

['Natural', 'Language', 'Processing', 'group', 'algorithms', 'text',
'information', 'accessible', 'computer', 'applications']
Function took 0.0018911361694335938 seconds to run

TextBlob is also fast, and it extracts the same nouns as NLTK (not surprising, since TextBlob's default tagger is built on NLTK). It appears to be the winner.

Disclaimer: I did not do much benchmarking; each function was timed on a single run only.
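
For a somewhat fairer comparison, each function could be called several times after an untimed warm-up call, so that one-off costs such as loading models do not end up in the numbers. The sketch below is one way to do that with the standard library only. The names `benchmark`, `summarize` and `stand_in_workload` are hypothetical, and the workload is a placeholder: in the setting of this post you would pass a print-free, undecorated version of `run_nltk`, `run_spacy` or `run_textblob` instead.

```python
import statistics
import time


def benchmark(func, repeat=5, warmup=1):
    """Call func repeatedly and return the per-call timings in seconds."""
    for _ in range(warmup):
        func()  # warm-up calls are not timed
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to min, median and max."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def stand_in_workload():
    # Placeholder for a noun-extraction call; not part of the original post.
    return sum(i * i for i in range(10_000))


timings = benchmark(stand_in_workload, repeat=5)
print(summarize(timings))
```

The median is usually a steadier number to compare than a single run, and running the libraries in a different order is a quick check on whether one of them benefits from work already done by another.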


I also created a small web application for this, where I enter some text and TextBlob extracts the nouns and highlights them.
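
The highlighting step of such an application could look roughly like the sketch below. It is not the code of the actual web application, which is not shown here: the function name `highlight_nouns` and the choice of the HTML `<mark>` tag are assumptions, and the list of nouns is passed in directly so the example runs without any NLP library installed.

```python
import html
import re


def highlight_nouns(text, nouns, tag="mark"):
    """Return HTML with each whole-word occurrence of a noun wrapped in <tag>."""
    unique = sorted(set(nouns), key=len, reverse=True)
    if not unique:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in unique) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text so user input cannot inject markup.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<{tag}>{html.escape(match.group(0))}</{tag}>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


sample = "The group develops algorithms to process text."
print(highlight_nouns(sample, ["group", "algorithms", "text"]))
# The <mark>group</mark> develops <mark>algorithms</mark> to process <mark>text</mark>.
```

One limitation of matching by word is that every occurrence gets highlighted, including places where the same word is used as a verb or adjective. Highlighting by token position from the tagger output would avoid that.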



 


