Practice NLP and relevant libraries, this time in Python. Regular Expressions, Tokenization, Topic Identification, NER, Classifiers

BabakBar/NLP-Base

NLP-Base

Practice with NLP and related libraries, this time in Python.

  • Regular expressions & word tokenization: basic NLP concepts such as word tokenization and using regular expressions to parse text, plus how to handle non-English text and harder tokenization cases.
  • Topic identification: identify topics in texts based on term frequencies. We compare two simple methods, bag-of-words and tf-idf, using NLTK and introducing Gensim.
  • Named-entity recognition: identify the who, what, and where of our texts using pre-trained models on English and non-English text, with polyglot and spaCy added to the NLP toolbox.
  • Fake news classifier: combine these basics with supervised machine learning to build a "fake news" detector.
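
As a rough illustration of the first two topics, here is a minimal sketch of regex tokenization, bag-of-words counts, and tf-idf weights. It is not code from this repository: it uses only the Python standard library instead of NLTK or Gensim, the corpus `DOCS` and the helper names `tokenize`, `bag_of_words`, and `tf_idf` are hypothetical, and the tf-idf formula is just one common variant, so the numbers may differ from what those libraries produce.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus for illustration; not data from this repository.
DOCS = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
    "The dogs and the cats are popular pets.",
]


def tokenize(text):
    """Lowercase the text and return runs of word characters.

    In Python 3, \\w is Unicode-aware for str patterns, so accented and
    non-Latin letters are kept as part of a token.
    """
    return re.findall(r"\w+", text.lower())


def bag_of_words(tokens):
    """Count how often each token occurs in a single document."""
    return Counter(tokens)


def tf_idf(corpus_tokens):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(corpus_tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))
    weights = []
    for tokens in corpus_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


tokenized = [tokenize(doc) for doc in DOCS]
bows = [bag_of_words(tokens) for tokens in tokenized]
scores = tf_idf(tokenized)

# Raw counts favor the most frequent word in the first document.
print(bows[0].most_common(1))
# tf-idf weights for the same document, highest first.
print(sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True))
```

The contrast between the two methods shows up on the word "the": it has the highest bag-of-words count in the first document, but because it appears in every document its tf-idf weight is zero, while a word unique to that document, such as "sat", scores highest.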
