Roadmap of NLP for Machine Learning

Hrisav Bhowmick
2 min read · Dec 17, 2021

Natural Language Processing (NLP) is the branch of AI that helps computers understand, interpret, and manipulate human language. NLP has several practical use cases, such as machine translation, conversational AI bots, resume evaluation, and fraud detection. It leverages concepts like Tokenization, Entity Recognition, Word Embeddings, Topic Modeling, and Transfer Learning to build AI-based systems.

The following is the roadmap I followed during my post-grad Data Science course. It benefited me immensely in preparing for ML interviews, and it still helps me at the workplace, where my work focuses mainly on NLP and Deep Learning.

Pre-processing

  • Sentence cleaning
  • Stop Words
  • Regular Expression
  • Tokenization
  • N-grams (Unigram, Bigram, Trigram)
  • Text Normalization
  • Stemming
  • Lemmatization
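Most of the steps above can be sketched with the standard library alone. Here is a minimal illustration of cleaning, tokenization, stop-word removal, and n-grams; the stop-word list is a tiny made-up set for demonstration (real projects would use NLTK's or spaCy's lists, and their stemmers/lemmatizers):

```python
import re

# tiny illustrative stop-word set, NOT a real list
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to"}

def tokenize(text):
    """Lowercase, strip punctuation/digits, split on whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return text.split()

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Sliding window of n consecutive tokens (n=2 gives bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stopwords(tokenize("The cats are chasing the mice!"))
# tokens -> ['cats', 'chasing', 'mice']
bigrams = ngrams(tokens, 2)
# bigrams -> [('cats', 'chasing'), ('chasing', 'mice')]
```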

Linguistics

  • Part-of-Speech Tags
  • Constituency Parsing
  • Dependency Parsing
  • Syntactic Parsing
  • Semantic Analysis
  • Lexical Semantics
  • Coreference Resolution
  • Chunking
  • Entity Extraction / Named Entity Recognition (NER)
  • Named Entity Disambiguation / Entity Linking
  • Knowledge Graphs

Word Embeddings

1. Frequency-based Word Embedding

  • One Hot Encoding
  • Bag of Words or CountVectorizer()
  • TF-IDF or TfidfVectorizer()
  • Co-occurrence Matrix, Co-occurrence Vector
  • HashingVectorizer
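To make the frequency-based ideas concrete, here is a from-scratch sketch of Bag of Words and TF-IDF on a toy corpus. Note this is a simplified formula: sklearn's TfidfVectorizer additionally smooths the idf and L2-normalizes the rows, so its numbers will differ.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Bag of Words: each document becomes a count vector over a shared
# vocabulary -- conceptually what CountVectorizer() produces
vocab = sorted({w for d in docs for w in d.split()})
bow = [[Counter(d.split())[w] for w in vocab] for d in docs]

def tfidf(doc, all_docs):
    """TF-IDF weights for one document (unsmoothed, unnormalized)."""
    tokens = doc.split()
    counts = Counter(tokens)
    scores = {}
    for w, c in counts.items():
        df = sum(1 for d in all_docs if w in d.split())   # document frequency
        idf = math.log(len(all_docs) / df)                # inverse doc frequency
        scores[w] = (c / len(tokens)) * idf               # tf * idf
    return scores

scores = tfidf(docs[0], docs)
# "the" occurs in every document, so its idf (and weight) is 0;
# "cat" is unique to doc 0, so it gets a positive weight
```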

2. Pretrained Word Embedding

  • Word2Vec (by Google): two training variants, CBOW and Skip-Gram
  • GloVe (by Stanford)
  • fastText (by Facebook)
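Whichever model produces them, pretrained embeddings are just dense vectors, and "similar meaning" translates to "small angle between vectors". A minimal sketch with hand-made 3-d vectors (real Word2Vec/GloVe/fastText vectors typically have 100-300 dimensions; these numbers are invented for illustration):

```python
import math

# toy 3-d stand-ins for real pretrained embeddings (invented values)
emb = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# semantically related words sit closer together in the embedding space
sim_royalty = cosine(emb["king"], emb["queen"])
sim_fruit = cosine(emb["king"], emb["apple"])
```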

Topic Modeling

  • Latent Semantic Analysis (LSA)
  • Probabilistic Latent Semantic Analysis (pLSA)
  • Latent Dirichlet Allocation (LDA)
  • lda2Vec
  • Non-Negative Matrix Factorization (NMF)
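As a taste of how NMF (the last item above) uncovers topics, here is a sketch using NumPy and the classic Lee-Seung multiplicative updates on a made-up document-term matrix; the data and topic count are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy document-term count matrix: 4 documents x 6 terms (invented counts)
V = np.array([
    [3, 2, 0, 0, 1, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 3],
    [0, 1, 0, 2, 3, 2],
], dtype=float)

k = 2                                    # number of topics
W = rng.random((V.shape[0], k)) + 0.1    # document-topic weights
H = rng.random((k, V.shape[1])) + 0.1    # topic-term weights
err_before = np.linalg.norm(V - W @ H)

# Lee-Seung multiplicative updates minimizing ||V - WH||_F;
# they keep W and H non-negative by construction
for _ in range(300):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

err_after = np.linalg.norm(V - W @ H)
# each row of H now reads as one topic's (unnormalized) term weights
```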

NLP with Deep Learning

  • Machine Learning (Logistic Regression, SVM, Naïve Bayes)
  • Embedding Layer
  • Artificial Neural Network
  • Deep Neural Network
  • Convolutional Neural Network
  • RNN/LSTM/GRU
  • Bi-RNN/Bi-LSTM/Bi-GRU
  • Pretrained Language Models: ELMo, ULMFiT
  • Sequence-to-Sequence/Encoder-Decoder
  • Transformers (attention mechanism)
  • Encoder-only Transformers: BERT
  • Decoder-only Transformers: GPT
  • Transfer Learning
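The attention mechanism at the heart of Transformers is compact enough to sketch in NumPy. This is scaled dot-product attention as defined in "Attention Is All You Need"; the token count and dimension here are arbitrary toy values:

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, shifted for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))  # 3 "tokens", each a 4-d query vector
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))

out, weights = attention(Q, K, V)
# weights is 3x3: row i says how much token i attends to every token
```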

Example Use cases

  • Sentiment Analysis
  • Question Answering
  • Language Translation
  • Text/Intent Classification
  • Text Summarization
  • Text Similarity
  • Text Clustering
  • Text Generation
  • Chatbots (DialogFlow, RASA, Self-made Bots)
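As the simplest possible entry point to the first use case, here is a lexicon-based sentiment scorer. The word lists are tiny invented examples; real systems learn weights from data or use a curated lexicon such as VADER's:

```python
# illustrative word lists, NOT a real sentiment lexicon
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text):
    """Count positive vs. negative words and report the sign."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```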

Libraries

  • NLTK
  • spaCy
  • Gensim (mainly for topic modeling)

Free YouTube resources:

Credits to Stanford University, NPTEL, Sentdex, Krish Naik.

Thanks for reading the article! If you liked it, do 👏. Have I missed any vital topic? Let me know in the comments and I'll update!

If you are interested in the Mathematics roadmap for Machine Learning, click here.

Connect with me on LinkedIn for more updates or for any help moving forward with the topics above.

Hrisav Bhowmick

A machine learning enthusiast, eager to solve real-world problems.