Roadmap of NLP for Machine Learning

Hrisav Bhowmick
Published in Analytics Vidhya
2 min read · Dec 17, 2021

Natural Language Processing (NLP) is the branch of AI that helps computers understand, interpret, and manipulate human language. NLP has several practical use cases, such as Machine Translation, Conversational AI bots, Resume evaluation, and Fraud detection. It leverages concepts like Tokenization, Entity Recognition, Word Embeddings, Topic Modeling, and Transfer Learning to build AI-based systems.

Following is the roadmap I used during my post-grad Data Science course; it has benefitted me immensely in preparing for ML interviews. It also helps me at the workplace, where my work focuses mainly on NLP and Deep Learning.

Pre-processing

  • Sentence cleaning
  • Stop Words
  • Regular Expression
  • Tokenization
  • N-grams (Unigram, Bigram, Trigram)
  • Text Normalization
  • Stemming
  • Lemmatization
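Several of these pre-processing steps can be sketched with the Python standard library alone. The stop-word list and test sentence below are toy examples of my own; real projects would use the stop-word lists and tokenizers from NLTK or spaCy.

```python
import re

# Toy stop-word list for illustration; NLTK/spaCy ship much fuller lists.
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of"}

def preprocess(text):
    """Clean a sentence, tokenize it, and drop stop words."""
    text = text.lower()                      # text normalization
    text = re.sub(r"[^a-z\s]", " ", text)    # regex cleaning: keep letters only
    tokens = text.split()                    # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Build n-grams: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = preprocess("The cats are sitting on the mat!")
print(tokens)             # ['cats', 'are', 'sitting', 'on', 'mat']
print(ngrams(tokens, 2))  # [('cats', 'are'), ('are', 'sitting'), ...]
```

Stemming and lemmatization would follow the same pattern, e.g. with NLTK's `PorterStemmer` or `WordNetLemmatizer` applied to each token.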

Linguistics

  • Part-of-Speech Tags
  • Constituency Parsing
  • Dependency Parsing
  • Syntactic Parsing
  • Semantic Analysis
  • Lexical Semantics
  • Coreference Resolution
  • Chunking
  • Entity Extraction / Named Entity Recognition (NER)
  • Named Entity Disambiguation / Entity Linking
  • Knowledge Graphs

Word Embeddings

1. Frequency-based Word Embedding

  • One Hot Encoding
  • Bag of Words or CountVectorizer()
  • TFIDF or TfidfVectorizer()
  • Co-occurrence Matrix, Co-occurrence Vector
  • HashingVectorizer

2. Pretrained Word Embedding

  • Word2Vec (by Google) : (2 types) CBOW, Skip-Gram
  • GloVe (by Stanford)
  • fastText (by Facebook)

Topic Modeling

  • Latent Semantic Analysis (LSA)
  • Probabilistic Latent Semantic Analysis (pLSA)
  • Latent Dirichlet Allocation (LDA)
  • lda2Vec
  • Non-Negative Matrix Factorization (NMF)
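LDA and NMF are both available in scikit-learn. A minimal sketch on a four-document toy corpus of my own (real topic models need far more documents to produce meaningful topics):

```python
from sklearn.decomposition import LatentDirichletAllocation, NMF
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "stock market trading shares price",
    "football match goal team player",
    "market price shares investor stock",
    "team player goal football league",
]

# LDA works on raw term counts
counts = CountVectorizer().fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.components_.shape)       # (topics, vocabulary size)

# NMF is usually applied to TF-IDF weights
tfidf = TfidfVectorizer().fit_transform(corpus)
nmf = NMF(n_components=2, random_state=0).fit(tfidf)
print(nmf.transform(tfidf).shape)  # (documents, topics)
```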

NLP with Deep Learning

  • Machine Learning (Logistic Regression, SVM, Naïve Bayes)
  • Embedding Layer
  • Artificial Neural Network
  • Deep Neural Network
  • Convolutional Neural Network
  • RNN/LSTM/GRU
  • Bi-RNN/Bi-LSTM/Bi-GRU
  • Pretrained Language Models: ELMo, ULMFiT
  • Sequence-to-Sequence/Encoder-Decoder
  • Transformers (attention mechanism)
  • Encoder-only Transformers: BERT
  • Decoder-only Transformers: GPT
  • Transfer Learning
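Before reaching for the deep models above, the classical baseline named in the first bullet — TF-IDF features fed into Logistic Regression — is worth having as a few lines of scikit-learn. The tiny labeled set here is purely illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative sentiment dataset: 1 = positive, 0 = negative
texts = [
    "great movie loved it", "fantastic film wonderful acting",
    "terrible movie hated it", "awful film boring plot",
]
labels = [1, 1, 0, 0]

# TF-IDF features -> Logistic Regression classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["wonderful loved it", "boring terrible plot"]))
```

Swapping `LogisticRegression` for `LinearSVC` or `MultinomialNB` gives the SVM and Naïve Bayes variants with the same pipeline.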

Example Use cases

  • Sentiment Analysis
  • Question Answering
  • Language Translation
  • Text/Intent Classification
  • Text Summarization
  • Text Similarity
  • Text Clustering
  • Text Generation
  • Chatbots (DialogFlow, RASA, Self-made Bots)
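To make one of these use cases concrete: Text Similarity often reduces to comparing document vectors, e.g. cosine similarity over TF-IDF. A minimal sketch on toy sentences of my own:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the weather today is sunny and warm",
    "it is sunny and warm outside today",
    "the stock market fell sharply this week",
]

vectors = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(vectors)

# The two weather sentences score far higher with each other
# than either does with the stock-market sentence.
print(round(sims[0, 1], 2), round(sims[0, 2], 2))
```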

Libraries

  • NLTK
  • Spacy
  • Gensim (mainly for topic modeling)

Free YouTube resources:

Credits to Stanford University, NPTEL, Sentdex, and Krish Naik.


Thanks for reading the article! If you liked it, do 👏. Have I missed any vital topic? Let me know in the comments, and I'll update!

If you are interested in checking out the Mathematics roadmap for Machine Learning, click here.

Connect with me on LinkedIn for more updates, or for any help on how to move forward with the above topics.
