
N-gram Language Models


LECTURE 12: LANGUAGE MODEL

Q16. Explain Language Models and N-Gram Language Models.

A language model is a statistical model that assigns probabilities to sequences of words. It estimates how likely a sentence is and can also predict the next word in a sequence.

Formally, for a sentence $W = (w_1, w_2, \dots, w_n)$:

P(W) = P(w_1, w_2, \dots, w_n)

Language models are widely used in applications such as autocomplete, machine translation, speech recognition, handwriting recognition, spelling correction, and OCR.

An N-gram language model approximates these probabilities by conditioning on only the previous $n-1$ words (the Markov assumption), which makes computation feasible on large datasets.

Types of N-grams:

  • Unigram ($n = 1$): no context
  • Bigram ($n = 2$): one-word context
  • Trigram ($n = 3$): two-word context

Example sentence: This is Big Data AI Book

  • Unigrams: This | is | Big | Data | AI | Book
  • Bigrams: This is | is Big | Big Data | Data AI | AI Book
  • Trigrams: This is Big | is Big Data | Big Data AI | Data AI Book
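
A minimal sketch of this extraction in Python (the `ngrams` helper is our own naming, not from any particular library):

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "This is Big Data AI Book".split()

print(ngrams(tokens, 1))  # unigrams: ('This',), ('is',), ...
print(ngrams(tokens, 2))  # bigrams:  ('This', 'is'), ('is', 'Big'), ...
print(ngrams(tokens, 3))  # trigrams: ('This', 'is', 'Big'), ...
```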

Q17. Explain Bigram probability computation, advantages, and limitations of N-gram models.

Using the chain rule, the probability of a sentence is:

P(w_1 \dots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \dots w_{i-1})

With the Markov assumption, each conditional probability is approximated using only the previous word (the bigram case):

P(w_i \mid w_1 \dots w_{i-1}) \approx P(w_i \mid w_{i-1})

so the sentence probability becomes $P(w_1 \dots w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$.
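
In a bigram model, each conditional probability is typically estimated by maximum likelihood from corpus counts (the standard estimate, stated here for completeness):

P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}

where $C(\cdot)$ counts occurrences in the training corpus.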

Bigram example using the training corpus:

<s> I am Sam </s>
<s> Sam I am </s>

The probability of "I am Sam" is computed as:

P(I \mid \langle s\rangle) \times P(am \mid I) \times P(Sam \mid am) \times P(\langle/s\rangle \mid Sam)

With the counts from this corpus: $P(I \mid \langle s\rangle) = 1/2$, $P(am \mid I) = 2/2$, $P(Sam \mid am) = 1/2$, and $P(\langle/s\rangle \mid Sam) = 1/2$, giving $1/2 \times 1 \times 1/2 \times 1/2 = 0.125$.
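
The same computation can be checked with a short Python sketch that builds the counts from the two training sentences (function names here are illustrative):

```python
from collections import Counter

corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """P(word | prev) = C(prev word) / C(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# P(<s> I am Sam </s>) = P(I|<s>) * P(am|I) * P(Sam|am) * P(</s>|Sam)
sentence = "<s> I am Sam </s>".split()
p = 1.0
for prev, word in zip(sentence, sentence[1:]):
    p *= bigram_prob(prev, word)
print(p)  # 0.5 * 1.0 * 0.5 * 0.5 = 0.125
```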
  • Advantages: simple, intuitive, easy to implement, and extendable to higher-order models.
  • Limitations: zero probability for unseen word sequences, and numerical underflow from multiplying many small probabilities.
  • Common solutions: Laplace (add-one) smoothing for unseen sequences, and working with log probabilities to avoid underflow.
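
As a rough illustration of those two fixes, the sketch below (again with illustrative names, and reusing the same toy corpus) applies add-one smoothing and scores a sentence in log space:

```python
import math
from collections import Counter

corpus = ["<s> I am Sam </s>", "<s> Sam I am </s>"]
unigram_counts, bigram_counts = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

V = len(unigram_counts)  # vocabulary size (here 5: <s>, I, am, Sam, </s>)

def smoothed_bigram_prob(prev, word):
    # P_Laplace(word | prev) = (C(prev word) + 1) / (C(prev) + V)
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)

def sentence_log_prob(sentence):
    # Sum log probabilities instead of multiplying raw probabilities,
    # which avoids underflow on long sentences.
    tokens = sentence.split()
    return sum(
        math.log(smoothed_bigram_prob(prev, word))
        for prev, word in zip(tokens, tokens[1:])
    )

# The bigram "Sam am" never occurs in training, yet the sentence still
# gets a finite log probability thanks to add-one smoothing.
print(sentence_log_prob("<s> Sam am I </s>"))
```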
