
N-gram Language Models


LECTURE 12: LANGUAGE MODEL

Q16. Explain Language Models and N-Gram Language Models.

A language model is a statistical model that assigns probabilities to sequences of words. It estimates how likely a sentence is and can also predict the next word in a sequence.

Formally, for a sentence $W = (w_1, w_2, \dots, w_n)$:

P(W) = P(w_1, w_2, \dots, w_n)

Language models are widely used in applications such as autocomplete, machine translation, speech recognition, handwriting recognition, spelling correction, and OCR.

An N-gram language model approximates these probabilities by conditioning on only the previous $n-1$ words (the Markov assumption), which makes computation feasible on large datasets.

Types of N-grams:

  • Unigram ($n = 1$): no context
  • Bigram ($n = 2$): one-word context
  • Trigram ($n = 3$): two-word context

Example sentence: This is Big Data AI Book

  • Unigrams: This | is | Big | Data | AI | Book
  • Bigrams: This is | is Big | Big Data | Data AI | AI Book
  • Trigrams: This is Big | is Big Data | Big Data AI | Data AI Book
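
A minimal sketch of this extraction in Python (the `ngrams` helper is our own naming, not from any particular library):

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "This is Big Data AI Book".split()

print(ngrams(tokens, 1))  # unigrams: ('This',), ('is',), ...
print(ngrams(tokens, 2))  # bigrams:  ('This', 'is'), ('is', 'Big'), ...
print(ngrams(tokens, 3))  # trigrams: ('This', 'is', 'Big'), ...
```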

Q17. Explain Bigram probability computation, advantages, and limitations of N-gram models.

Using the chain rule, the probability of a sentence is:

P(w_1 \dots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \dots w_{i-1})

With the Markov assumption, each conditional probability is approximated using only the previous word (the bigram case):

P(w_i \mid w_1 \dots w_{i-1}) \approx P(w_i \mid w_{i-1})

so the sentence probability becomes $P(w_1 \dots w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$.
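
In a bigram model, each conditional probability is typically estimated by maximum likelihood from corpus counts (the standard estimate, stated here for completeness):

P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}

where $C(\cdot)$ counts occurrences in the training corpus.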

Bigram example using the training corpus:

<s> I am Sam </s>
<s> Sam I am </s>

The probability of "I am Sam" is computed as:

P(I \mid \langle s\rangle) \times P(am \mid I) \times P(Sam \mid am) \times P(\langle/s\rangle \mid Sam)

With the counts from this corpus: $P(I \mid \langle s\rangle) = 1/2$, $P(am \mid I) = 2/2$, $P(Sam \mid am) = 1/2$, and $P(\langle/s\rangle \mid Sam) = 1/2$, giving $1/2 \times 1 \times 1/2 \times 1/2 = 0.125$.
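
The same computation can be checked with a short Python sketch that builds the counts from the two training sentences (function names here are illustrative):

```python
from collections import Counter

corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """P(word | prev) = C(prev word) / C(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# P(<s> I am Sam </s>) = P(I|<s>) * P(am|I) * P(Sam|am) * P(</s>|Sam)
sentence = "<s> I am Sam </s>".split()
p = 1.0
for prev, word in zip(sentence, sentence[1:]):
    p *= bigram_prob(prev, word)
print(p)  # 0.5 * 1.0 * 0.5 * 0.5 = 0.125
```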
  • Advantages: simple, intuitive, easy to implement, and extendable to higher-order models.
  • Limitations: zero probability for unseen word sequences, and numerical underflow from multiplying many small probabilities.
  • Common solutions: Laplace (add-one) smoothing for unseen sequences, and working with log probabilities to avoid underflow.
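
As a rough illustration of those two fixes, the sketch below (again with illustrative names, and reusing the same toy corpus) applies add-one smoothing and scores a sentence in log space:

```python
import math
from collections import Counter

corpus = ["<s> I am Sam </s>", "<s> Sam I am </s>"]
unigram_counts, bigram_counts = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

V = len(unigram_counts)  # vocabulary size (here 5: <s>, I, am, Sam, </s>)

def smoothed_bigram_prob(prev, word):
    # P_Laplace(word | prev) = (C(prev word) + 1) / (C(prev) + V)
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)

def sentence_log_prob(sentence):
    # Sum log probabilities instead of multiplying raw probabilities,
    # which avoids underflow on long sentences.
    tokens = sentence.split()
    return sum(
        math.log(smoothed_bigram_prob(prev, word))
        for prev, word in zip(tokens, tokens[1:])
    )

# The bigram "Sam am" never occurs in training, yet the sentence still
# gets a finite log probability thanks to add-one smoothing.
print(sentence_log_prob("<s> Sam am I </s>"))
```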
