Part of Speech Tagging with HMMs
Intro
Given some text, part of speech tagging is the task of assigning a grammatical part of speech to each token in the text. Parts of speech include “noun”, “verb”, “determiner”, and “adjective”, among others.
Lookup Table
The simplest approach is to record, in a table, a count for each part of speech seen with each word. We can then look up the most common part of speech for a given word.
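As a minimal sketch, here is one way such a table could be built in Python. The corpus format (a flat list of (word, tag) pairs) and the function names are assumptions for illustration, not a fixed API.

```python
from collections import Counter, defaultdict

def build_lookup_table(tagged_corpus):
    """Count how often each word appears with each part of speech."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return counts

def tag_word(counts, word):
    """Return the most common tag for a word, or None if the word is unseen."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return None

corpus = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"), ("the", "DET")]
table = build_lookup_table(corpus)
print(tag_word(table, "dog"))  # NOUN
```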
Bigrams
Because the lookup table is so naive, it breaks easily when a word can take on multiple parts of speech depending on the context. For example, “will” can be both a noun, as in a person called “Will”, and a modal verb, as in “something will happen”.
The simplest way to take context into account is to combine each word with its closest neighbour into a bigram. We can then expand the lookup table to count parts of speech for each bigram rather than for each single word.
Of course, we aren’t limited to bigrams. The approach can be extended to n-grams of any length.
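A sketch of the bigram variant, under the same assumptions as before; pairing each word with its previous neighbour is one way to read “closest neighbour”, with a placeholder marking the start of a sentence.

```python
from collections import Counter, defaultdict

def build_bigram_table(tagged_sentences):
    """Count tags for each (previous word, word) pair."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"  # placeholder for the start of the sentence
        for word, tag in sentence:
            counts[(prev, word)][tag] += 1
            prev = word
    return counts

def tag_word(counts, prev, word):
    """Return the most common tag for the bigram, or None if it is unseen."""
    key = (prev, word)
    if key in counts:
        return counts[key].most_common(1)[0][0]
    return None  # sparse data: this bigram was never observed

sentences = [[("Will", "NOUN"), ("will", "MODAL"), ("go", "VERB")]]
table = build_bigram_table(sentences)
print(tag_word(table, "Will", "will"))  # MODAL
```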
When Bigrams Won’t Work
Bigram and n-gram lookup tables break quickly when the data is sparse. If the n-gram we want to look up has never been seen before, there is no data to call upon.
Hidden Markov Models
The idea behind the hidden Markov model is as follows. Given a string of text and a potential part of speech sequence for it, how likely is that sequence of tags? The probabilities of moving from one tag to the next are the transition probabilities. We also determine the probability of each word being produced by a given part of speech. These are called the emission probabilities.
In order to begin and complete a sequence we add start and end tags and treat them as parts of speech alongside the standard tags.
The complete transition probability space can be constructed as a graph where each tag is a node and each transition between the tags carries a probability.
The combination of the word probabilities (the words are the observations) and the transition graph (whose nodes are referred to as the hidden states) is the hidden Markov model.
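As a sketch, both probability tables can be estimated by counting over a tagged corpus and normalising. The "<s>" and "</s>" marker strings and the corpus format are assumptions for illustration.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"

def train_hmm(tagged_sentences):
    """Estimate transition P(tag | previous tag) and emission P(word | tag)."""
    transitions = defaultdict(Counter)
    emissions = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1  # tag-to-tag transition
            emissions[tag][word] += 1    # tag emits the observed word
            prev = tag
        transitions[prev][END] += 1      # close the sequence with the end tag
    def normalise(counts):
        probs = {}
        for key, counter in counts.items():
            total = sum(counter.values())
            probs[key] = {k: n / total for k, n in counter.items()}
        return probs
    return normalise(transitions), normalise(emissions)

sentences = [[("Will", "NOUN"), ("will", "MODAL"), ("go", "VERB")]]
trans, emit = train_hmm(sentences)
print(trans[START])   # {'NOUN': 1.0}
print(emit["MODAL"])  # {'will': 1.0}
```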
Viterbi Algorithm
The Viterbi algorithm is an application of dynamic programming. Rather than calculating the probability of every possible path through the network of hidden states, it records, for each word and each tag, the probability of the most likely path ending at that tag. We then start from the end and trace backwards, at each step selecting the previous tag on the highest-probability path.
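A minimal sketch of the algorithm, hard-coding a toy model shaped like the output of the earlier training sketch. A real implementation would use log probabilities and smoothing to avoid numerical underflow and zero counts, which are skipped here for clarity.

```python
# Toy model shaped like the output of the train_hmm sketch above.
trans = {
    "<s>": {"NOUN": 1.0},
    "NOUN": {"MODAL": 1.0},
    "MODAL": {"VERB": 1.0},
    "VERB": {"</s>": 1.0},
}
emit = {
    "NOUN": {"Will": 1.0},
    "MODAL": {"will": 1.0},
    "VERB": {"go": 1.0},
}

def viterbi(words, tags, trans, emit):
    # best[i][tag] = (probability of the best path ending in tag at word i,
    #                 the tag of the previous step on that path)
    best = [{} for _ in words]
    for tag in tags:
        p = trans.get("<s>", {}).get(tag, 0) * emit.get(tag, {}).get(words[0], 0)
        best[0][tag] = (p, "<s>")
    for i in range(1, len(words)):
        for tag in tags:
            e = emit.get(tag, {}).get(words[i], 0)
            p, prev = max(
                (best[i - 1][pt][0] * trans.get(pt, {}).get(tag, 0) * e, pt)
                for pt in tags
            )
            best[i][tag] = (p, prev)
    # Fold in the transition to the end tag, then trace backwards.
    p, tag = max((best[-1][t][0] * trans.get(t, {}).get("</s>", 0), t) for t in tags)
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

tags = ["NOUN", "MODAL", "VERB"]
print(viterbi(["Will", "will", "go"], tags, trans, emit))  # ['NOUN', 'MODAL', 'VERB']
```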