Spam Classification With Naive Bayes


Naive Bayes is a probabilistic algorithm based on conditional probability. It has the benefits of being easy to implement and fast to train, and it has useful applications within NLP, such as spam classification.

Guess The Person

Let’s say we are in an office and there are two people: Alex and Brenda. They are both there the same amount of time. We glimpse somebody and want to infer who it might be. Since all we know is that they are both present equal amounts of time, the probability we saw each person is 0.5.

When the person ran by, we saw that they were wearing a red sweater. We know that Alex wears red two days a week while Brenda wears red three days a week. Assuming a five-day week, P(R|A) = 2/5 = 0.4 and P(R|B) = 3/5 = 0.6. Because the priors are equal, normalizing the products of prior and likelihood leaves the same numbers: seeing someone in red yields a 0.4 probability it was Alex and a 0.6 probability it was Brenda.
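As a quick check of that arithmetic, here is a minimal sketch in Python (assuming the five-day week that makes the frequencies come out to 0.4 and 0.6):

# Likelihoods: how often each person wears red, out of a five-day week.
p_red_given_alex = 2 / 5    # P(R|A) = 0.4
p_red_given_brenda = 3 / 5  # P(R|B) = 0.6

# Equal priors, since both are in the office the same amount of time.
p_alex = p_brenda = 0.5

# Normalize by the total probability of seeing red at all.
p_red = p_red_given_alex * p_alex + p_red_given_brenda * p_brenda
print(p_red_given_alex * p_alex / p_red)      # P(A|R) = 0.4
print(p_red_given_brenda * p_brenda / p_red)  # P(B|R) = 0.6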

The prior probability is the probability based on what is known before getting new information.

The posterior probability is the new probability after gaining new information.

Known and Inferred

Bayes’ theorem switches from what we know to what we infer. When we know the probabilities of Alex and Brenda wearing red, we are able to infer the probabilities of seeing Alex and Brenda given that we saw someone wearing red.

Guess the Person Now

Let’s now adjust our prior probabilities. Taking a closer look at when Alex and Brenda are actually in the office, we find that Alex is in the office three days a week while Brenda is in the office only one day a week.

This leads us to adjust our prior probabilities from 0.5 for both Alex and Brenda to 0.75 for Alex and 0.25 for Brenda.

We can now calculate the probabilities of seeing Alex and Brenda given we saw someone in red.

P(A) * P(R|A) = 0.75 * 0.4 = 0.3
P(B) * P(R|B) = 0.25 * 0.6 = 0.15
P(A|R) = 0.3 / (0.3 + 0.15) = 0.67
P(B|R) = 0.15 / (0.3 + 0.15) = 0.33

Bayes Theorem

Now that we’ve worked through the intuition behind Bayes’ Theorem, let’s define it formally.

P(A|R) = ( P(R|A) * P(A) ) / ( P(R|A) * P(A) + P(R|B) * P(B) )
P(B|R) = ( P(R|B) * P(B) ) / ( P(R|A) * P(A) + P(R|B) * P(B) )
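
As a sketch, the two-hypothesis form of the theorem translates directly into Python (the function name and signature are my own, not from any library):

def posteriors(prior_a, prior_b, likelihood_a, likelihood_b):
    """Return P(A|R) and P(B|R) from priors P(A), P(B) and likelihoods P(R|A), P(R|B)."""
    evidence = likelihood_a * prior_a + likelihood_b * prior_b
    return likelihood_a * prior_a / evidence, likelihood_b * prior_b / evidence

# Reproduces the office example with the adjusted priors:
print(posteriors(0.75, 0.25, 0.4, 0.6))  # (0.666..., 0.333...), i.e. 0.67 and 0.33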

Bayesian Learning

Bayesian Learning is the process of applying Bayes’ rule repeatedly: each posterior becomes the prior for the next piece of evidence.
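
For example, if we spot the person on several days and note their sweater color each time, each day’s posterior feeds in as the next day’s prior. A small hypothetical sketch in Python (the sequence of sightings is invented for illustration):

# Start from the adjusted priors for Alex and Brenda.
p_a, p_b = 0.75, 0.25

# Likelihoods P(color|Alex), P(color|Brenda) from the example above.
likelihoods = {"red": (0.4, 0.6), "not_red": (0.6, 0.4)}

# Each posterior becomes the prior for the next observation.
for color in ["red", "red", "not_red"]:
    l_a, l_b = likelihoods[color]
    evidence = l_a * p_a + l_b * p_b
    p_a, p_b = l_a * p_a / evidence, l_b * p_b / evidence
    print(f"after seeing {color}: P(A) = {p_a:.3f}, P(B) = {p_b:.3f}")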

Naive Bayes Algorithm

The word “naive” in Naive Bayes comes from the assumption that the events A and B are independent of each other (given the class), which allows us to calculate P(A & B) by simply multiplying P(A) and P(B).

We also assume that we can estimate the conditional probability P(A|B) as proportional to P(B|A) * P(A).
This allows us to take P(spam | A & B), which is proportional to P(A & B | spam) * P(spam), and, by the independence assumption, treat it as proportional to P(A | spam) * P(B | spam) * P(spam).
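
Putting the pieces together for spam classification: each word in a message is an event, the naive assumption lets us multiply the per-word likelihoods, and normalizing the two products gives the posterior for each class. A minimal sketch in Python (the word likelihoods and priors are invented for illustration, not estimated from a real corpus):

# Hypothetical likelihoods P(word|spam) and P(word|ham).
p_word_given_spam = {"win": 0.30, "money": 0.25, "hello": 0.05}
p_word_given_ham = {"win": 0.02, "money": 0.05, "hello": 0.30}

p_spam, p_ham = 0.4, 0.6  # priors: fraction of messages that are spam vs. ham

def classify(words):
    # Naive assumption: multiply the individual word likelihoods.
    score_spam, score_ham = p_spam, p_ham
    for w in words:
        score_spam *= p_word_given_spam.get(w, 1e-6)  # small floor for unseen words
        score_ham *= p_word_given_ham.get(w, 1e-6)
    # Normalize so the two scores form a posterior distribution.
    total = score_spam + score_ham
    return score_spam / total, score_ham / total

print(classify(["win", "money"]))  # classified as spam: P(spam|words) ≈ 0.98
print(classify(["hello"]))         # classified as ham: P(ham|words) = 0.90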
