Naive Bayes Algorithm in Machine Learning
A classifier is a machine learning algorithm that assigns data to one of a set of "classes." An email classifier is one example: it scans emails and labels each one Spam or Not Spam.
The Naive Bayes classifier is a supervised machine learning algorithm used for classification tasks. It is based on Bayes' theorem, which is why it is known as a probabilistic classifier: it predicts the class of an object from the probability that the object belongs to each class. Naive Bayes is easy to implement, fast to train, and used mostly for text classification.
Why is it called Naïve Bayes?
The name has two parts: Naive and Bayes. Why naive? Because the algorithm assumes that features contribute to the prediction independently of one another. For text, this means word order does not matter: "You are" and "Are you" look the same to the algorithm. Likewise, to identify an apple you might use the features red color, spherical shape, and sweet taste; Naive Bayes treats each feature as an independent piece of evidence that the fruit is an apple.
- The Naive Bayes classifier assumes that the features are independent of each other. Because this assumption rarely holds in real-life data, the classifier is called naive.
- The algorithm is based on Bayes' theorem, hence the name Naive Bayes classifier.
Bayes' Theorem
Bayes' theorem, named after Thomas Bayes, gives the probability of a hypothesis based on prior knowledge, expressed through conditional probabilities. The Naive Bayes classifier works on the principle of conditional probability as given by Bayes' theorem.
To understand Bayes' theorem, consider a simple example: tossing two coins. The sample space is {HH, HT, TH, TT}, so the probabilities of the following events are:
- Getting two heads = 1/4
- At least one tail = 3/4
- Second coin being head given the first coin is tail = 1/2
- Getting two heads given the first coin is a head = 1/2
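To verify these values, here is a minimal Python sketch that enumerates the two-coin sample space and counts outcomes directly:

```python
from itertools import product

# Enumerate the sample space for tossing two coins: {HH, HT, TH, TT}
sample_space = list(product("HT", repeat=2))

p_two_heads = sum(o == ("H", "H") for o in sample_space) / len(sample_space)
p_at_least_one_tail = sum("T" in o for o in sample_space) / len(sample_space)

# Conditional probabilities: restrict the sample space to the condition
first_is_tail = [o for o in sample_space if o[0] == "T"]
p_second_head_given_first_tail = sum(o[1] == "H" for o in first_is_tail) / len(first_is_tail)

first_is_head = [o for o in sample_space if o[0] == "H"]
p_two_heads_given_first_head = sum(o == ("H", "H") for o in first_is_head) / len(first_is_head)

print(p_two_heads, p_at_least_one_tail)                              # 0.25 0.75
print(p_second_head_given_first_tail, p_two_heads_given_first_head)  # 0.5 0.5
```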
Bayes’ theorem calculates the probability of an event happening based on the probability of a different event that has already taken place. The formula for Bayes theorem is given as:
P(A|B) = (P(B|A) * P(A)) / P(B)
where P(B) must not be zero, and:
- P(A|B) is the posterior probability: the probability of event A given that event B (the evidence) has occurred.
- P(B|A) is the likelihood: the probability of observing the evidence B given that A is true.
- P(A) is the prior probability of A, i.e., the probability of the event before the evidence is seen.
- P(B) is the probability of the evidence B.
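The theorem translates directly into Python; here is a quick sketch (the function name bayes_posterior is ours, for illustration):

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must not be zero")
    return p_b_given_a * p_a / p_b

# Coin example: P(two heads | first coin is head)
#   = P(first is head | two heads) * P(two heads) / P(first is head)
print(bayes_posterior(1.0, 0.25, 0.5))  # 0.5
```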
Working Example of the Naïve Bayes Classifier
Let us take a shopping example to understand how the Naive Bayes classifier works. The sample dataset for this example has 30 rows.
Dataset
The problem is to predict whether a person will buy a product given a specific combination of Day, Discount, and Free Delivery, using Bayes' theorem.
Step 1) Create frequency tables for each attribute in the dataset: Day, Discount, and Free Delivery.
Let the event 'Buy' be denoted as 'A', and let the independent variables 'Discount', 'Free Delivery', and 'Day' be denoted as 'B'. We will use these events and variables to apply Bayes' theorem.
Step 2) Now let us calculate the Likelihood tables one by one.
Example 1:
Based on this likelihood table, we will calculate the conditional probabilities as below.
P(A) = P(No Buy) = 6/30 = 0.2
P(B) = P(Weekday) = 11/30 = 0.37
P(B|A) = P(Weekday | No Buy) = 2/6 = 0.33
Now find P(A|B) using Bayes' theorem:
P(A|B) = P(No Buy | Weekday) = P(Weekday | No Buy) * P(No Buy) / P(Weekday) = (2/6 * 6/30) / (11/30) = 0.1818
Similarly, if A is Buy:
P(Buy | Weekday) = P(Weekday | Buy) * P(Buy) / P(Weekday) = (9/24 * 24/30) / (11/30) = 0.8181
Note: As P(Buy | Weekday) is greater than P(No Buy | Weekday), we can conclude that a customer will most likely buy the product on a weekday.
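The same arithmetic as a short Python sketch, using the counts behind the fractions above:

```python
total = 30
buy, no_buy = 24, 6                  # class counts from the frequency table
weekday = 11                         # rows where Day = Weekday
weekday_buy, weekday_no_buy = 9, 2   # Weekday rows within each class

p_buy_given_weekday = (weekday_buy / buy) * (buy / total) / (weekday / total)
p_no_buy_given_weekday = (weekday_no_buy / no_buy) * (no_buy / total) / (weekday / total)

print(round(p_buy_given_weekday, 4), round(p_no_buy_given_weekday, 4))  # 0.8182 0.1818
```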
Step 3) Similarly, we can calculate the likelihood of an event occurring based on all three variables. We now build likelihood tables for all three variables from the frequency tables above.
Example 2:
Now, using these three Likelihood tables, we will calculate whether a customer is likely to make a purchase based on a specific combination of ‘Day’, ‘Discount’ and ‘Free delivery’.
Here, let us take a combination of these factors:
- Day = Holiday
- Discount = Yes
- Free Delivery = Yes
When A = Buy:
Calculate the conditional probability of a purchase given this combination of Day, Discount, and Free Delivery, i.e., where B is (Day = Holiday, Discount = Yes, Free Delivery = Yes). Therefore:
P(A|B) = P(Buy | Discount=Yes, Day=Holiday, Free Delivery=Yes)
= [P(Discount=Yes | Buy) * P(Free Delivery=Yes | Buy) * P(Day=Holiday | Buy) * P(Buy)] / [P(Discount=Yes) * P(Free Delivery=Yes) * P(Day=Holiday)]
= (19/24 * 21/24 * 8/24 * 24/30) / (20/30 * 23/30 * 11/30)
= 0.986
When A = No Buy:
Similarly, calculate the conditional probability of no purchase given the same combination (Day = Holiday, Discount = Yes, Free Delivery = Yes). Therefore:
P(A|B) = P(No Buy | Discount=Yes, Day=Holiday, Free Delivery=Yes)
= [P(Discount=Yes | No Buy) * P(Free Delivery=Yes | No Buy) * P(Day=Holiday | No Buy) * P(No Buy)] / [P(Discount=Yes) * P(Free Delivery=Yes) * P(Day=Holiday)]
= (1/6 * 2/6 * 3/6 * 6/30) / (20/30 * 23/30 * 11/30)
= 0.030
Step 4) Hence:
Probability of purchase = 0.986
Probability of no purchase = 0.030
Because the evidence in the denominator is also factored naively, these two scores do not sum exactly to 1, so we normalize them to obtain the final likelihoods of the two events.
- Sum of scores = 0.986 + 0.030 = 1.016
- Likelihood of purchase = 0.986 / 1.016 ≈ 97%
- Likelihood of no purchase = 0.030 / 1.016 ≈ 3%
Since 97% is greater than 3%, we can conclude that an average customer will buy the product on a holiday with a discount and free delivery.
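The whole Step 2-4 calculation can be written as a short Python sketch; the counts below are the ones implied by the fractions in the worked example:

```python
total, buy, no_buy = 30, 24, 6

# Per-class counts for the queried feature values, read off the likelihood tables
disc_yes = {"Buy": 19, "No Buy": 1}   # Discount = Yes
free_yes = {"Buy": 21, "No Buy": 2}   # Free Delivery = Yes
holiday  = {"Buy": 8,  "No Buy": 3}   # Day = Holiday

# Probability of the evidence: P(Discount=Yes) * P(Free Delivery=Yes) * P(Day=Holiday)
p_evidence = (20 / 30) * (23 / 30) * (11 / 30)

def score(cls, cls_count):
    # Naive Bayes: product of per-feature likelihoods times the class prior
    likelihood = (disc_yes[cls] / cls_count) * (free_yes[cls] / cls_count) * (holiday[cls] / cls_count)
    prior = cls_count / total
    return likelihood * prior / p_evidence

p_buy = score("Buy", buy)           # ~0.986
p_no_buy = score("No Buy", no_buy)  # ~0.030

# Normalize so the two values sum to 1
s = p_buy + p_no_buy
print(f"Buy: {p_buy / s:.0%}, No Buy: {p_no_buy / s:.0%}")  # Buy: 97%, No Buy: 3%
```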
Types of Naïve Bayes Model
There are several types of Naive Bayes classifiers. Here we discuss the Multinomial, Bernoulli, and Gaussian Naive Bayes classifiers.
1. Multinomial Naive Bayes
This type of Naive Bayes model is used for document classification problems. It works with features that represent the frequency of words in a document. The classifier considers the occurrence and count of words to determine the probability of a document belonging to a specific category, such as sports, politics, or technology.
2. Bernoulli Naive Bayes
This is similar to Multinomial Naive Bayes and is also used for document classification tasks. However, it uses boolean predictors: each feature records only whether a word is present or not (yes or no), and the classifier calculates probabilities from word presence rather than word counts.
3. Gaussian Naive Bayes
This classifier is used for continuous-valued features rather than discrete ones. It assumes each feature follows a Gaussian (normal) distribution within each class and calculates probabilities using that distribution's parameters, the mean and variance.
The formula for the conditional probability changes to the Gaussian density:
P(x|y) = (1 / sqrt(2 * pi * sigma_y^2)) * exp(-(x - mu_y)^2 / (2 * sigma_y^2))
where mu_y and sigma_y^2 are the mean and variance of the feature for class y.
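The following minimal scikit-learn sketch shows all three variants on tiny made-up datasets (the data is ours, purely for illustration):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB

# Multinomial: word-count features (e.g., bag-of-words)
X_counts = np.array([[3, 0, 1], [0, 2, 4], [2, 1, 0], [0, 3, 5]])
y_docs = ["sports", "tech", "sports", "tech"]
print(MultinomialNB().fit(X_counts, y_docs).predict([[1, 0, 0]]))  # likely 'sports'

# Bernoulli: binary word-presence features
X_bool = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bool, y_docs).predict([[1, 0, 0]]))

# Gaussian: continuous features, one normal distribution per class and feature
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.3, 3.3], [6.5, 3.2]])
y_cont = [0, 0, 1, 1]
print(GaussianNB().fit(X_cont, y_cont).predict([[5.0, 3.4]]))  # likely class 0
```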
Benefits and Limitations of Naive Bayes Classifier
There are various advantages and disadvantages of the Naive Bayes algorithm in machine learning.
Benefits of Naive Bayes Classifier
- Simplicity and Efficiency: Naive Bayes is simple and easy to train and implement. It is efficient because of low computational cost. It can handle large datasets efficiently.
- Fast Training and Prediction: Because of the independence assumption, Naive Bayes needs relatively little training data, and it predicts quickly once trained.
- Scalability: Naive Bayes can handle high-dimensional datasets with a large number of features. It performs well even when the number of features is greater than the number of training examples. It scales with the number of data points and predictors. It handles both continuous and discrete data.
- Robustness to Irrelevant Features: It is not sensitive to irrelevant features.
- Works Well with Small Training Sets: Naive Bayes can provide reasonable results even when the number of training instances is small.
Limitations of Naive Bayes Classifier
Naive Bayes assumes that all features are independent of each other, so it cannot learn relationships between features: it treats each feature as if it had no relation to the others.
To overcome this problem, you can use Decision Trees, Random Forests, Support Vector Machines (SVM), Neural Networks, etc. These algorithms can learn complex relationships and dependencies between features, so they can produce more accurate predictions, as the sketch below illustrates.
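Here is a small sketch on toy XOR-style data (our own construction): the label depends on the interaction between the two features, which Naive Bayes cannot capture but a decision tree can:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

# XOR: the label depends on the *combination* of features, not on either alone
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 25)
y = X[:, 0] ^ X[:, 1]

print(BernoulliNB().fit(X, y).score(X, y))             # 0.5, no better than chance
print(DecisionTreeClassifier().fit(X, y).score(X, y))  # 1.0, learns the interaction
```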
Applications of Naive Bayes Classifier
Since this algorithm is fast and efficient, it can be used for real-time predictions.
Spam Detection
Email services (such as Gmail) use this algorithm to determine whether an email is spam. This algorithm is excellent for spam filtering.
Sentiment Analysis
It can classify text as positive, negative, or neutral based on features like word choice, sentence structure, and context. It finds applications in social media monitoring, customer reviews, and market research.
Document Classification
It can classify documents into categories such as sports, politics, technology, or finance based on the frequency or presence of specific words or features within the document.
Recommender Systems
It can analyze user preferences, historical data, and item features to predict user interests or preferences for recommending products, movies, or articles.
The classifier is also used in face recognition, weather prediction, medical diagnosis, shopping, news classification, etc. You can implement Naive Bayes in Python: scikit-learn's sklearn.naive_bayes module implements the algorithms discussed above.
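For example, here is a minimal spam-filter sketch using sklearn.naive_bayes; the tiny training set is invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration
emails = [
    "win a free prize now", "limited offer click here",
    "meeting agenda for monday", "please review the attached report",
]
labels = ["spam", "spam", "not spam", "not spam"]

# CountVectorizer turns text into word counts; MultinomialNB models the counts
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free prize offer"]))        # likely 'spam'
print(model.predict(["agenda for the meeting"]))  # likely 'not spam'
```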
Conclusion
Naive Bayes algorithms in machine learning are classifiers mostly used in spam detection, news classification, sentiment analysis, weather prediction, shopping, etc. They are based on Bayes' theorem and are simple and easy to implement. Because they are fast, they can be used in real-time applications. Their biggest disadvantage is the assumption of independent features (since truly independent features are rare in real-life data, the algorithm is called naive), which means every feature is treated in isolation. To overcome this drawback, you can use other classifiers such as Decision Trees, Random Forests, or Support Vector Machines (SVM).