What is a Confusion Matrix in Machine Learning? : A Complete Guide For Beginners

Last updated on 05th Nov 2022

In this article, you will learn:

1. What Are Confusion Matrices and Why Do We Need Them?

2. How to Create a 2×2 Confusion Matrix?

3. Confusion Matrix Metrics.

4. Scaling a Confusion Matrix.

5. Confusion Matrix With Python.

6. Conclusion.

What Are Confusion Matrices and Why Do We Need Them?

Classification models produce categorical outputs, often with more than two classes. Most error measures compute a single overall error for a model but cannot show where the individual errors occur. A model might misclassify some categories far more often than others, and a standard accuracy measure will not reveal this.
Furthermore, suppose there is a significant class imbalance in the data, i.e. one class has many more instances than the others. In that case, a model might predict the majority class for every case and still achieve a high accuracy score, even though it never predicts the minority classes. This is where confusion matrices are useful.
A confusion matrix is a matrix used to evaluate the performance of a classification model on a given set of test data. It can only be computed when the true values for the test data are known. The matrix itself is easy to understand, but the related terminology can be confusing. Because it shows the errors in a model's performance in the form of a matrix, it is also known as an error matrix.
A confusion matrix presents the various outcomes of a classification model's predictions in a table layout and helps visualize those outcomes.
It tabulates all the predicted values of a classifier against the actual values.
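To make the class-imbalance point concrete, here is a minimal sketch in plain Python. The labels are hypothetical (95 negatives, 5 positives), and the "classifier" simply predicts the majority class for every sample. Accuracy comes out at 0.95, yet tallying each (actual, predicted) pair shows that every positive sample was misclassified.

```python
from collections import Counter

# Hypothetical imbalanced test set: 95 negatives (label 0) and 5 positives (label 1).
y_true = [0] * 95 + [1] * 5

# A naive "classifier" that always predicts the majority class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)

# Accuracy looks excellent...
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# ...but counting every (actual, predicted) pair shows where the errors are.
pair_counts = Counter(zip(y_true, y_pred))
matrix = [[pair_counts[(actual, predicted)] for predicted in (0, 1)]
          for actual in (0, 1)]  # rows = actual, columns = predicted

print(f"accuracy = {accuracy:.2f}")
print(matrix)  # [[95, 0], [5, 0]]: every positive sample was missed
```

The bottom row of the matrix (actual positives) contains no correct predictions at all, which the single accuracy number hides.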

How to Create a 2×2 Confusion Matrix?

We can obtain four different combinations of predicted and actual values from a classifier:

True Positive (TP): The number of times the model predicts a positive value and the actual value is positive. The model predicted a positive value, and it was correct.

False Positive (FP): The number of times the model wrongly predicts a negative value as positive. The model predicted a positive value, but it is actually negative.

True Negative (TN): The number of times the model predicts a negative value and the actual value is negative. The model predicted a negative value, and it is actually negative.

False Negative (FN): The number of times the model wrongly predicts a positive value as negative. The model predicted a negative value, but it is actually positive.
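The four counts can be tallied directly from a list of actual labels and a list of predictions. The sketch below is plain Python; the `count_outcomes` helper, the sample labels, and the choice of `1` as the positive class are assumptions for illustration. It arranges the counts with actual classes as rows and predicted classes as columns (negative first), which is the same layout scikit-learn's `confusion_matrix` returns for labels `[0, 1]`.

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classifier.

    `positive` is the label treated as the positive class (assumed to be 1 here).
    """
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


# Hypothetical labels for eight test samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, tn, fn = count_outcomes(y_true, y_pred)

# Arrange the counts as a 2x2 matrix: rows = actual, columns = predicted.
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)  # [[3, 1], [1, 3]]
```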

Confusion Matrix Metrics:

Consider a confusion matrix for a classifier that classifies people according to whether they speak English or Spanish.

True Positives (TP) = 86
True Negatives (TN) = 79
False Positives (FP) = 12
False Negatives (FN) = 10

From the matrix alone, the performance of the model is not immediately clear. To quantify how well the model performs, use the following metrics:

Accuracy: The proportion of correctly classified values. It tells us how often the classifier is right. It is the sum of the true positives and true negatives divided by the total number of values: (TP + TN) / (TP + TN + FP + FN).

Precision: Measures how many of the model's positive predictions are actually positive. It is the number of true positives divided by the total number of predicted positive values: TP / (TP + FP).

Recall: Measures the model's ability to find the actual positive values, i.e. how often the model correctly predicts a positive value among all actual positives. It is the number of true positives divided by the total number of actual positive values: TP / (TP + FN).

F1-Score: The harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall). It is useful when both precision and recall need to be taken into account.
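Applying these definitions to the English/Spanish example is simple arithmetic. The sketch below uses a hypothetical helper, `classification_metrics` (not part of any library), that computes each metric from the four counts. With TP = 86, TN = 79, FP = 12 and FN = 10 it gives an accuracy of about 0.882, a precision of about 0.878, a recall of about 0.896 and an F1-score of about 0.887.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # correct / all predictions
    precision = tp / (tp + fp)                          # correct among predicted positives
    recall = tp / (tp + fn)                             # correct among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1


# Counts from the English/Spanish example above.
accuracy, precision, recall, f1 = classification_metrics(tp=86, tn=79, fp=12, fn=10)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Note that the helper does not guard against division by zero; a model that never predicts the positive class would need special handling for precision.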

Scaling a Confusion Matrix:

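To scale a confusion matrix to more than two classes, increase the number of rows and columns so that there is one of each per class. All correct predictions (the true positives for each class) lie along the diagonal. Every off-diagonal cell is a misclassification: it counts as a false positive for the predicted class and a false negative for the actual class. Now that we understand what a confusion matrix is and how it works, let's explore how to find the accuracy of a model with a hands-on confusion matrix demo in Python.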
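The multi-class layout described in the scaling discussion can be sketched in plain Python. The function names are hypothetical, and a third class ("French") is added purely for illustration. Rows are actual classes and columns are predicted classes; for any one class, the diagonal cell gives its true positives, the rest of its column its false positives, and the rest of its row its false negatives.

```python
def confusion_matrix_multiclass(y_true, y_pred, labels):
    """Build an N x N matrix: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def per_class_counts(matrix, k):
    """TP, FP and FN for class k, read from the matrix (one-vs-rest view)."""
    tp = matrix[k][k]                        # diagonal cell
    fp = sum(row[k] for row in matrix) - tp  # rest of column k
    fn = sum(matrix[k]) - tp                 # rest of row k
    return tp, fp, fn


labels = ["English", "Spanish", "French"]  # "French" is a hypothetical third class
y_true = ["English", "Spanish", "French", "English", "Spanish", "French", "English"]
y_pred = ["English", "English", "French", "English", "Spanish", "Spanish", "French"]

m = confusion_matrix_multiclass(y_true, y_pred, labels)
for label, row in zip(labels, m):
    print(label, row)
print(per_class_counts(m, labels.index("Spanish")))  # (1, 1, 1)
```

For Spanish, the result `(1, 1, 1)` means one correct prediction, one French speaker predicted as Spanish (false positive), and one Spanish speaker predicted as English (false negative).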

Confusion Matrix With Python:

We will build a logistic regression model using a heart attack dataset to predict whether a patient is at risk of a heart attack.
The dataset used for this demonstration is depicted below.
Let's import the necessary libraries to create the model.
We can import the confusion_matrix function from sklearn.metrics. Next, let's split the dataset into input features and the target output.
As we can see, the data contains a wide range of values: some are single digits, and some have three digits. To make the calculations more straightforward, we will scale the data with StandardScaler, which standardizes each feature to zero mean and unit variance so that all features fall within a comparable, small range.
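StandardScaler standardizes each column by subtracting its mean and dividing by its standard deviation. The short sketch below applies it to a small hypothetical array with one single-digit column and one three-digit column; after scaling, both columns have a mean of about 0 and a standard deviation of 1, so they sit on a comparable scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales (a 1-digit and a 3-digit column).
X = np.array([[1.0, 120.0],
              [3.0, 180.0],
              [5.0, 150.0],
              [7.0, 210.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # (x - column mean) / column standard deviation

print(X_scaled.round(3))
print("column means:", X_scaled.mean(axis=0).round(6))
print("column stds: ", X_scaled.std(axis=0).round(6))
```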
Now let's split the dataset in two: one part to train the model and another to test it. To do this, use train_test_split from sklearn.model_selection. Using a logistic regression model, we train the classifier on the training data and then predict the test data to check its accuracy.
To find the accuracy and all the other confusion matrix metrics, we can import accuracy_score and classification_report from the same library (sklearn.metrics).
accuracy_score gives us the accuracy of the classifier.
Using the predicted values (pred) and the actual values (y_test), we can create a confusion matrix with the confusion_matrix function.
Then, by calling the NumPy ravel() method on the array returned by confusion_matrix, we can get the True Negative, False Positive, False Negative and True Positive counts (in that order for binary labels).
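Because the order of the values returned by `ravel()` is easy to get wrong, here is a small sketch with hypothetical labels chosen so that all four counts differ. For binary labels `0` and `1`, scikit-learn's `confusion_matrix` puts actual classes in rows and predicted classes in columns, so flattening the 2×2 array yields TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels (1 = positive class, 0 = negative class).
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
pred   = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

cm = confusion_matrix(y_test, pred)  # rows = actual [0, 1], columns = predicted [0, 1]
print(cm)

# For binary labels the flattened order is TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=4 FP=2 FN=1 TP=3
```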
Finally, using classification_report, we can find the values of the various metrics derived from the confusion matrix (precision, recall, F1-score and support for each class).
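The heart attack dataset is not reproduced here, so the following end-to-end sketch uses scikit-learn's `make_classification` to generate a synthetic stand-in; the sample count, feature count, `test_size` and `random_state` values are arbitrary assumptions. It follows the steps described in this section, with one deliberate difference: the scaler is fitted on the training split only and then applied to the test split, which keeps test-set statistics out of training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart attack dataset (features X, binary target y).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Split into training and test sets (stratified to keep class proportions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training data only, then apply it to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the logistic regression classifier and predict the test set.
model = LogisticRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))

cm = confusion_matrix(y_test, pred)
tn, fp, fn, tp = cm.ravel()  # binary labels [0, 1]: TN, FP, FN, TP
print(cm)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")

print(classification_report(y_test, pred))
```

The exact numbers depend on the synthetic data, but `tp + tn + fp + fn` always equals the number of test samples, and the value returned by `accuracy_score` equals `tp + tn` divided by that number.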

Conclusion:

In this guide to the confusion matrix, we looked at what a confusion matrix is and why we use confusion matrices. We then looked at how to create a 2×2 confusion matrix and calculate the confusion matrix metrics from it. We also saw how confusion matrices can be scaled up to include more than two classes, and finally got hands-on experience with confusion matrices by implementing them in Python. In machine learning, classification is used to split data into categories. But after cleaning and preprocessing the data and training our model, how do we know whether our classification model performs well? A confusion matrix is used to measure the performance of a classifier in depth.
