Confusion Matrix in Machine Learning using sklearn

A confusion matrix is a table used to evaluate a classification model by comparing its predictions against the actual labels. For a binary classifier it is a `2x2` matrix; for a problem with N classes it is N x N. It makes it easy to calculate metrics such as recall, precision and F1-score. It also relates to ROC curves: each classification threshold yields one confusion matrix, which corresponds to a single point on the curve.

Before jumping in, we need to know a few terms.

• True Positive
• False Positive
• True Negative
• False Negative

We will explain these terms in the context of an example.

Let’s say we are creating a model that detects whether a person has COVID-19.

True Positive

Our model predicts that someone is COVID-19 positive, and the person really is positive.

False Positive

Our model predicts that someone is COVID-19 positive, but the person is actually negative.

True Negative

Our model predicts that someone is COVID-19 negative, and the person really is negative.

False Negative

Our model predicts that someone is COVID-19 negative, but the person is actually positive.
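The four terms can be sketched as a small function. This is only an illustration: the name `outcome` and the sample patients are made up for this example and are not part of `sklearn`.

```python
def outcome(actual_positive: bool, predicted_positive: bool) -> str:
    """Return the name of the confusion-matrix cell for one patient."""
    if predicted_positive:
        # The model said "positive": correct only if the patient really is positive.
        return "True Positive" if actual_positive else "False Positive"
    # The model said "negative": correct only if the patient really is negative.
    return "False Negative" if actual_positive else "True Negative"


# (actual, predicted) pairs for four hypothetical patients
patients = [(True, True), (False, True), (False, False), (True, False)]
for actual, predicted in patients:
    print(actual, predicted, outcome(actual, predicted))
```

The first word (True/False) says whether the model was right, and the second word (Positive/Negative) says what the model predicted.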

The image shows the `confusion matrix` for a case where patients' data was tested for COVID-19. Out of a total of 165 patients, our model produced the following results.

True Positives: Our model predicted that 100 patients were carrying the disease, and they were actually carrying the disease.

True Negatives: Our model predicted that 50 patients were not carrying the disease, and they actually were not carrying it.

False Positives: Our model predicted that 10 patients were carrying the disease, and they actually were not carrying it. This is also known as Type-I error.

False Negatives: Our model predicted that 5 patients were not carrying the disease, and they actually were carrying it. This is also known as Type-II error.

From these four values we can calculate Accuracy, Recall, Precision and F1-Score.
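As a quick sketch, here is that calculation in plain Python using the counts from the example above. The variable names are chosen for this illustration only, and the formulas are the standard definitions of each metric.

```python
# Counts taken from the 165-patient example
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn  # 165 patients

accuracy = (tp + tn) / total   # share of all predictions that were correct
precision = tp / (tp + fp)     # of those predicted positive, how many really were
recall = tp / (tp + fn)        # of those really positive, how many were detected
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Accuracy:  {accuracy:.4f}")   # 0.9091
print(f"Precision: {precision:.4f}")  # 0.9091
print(f"Recall:    {recall:.4f}")     # 0.9524
print(f"F1-Score:  {f1:.4f}")         # 0.9302
```

Accuracy and precision happen to be equal here (150/165 and 100/110 both reduce to 10/11); in general they differ.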

Using sklearn to print the Confusion Matrix

To print the confusion matrix of a model in `sklearn`, use the following code.

```python
from sklearn.metrics import confusion_matrix

# y_test holds the true labels of the test set,
# predictions holds the labels predicted by the model.
# In the output, rows are actual classes and columns are predicted classes.
print(confusion_matrix(y_test, predictions))
```
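To try this without a trained model, the sketch below builds synthetic labels that reproduce the 165-patient example. The label lists are fabricated purely for illustration (1 = positive, 0 = negative); in practice `y_test` and `predictions` come from your own test split and model.

```python
from sklearn.metrics import confusion_matrix

# Synthetic labels matching the example: 100 TP, 50 TN, 10 FP, 5 FN
y_test      = [1] * 100 + [0] * 50 + [0] * 10 + [1] * 5   # actual labels
predictions = [1] * 100 + [0] * 50 + [1] * 10 + [0] * 5   # model output

cm = confusion_matrix(y_test, predictions)
print(cm)

# With labels sorted as [0, 1], sklearn lays the matrix out as
# [[TN, FP],
#  [FN, TP]]
# so flattening it gives the four counts in this order:
tn, fp, fn, tp = cm.ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Note that `sklearn` puts the negative class in the first row and column, so the true positives end up in the bottom-right corner, which may differ from how the matrix is drawn in some diagrams.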

You can also print a report containing metrics such as `precision`, `recall` and `f1-score` for each class using the following code.

```python
from sklearn.metrics import classification_report

print(classification_report(y_test, predictions))
```
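If you need the numbers rather than printed text, `classification_report` can also return a dictionary. The sketch below reuses synthetic labels matching the 165-patient example; the class names "negative" and "positive" are labels chosen for this illustration.

```python
from sklearn.metrics import classification_report

# Synthetic labels matching the example: 100 TP, 50 TN, 10 FP, 5 FN
y_test      = [1] * 100 + [0] * 50 + [0] * 10 + [1] * 5   # actual labels
predictions = [1] * 100 + [0] * 50 + [1] * 10 + [0] * 5   # model output

# target_names are matched to the sorted labels, here 0 then 1
names = ["negative", "positive"]
print(classification_report(y_test, predictions, target_names=names))

# output_dict=True returns the same metrics as a nested dictionary
report = classification_report(
    y_test, predictions, target_names=names, output_dict=True
)
positive = report["positive"]
print(positive["precision"], positive["recall"], positive["f1-score"])
```

The values for the "positive" class should agree with the precision, recall and F1-score computed by hand from the four counts.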