How to Calculate Matthews Correlation Coefficient in Python

by Erma Khan

Matthews correlation coefficient (MCC) is a metric we can use to assess the performance of a classification model.

It is calculated as:

MCC = (TP*TN – FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

where:

  • TP: Number of true positives
  • TN: Number of true negatives
  • FP: Number of false positives
  • FN: Number of false negatives
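As a minimal sketch, the formula above can be written as a small Python function. The name mcc_from_counts is hypothetical (it is not part of any library), and returning 0 when the denominator is zero is an assumed convention for the undefined case, not something the formula itself specifies.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: treat the undefined zero-denominator case as 0
    if denominator == 0:
        return 0.0
    return numerator / denominator

# Hypothetical counts chosen only for illustration
print(round(mcc_from_counts(tp=40, tn=40, fp=10, fn=10), 4))  # 0.6
```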

This metric is particularly useful when the two classes are imbalanced, that is, when one class appears much more often than the other.
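To see why imbalance matters, here is a rough sketch that compares accuracy with MCC. The counts are hypothetical and chosen only for illustration: 400 observations, of which 20 are positive, and a model that correctly identifies just one of those positives.

```python
import math

# Hypothetical confusion-matrix counts (not from a real model):
# 20 actual positives, 380 actual negatives
tp, fn = 1, 19
fp, tn = 1, 379

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"Accuracy: {accuracy:.2f}")  # 0.95
print(f"MCC:      {mcc:.2f}")       # 0.15
```

Accuracy looks high because the model is right about almost every negative, while MCC stays low because it also accounts for the 19 missed positives.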

The value for MCC ranges from -1 to 1 where:

  • -1 indicates total disagreement between predicted classes and actual classes
  • 0 indicates predictions that are no better than random guessing
  • 1 indicates total agreement between predicted classes and actual classes
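The three reference values above can be reproduced with a tiny example. This is only a sketch: the helper mcc is a hypothetical name, the four-element label lists are made up for illustration, and returning 0 for a zero denominator is an assumed convention.

```python
import math

def mcc(actual, pred):
    """MCC for two equal-length sequences of 0/1 labels (1 = positive)."""
    pairs = list(zip(actual, pred))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: treat the undefined zero-denominator case as 0
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

actual = [1, 1, 0, 0]
print(mcc(actual, [1, 1, 0, 0]))  # 1.0  (every prediction correct)
print(mcc(actual, [0, 0, 1, 1]))  # -1.0 (every prediction wrong)
print(mcc(actual, [1, 0, 1, 0]))  # 0.0  (right half the time in each class)
```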

For example, suppose a sports analyst uses a logistic regression model to predict whether or not 400 different college basketball players get drafted into the NBA.

The following confusion matrix summarizes the predictions made by the model:

                        Predicted: Drafted    Predicted: Not Drafted
Actual: Drafted                 15                      5
Actual: Not Drafted              5                    375

To calculate the MCC of the model, we can use the following formula:

  • MCC = (TP*TN – FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
  • MCC = (15*375 – 5*5) / √((15+5)(15+5)(375+5)(375+5))
  • MCC = 0.7368

Matthews correlation coefficient turns out to be 0.7368. This value is somewhat close to one, which indicates that the model does a decent job of predicting whether or not players will get drafted.
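The hand calculation can be checked step by step with plain Python. This is just a sketch of the arithmetic, using the counts from the worked example (TP = 15, TN = 375, FP = 5, FN = 5); the variable names are illustrative.

```python
import math

# Counts taken from the worked example above
tp, tn, fp, fn = 15, 375, 5, 5

numerator = tp * tn - fp * fn                            # 5625 - 25
product = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)  # 20 * 20 * 380 * 380
denominator = math.sqrt(product)

print(numerator, denominator)              # 5600 7600.0
print(round(numerator / denominator, 4))   # 0.7368
```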

The following example shows how to calculate MCC for this exact scenario using the matthews_corrcoef() function from the scikit-learn (sklearn) library in Python.

Example: Calculating Matthews Correlation Coefficient in Python

The following code shows how to define an array of predicted classes and an array of actual classes, then calculate Matthews correlation coefficient of a model in Python:

import numpy as np
from sklearn.metrics import matthews_corrcoef

#define array of actual classes
actual = np.repeat([1, 0], repeats=[20, 380])

#define array of predicted classes
pred = np.repeat([1, 0, 1, 0], repeats=[15, 5, 5, 375])

#calculate Matthews correlation coefficient
print(matthews_corrcoef(actual, pred))

0.7368421052631579

The MCC is 0.7368. This matches the value that we calculated earlier by hand.
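As an optional sanity check, the four counts can be recovered from the same arrays with scikit-learn's confusion_matrix() function and plugged back into the formula. This is a sketch rather than part of the original example; the variable name manual is illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Same arrays as in the example above
actual = np.repeat([1, 0], repeats=[20, 380])
pred = np.repeat([1, 0, 1, 0], repeats=[15, 5, 5, 375])

# For binary 0/1 labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = (int(v) for v in confusion_matrix(actual, pred).ravel())
print(tp, tn, fp, fn)  # 15 375 5 5

# Apply the MCC formula to the recovered counts
manual = (tp * tn - fp * fn) / np.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
print(np.isclose(manual, matthews_corrcoef(actual, pred)))  # True
```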

Note: You can find the complete documentation for the matthews_corrcoef() function in the scikit-learn documentation.

Additional Resources

The following tutorials explain how to calculate other common metrics for classification models in Python:

  • An Introduction to Logistic Regression in Python
  • How to Calculate F1 Score in Python
  • How to Calculate Balanced Accuracy in Python