Matthews correlation coefficient (MCC) is a metric we can use to assess the performance of a classification model.
It is calculated as:
MCC = (TP*TN - FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
where:
- TP: Number of true positives
- TN: Number of true negatives
- FP: Number of false positives
- FN: Number of false negatives
This metric is particularly useful when the two classes are imbalanced, that is, when one class appears much more often than the other.
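To make the point about imbalance concrete, here is a minimal sketch in base R that compares accuracy with MCC. The counts are hypothetical and chosen only for illustration, and mcc_from_counts() is a helper name introduced here, not a function from any package.

```r
# Hypothetical helper (not from any package): MCC from the four counts
mcc_from_counts <- function(TP, TN, FP, FN) {
  # convert to double so large counts cannot overflow integer arithmetic
  TP <- as.numeric(TP); TN <- as.numeric(TN)
  FP <- as.numeric(FP); FN <- as.numeric(FN)
  (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
}

# Hypothetical imbalanced data: 20 positives and 380 negatives,
# with a model that labels almost every observation as negative
TP <- 1; FN <- 19; FP <- 1; TN <- 379

accuracy  <- (TP + TN) / (TP + TN + FP + FN)
mcc_value <- mcc_from_counts(TP, TN, FP, FN)

accuracy   # 0.95
mcc_value  # roughly 0.146
```

In this made-up case the accuracy looks strong because the negative class dominates, while the low MCC reflects that the model finds only 1 of the 20 positives.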
The value for MCC ranges from -1 to 1 where:
- -1 indicates total disagreement between predicted classes and actual classes
- 0 indicates predictions that are no better than random guessing
- 1 indicates total agreement between predicted classes and actual classes
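The three reference points above can be checked with a short base R sketch. The counts below are hypothetical, and mcc_from_counts() is an illustrative helper defined here rather than a package function.

```r
# Hypothetical helper (not from any package): MCC from the four counts
mcc_from_counts <- function(TP, TN, FP, FN) {
  # convert to double so large counts cannot overflow integer arithmetic
  TP <- as.numeric(TP); TN <- as.numeric(TN)
  FP <- as.numeric(FP); FN <- as.numeric(FN)
  (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
}

# Every prediction correct: total agreement
perfect  <- mcc_from_counts(TP = 20, TN = 380, FP = 0, FN = 0)

# Every prediction wrong: total disagreement
inverted <- mcc_from_counts(TP = 0, TN = 0, FP = 380, FN = 20)

# Positives and negatives are flagged at the same rate: no signal
chance   <- mcc_from_counts(TP = 10, TN = 190, FP = 190, FN = 10)

c(perfect = perfect, inverted = inverted, chance = chance)
```

Note that the formula divides by zero when any row or column of the confusion matrix sums to zero (for example, a model that never predicts the positive class), so this simple helper returns NaN in that situation.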
For example, suppose a sports analyst uses a logistic regression model to predict whether or not 400 different college basketball players get drafted into the NBA.
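The following confusion matrix summarizes the predictions made by the model:
- True positives (drafted, predicted drafted): 15
- False negatives (drafted, predicted not drafted): 5
- False positives (not drafted, predicted drafted): 5
- True negatives (not drafted, predicted not drafted): 375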
To calculate the MCC of the model, we can use the following formula:
- MCC = (TP*TN - FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
- MCC = (15*375 - 5*5) / √((15+5)(15+5)(375+5)(375+5))
- MCC = 5600 / 7600
- MCC = 0.7368
Matthews correlation coefficient turns out to be 0.7368.
This value is somewhat close to one, which indicates that the model does a decent job of predicting whether or not players will get drafted.
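The hand calculation above can be reproduced in base R without any packages. This is a minimal sketch; the variable names are chosen here for illustration.

```r
# Counts from the worked example above
TP <- 15; TN <- 375; FP <- 5; FN <- 5

# Numerator: TP*TN - FP*FN
numerator <- TP * TN - FP * FN            # 5600

# Denominator: square root of the product of the four marginal totals
denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))  # 7600

mcc_by_hand <- numerator / denominator
round(mcc_by_hand, 4)                     # 0.7368
```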
The following example shows how to calculate MCC for this exact scenario using the mcc() function from the mltools package in R.
Example: Calculating Matthews Correlation Coefficient in R
The following code shows how to define a vector of predicted classes and a vector of actual classes, then calculate Matthews correlation coefficient using the mcc() function from the mltools package:
library(mltools)

#define vector of actual classes
actual <- rep(c(1, 0), times=c(20, 380))

#define vector of predicted classes
preds <- rep(c(1, 0, 1, 0), times=c(15, 5, 5, 375))

#calculate Matthews correlation coefficient
mcc(preds, actual)

[1] 0.7368421
Matthews correlation coefficient is 0.7368.
This matches the value that we calculated earlier by hand.
If you’d like to calculate Matthews correlation coefficient for a confusion matrix, you can use the confusionM argument as follows:
library(mltools)

#create confusion matrix
conf_matrix <- matrix(c(15, 5, 5, 375), nrow=2)

#view confusion matrix
conf_matrix

     [,1] [,2]
[1,]   15    5
[2,]    5  375

#calculate Matthews correlation coefficient for confusion matrix
mcc(confusionM = conf_matrix)

[1] 0.7368421
Once again, Matthews correlation coefficient is 0.7368.
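If you only have the two vectors and want to see the counts behind the result, one possible approach is to cross-tabulate them with the base R table() function and apply the formula directly. This is a sketch that assumes the classes are coded as 1 (drafted) and 0 (not drafted), as in this tutorial; it does not use mltools.

```r
# Same vectors as in the example above
actual <- rep(c(1, 0), times = c(20, 380))
preds  <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

# Cross-tabulate: rows are predicted classes, columns are actual classes
cm <- table(predicted = preds, actual = actual)

# Pull out the four counts by label (assumes classes are coded 0 and 1)
TP <- as.numeric(cm["1", "1"])
TN <- as.numeric(cm["0", "0"])
FP <- as.numeric(cm["1", "0"])
FN <- as.numeric(cm["0", "1"])

# Apply the MCC formula to the extracted counts
mcc_from_table <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

round(mcc_from_table, 4)  # 0.7368
```

Indexing the table by label rather than by position avoids depending on the order in which table() happens to arrange the rows and columns.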
Additional Resources
The following tutorials explain how to perform other common tasks in R:
How to Perform Logistic Regression in R
How to Plot a ROC Curve Using ggplot2
How to Calculate F1 Score in R