When we have a set of predictor variables and we’d like to classify a response variable into one of two classes, we typically use logistic regression.
However, when a response variable has more than two possible classes, we typically use linear discriminant analysis, often referred to as LDA.
LDA assumes that (1) observations from each class are normally distributed and (2) observations from each class share the same covariance matrix. Using these assumptions, LDA then finds the following values:
- μₖ: The mean of all training observations from the kth class.
- σ²: The weighted average of the sample variances across the K classes.
- πₖ: The proportion of the training observations that belong to the kth class.
LDA then plugs these numbers into the following formula and assigns each observation X = x to the class for which the formula produces the largest value:
Dₖ(x) = x · (μₖ/σ²) − μₖ²/(2σ²) + log(πₖ)
LDA has linear in its name because the discriminant function above is a linear function of x.
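To make this concrete, here is a minimal Python sketch that estimates μₖ, σ², and πₖ from a toy one-predictor training set and scores a new observation with the discriminant above. The data values and the lda_score helper are hypothetical, included only for illustration:

```python
import numpy as np

# Toy one-predictor training data: x values and class labels (hypothetical)
x_train = np.array([1.2, 0.8, 1.5, 3.1, 2.9, 3.4])
y_train = np.array([0, 0, 0, 1, 1, 1])

classes = np.unique(y_train)
n, K = len(y_train), len(classes)

# mu_k: mean of the training observations in each class
mu = {k: x_train[y_train == k].mean() for k in classes}

# sigma^2: pooled (weighted-average) sample variance across the K classes
sigma2 = sum(((y_train == k).sum() - 1) * x_train[y_train == k].var(ddof=1)
             for k in classes) / (n - K)

# pi_k: proportion of the training observations in each class
pi = {k: (y_train == k).mean() for k in classes}

def lda_score(x, k):
    """D_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 * sigma^2) + log(pi_k)"""
    return x * mu[k] / sigma2 - mu[k] ** 2 / (2 * sigma2) + np.log(pi[k])

# Assign a new observation to the class with the largest score
x_new = 2.0
print(max(classes, key=lambda k: lda_score(x_new, k)))
```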
An extension of linear discriminant analysis is quadratic discriminant analysis, often referred to as QDA.
This method is similar to LDA and also assumes that the observations from each class are normally distributed, but it does not assume that each class shares the same covariance matrix. Instead, QDA assumes that each class has its own covariance matrix.
That is, it assumes that an observation from the kth class is of the form X ~ N(μₖ, Σₖ).
Using this assumption, QDA then finds the following values:
- μₖ: The mean of all training observations from the kth class.
- Σₖ: The covariance matrix of the kth class.
- πₖ: The proportion of the training observations that belong to the kth class.
QDA then plugs these numbers into the following formula and assigns each observation X = x to the class for which the formula produces the largest value:
Dₖ(x) = −½ (x − μₖ)ᵀ Σₖ⁻¹ (x − μₖ) − ½ log|Σₖ| + log(πₖ)
Note that QDA has quadratic in its name because the discriminant function above is a quadratic function of x.
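As an illustration, here is a minimal Python sketch that estimates μₖ, Σₖ, and πₖ from a toy two-predictor training set and evaluates the discriminant above for a new observation. The data values and the qda_score helper are hypothetical:

```python
import numpy as np

# Toy two-predictor training data for two classes (hypothetical values)
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [0.9, 1.9],
                    [3.0, 0.5], [3.2, 0.9], [2.8, 0.4], [3.1, 0.6]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

classes = np.unique(y_train)

# Per-class estimates: mean vector, covariance matrix, and prior proportion
mu    = {k: X_train[y_train == k].mean(axis=0) for k in classes}
Sigma = {k: np.cov(X_train[y_train == k], rowvar=False) for k in classes}
pi    = {k: (y_train == k).mean() for k in classes}

def qda_score(x, k):
    """D_k(x) = -1/2 (x-mu_k)^T Sigma_k^{-1} (x-mu_k) - 1/2 log|Sigma_k| + log(pi_k)"""
    d = x - mu[k]
    return (-0.5 * d @ np.linalg.solve(Sigma[k], d)
            - 0.5 * np.log(np.linalg.det(Sigma[k]))
            + np.log(pi[k]))

# Assign a new observation to the class with the largest score
x_new = np.array([2.0, 1.5])
print(max(classes, key=lambda k: qda_score(x_new, k)))
```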
LDA vs. QDA: When to Use One vs. the Other
The main difference between LDA and QDA is that LDA assumes all classes share a common covariance matrix, which makes it a much less flexible classifier than QDA.
This lower flexibility means LDA has low variance – that is, it will perform similarly across different training datasets. The drawback is that if the assumption that the K classes share the same covariance matrix is untrue, then LDA can suffer from high bias.
QDA is generally preferred to LDA in the following situations:
(1) The training set is large, so the higher variance of a flexible classifier is not a major concern.
(2) It’s unlikely that the K classes share a common covariance matrix.
When these conditions hold, QDA tends to perform better since it is more flexible and can provide a better fit to the data.
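As a rough illustration of this tradeoff, the following sketch simulates two classes with deliberately different covariance matrices and compares test accuracy using scikit-learn's LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis. The simulation parameters are arbitrary; on data like this, where the common-covariance assumption fails, QDA would be expected to score higher:

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulate two classes with clearly different covariance matrices,
# the setting where QDA should have the edge over LDA
n = 1000
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], n)
X1 = rng.multivariate_normal([1, 1], [[1.0, -0.8], [-0.8, 1.0]], n)
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit both classifiers on the same split and compare test accuracy
for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, round(acc, 3))
```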
How to Prepare Data for QDA
Make sure your data meets the following requirements before applying a QDA model to it:
1. The response variable is categorical. QDA models are designed to be used for classification problems, i.e. when the response variable can be placed into classes or categories.
2. The observations in each class follow a normal distribution. First, check that the distribution of values in each class is roughly normal. If this is not the case, you may choose to first transform the data (for example, with a log or square-root transform) to make the distribution more normal.
3. Account for extreme outliers. Be sure to check for extreme outliers in the dataset before applying QDA. Typically you can check for outliers visually using boxplots or scatterplots, as illustrated in the sketch after this list.
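The following sketch illustrates checks 2 and 3 in Python, using SciPy's Shapiro-Wilk test for normality within each class and the 1.5 × IQR rule (the same rule a boxplot uses) to flag potential outliers. The data values are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical predictor values grouped by class label
data = {
    "A": np.array([5.1, 4.9, 5.4, 5.0, 5.2, 4.8, 5.3]),
    "B": np.array([6.3, 6.5, 6.1, 6.4, 6.2, 6.6, 9.9]),  # 9.9 is a likely outlier
}

for label, values in data.items():
    # Check 2: Shapiro-Wilk test of normality within each class
    stat, p = stats.shapiro(values)
    print(f"class {label}: Shapiro-Wilk p = {p:.3f}",
          "(roughly normal)" if p > 0.05 else "(consider a transform)")

    # Check 3: flag values outside 1.5 * IQR of the quartiles
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
    print(f"class {label}: potential outliers -> {outliers}")
```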
QDA in R & Python
The following tutorials provide step-by-step examples of how to perform quadratic discriminant analysis in R and Python:
Quadratic Discriminant Analysis in R (Step-by-Step)
Quadratic Discriminant Analysis in Python (Step-by-Step)