One common warning you may encounter in R is:
glm.fit: algorithm did not converge
This warning often occurs when you attempt to fit a logistic regression model in R and you experience perfect separation – that is, a predictor variable is able to perfectly separate the response variable into 0’s and 1’s.
The following example shows how to handle this warning in practice.
How to Reproduce the Warning
Suppose we attempt to fit the following logistic regression model in R:
#create data frame
df <- data.frame(x=c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1, 1, 1.1, 1.3, 1.5, 1.7),
                 y=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1))
#attempt to fit logistic regression model
glm(y~x, data=df, family="binomial")
Call: glm(formula = y ~ x, family = "binomial", data = df)
Coefficients:
(Intercept) x
-409.1 431.1
Degrees of Freedom: 14 Total (i.e. Null); 13 Residual
Null Deviance: 20.19
Residual Deviance: 2.468e-09 AIC: 4
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
Notice that we receive the warning message: glm.fit: algorithm did not converge.
We receive this message because the predictor variable x is able to perfectly separate the response variable y into 0’s and 1’s.
Notice that for every value of x less than 1, y is equal to 0, and for every value of x greater than or equal to 1, y is equal to 1.
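One quick way to confirm this is to cross-tabulate the response against the suspected cutoff (the threshold of 1 here comes from simply inspecting the data above):

```r
#same data frame as above
df <- data.frame(x=c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1, 1, 1.1, 1.3, 1.5, 1.7),
                 y=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1))

#check for perfect separation: does x >= 1 exactly determine y?
tab <- table(x_at_least_1 = df$x >= 1, y = df$y)
tab
```

If the off-diagonal cells of this table are both zero, every y = 0 falls below the cutoff and every y = 1 at or above it, so x separates y perfectly.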
The following code shows a scenario where the predictor variable is not able to perfectly separate the response variable into 0’s and 1’s:
#create data frame
df <- data.frame(x=c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1, 1, 1.1, 1.3, 1.5, 1.7),
                 y=c(0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1))
#fit logistic regression model
glm(y~x, data=df, family="binomial")
Call: glm(formula = y ~ x, family = "binomial", data = df)
Coefficients:
(Intercept) x
-2.112 2.886
Degrees of Freedom: 14 Total (i.e. Null); 13 Residual
Null Deviance: 20.73
Residual Deviance: 16.31 AIC: 20.31
We don’t receive any warning message because the predictor variable is not able to perfectly separate the response variable into 0’s and 1’s.
How to Handle the Warning
If we encounter a scenario with perfect separation, there are two ways to handle it:
Method 1: Use penalized regression.
One option is to use some form of penalized logistic regression such as lasso logistic regression or elastic-net regularization.
Refer to the glmnet package for options on how to implement penalized logistic regression in R.
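To illustrate the idea without any extra packages, here is a minimal ridge-penalized logistic fit written with base R's optim(). The choice of lambda = 1 is purely illustrative, not tuned; in practice you would use glmnet (which expects a predictor matrix with at least two columns) and choose lambda by cross-validation:

```r
#data where x perfectly separates y
df <- data.frame(x=c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1, 1, 1.1, 1.3, 1.5, 1.7),
                 y=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1))

#negative log-likelihood plus a ridge (L2) penalty on the slope
neg_loglik_ridge <- function(beta, x, y, lambda) {
  eta <- beta[1] + beta[2] * x
  -sum(dbinom(y, size=1, prob=plogis(eta), log=TRUE)) + lambda * beta[2]^2
}

#minimize the penalized negative log-likelihood
fit <- optim(c(0, 0), neg_loglik_ridge, x=df$x, y=df$y, lambda=1)

#finite, stable coefficient estimates (unlike the unpenalized glm fit)
fit$par
```

The penalty term keeps the slope from running off to infinity, which is exactly what happens in the unpenalized fit under perfect separation.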
Method 2: Use the predictor variable to perfectly predict the response variable.
If you suspect that this perfect separation may exist in the population, you can simply use that predictor variable to perfectly predict the value of the response variable.
For example, in the above scenario we saw that the response variable y was always equal to 0 when the predictor variable x was less than 1.
If we suspect that this relationship holds in the overall population, we can simply predict that y will be equal to 0 whenever x is less than 1, without fitting a penalized logistic regression model at all.
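This rule can be written as a one-line function; the threshold of 1 is taken directly from the data shown earlier:

```r
#predict y directly from the separating threshold observed in the data
predict_y <- function(x) ifelse(x < 1, 0, 1)

predict_y(c(0.4, 0.9, 1.0, 1.6))
#returns 0 0 1 1
```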
Additional Resources
The following tutorials offer additional information on working with the glm() function in R:
The Difference Between glm and lm in R
How to Use the predict function with glm in R
How to Handle: glm.fit: fitted probabilities numerically 0 or 1 occurred