Multicollinearity, which occurs when there is strong correlation among the explanatory variables, causes serious problems in multivariate analysis. One indicator of multicollinearity is the variance inflation factor (VIF). A commonly used threshold is 10: when the VIF of a variable is 10 or larger, the impact of multicollinearity may be strong, and removing that variable should be considered.
Typical symptoms of multicollinearity are:
- The regression equation changes significantly when a small number of data points are added or removed
- The regression equation changes significantly when it is applied to a different data set
- The sign of a regression coefficient is opposite to what knowledge of the field suggests
- Although the coefficient of determination of the regression equation is high and the model fits well, the individual regression coefficients are not significant
- The regression equation cannot be obtained
When you encounter any of the phenomena listed above, you should consider multicollinearity. To calculate the VIF of a variable, treat that variable as the objective (dependent) variable and the other variables as explanatory variables. You can then obtain the squared multiple correlation coefficient (R-square, $R^2$) by linear regression analysis, for example with spreadsheet software such as Excel, and calculate the VIF with the following formula: $\mathrm{VIF} = 1 / (1 - R^2)$.
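As a quick worked check of the threshold, assuming the usual definition $\mathrm{VIF} = 1/(1 - R^2)$, the values below show how the VIF grows as $R^2$ approaches 1. The $R^2$ values are chosen purely for illustration; the point is that a VIF of 10 corresponds to $R^2 = 0.9$.

```latex
\begin{aligned}
R^2 = 0    &\;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0}    = 1   \\
R^2 = 0.5  &\;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0.5}  = 2   \\
R^2 = 0.8  &\;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0.8}  = 5   \\
R^2 = 0.9  &\;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0.9}  = 10  \\
R^2 = 0.99 &\;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0.99} = 100
\end{aligned}
```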
VIF measures the impact of multicollinearity among the X's in a regression model on the precision of estimation. It expresses the degree to which multicollinearity amongst the predictors degrades the precision of an estimate. VIF is a statistic used to measure possible multicollinearity amongst the predictor or explanatory variables. VIF is computed as $1/(1 - R^2)$ for each independent variable's equation. For example, given 4 independent predictor variables, the equations are formed by using each independent variable in turn as the dependent variable and the remaining ones as its predictors:
X1 = X2 X3 X4
X2 = X1 X3 X4
X3 = X1 X2 X4
X4 = X1 X2 X3
Each independent variable model will return an $R^2$ value and a VIF value. The term to exclude from the model is then chosen based on the value of VIF. If Xj is highly correlated with the remaining predictors, its variance inflation factor will be very large. A general rule is that the VIF should not exceed 10 (Belsley, Kuh, & Welsch, 1980). When Xj is orthogonal to the remaining predictors, its variance inflation factor will be 1.
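The procedure described above, regressing each predictor on the others and converting the resulting $R^2$ into a VIF, can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name `vif_by_auxiliary_regression` and the synthetic data (where `x4` is deliberately built as nearly `x1 + x2`) are assumptions made for the example.

```python
import numpy as np

def vif_by_auxiliary_regression(X):
    """Return one VIF per column of X (shape: n_samples x k).

    Hypothetical helper: each column is regressed on all the other
    columns (plus an intercept) and VIF = 1 / (1 - R^2) is returned.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]                                  # predictor treated as dependent variable
        others = np.delete(X, j, axis=1)             # remaining predictors
        A = np.column_stack([np.ones(n), others])    # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic data (assumption for illustration): x4 is almost x1 + x2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = x1 + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

vifs = vif_by_auxiliary_regression(X)
for name, v in zip(["X1", "X2", "X3", "X4"], vifs):
    flag = "exceeds 10" if v >= 10 else "ok"
    print(f"{name}: VIF = {v:.2f} ({flag})")
```

In this setup X1, X2 and X4 are involved in the near dependency and should show large VIFs, while the unrelated X3 should stay close to 1.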
Clearly the shortcomings just mentioned in regard to the use of R, the correlation matrix of the explanatory variables, as a diagnostic measure for collinearity would seem also to limit the usefulness of $R^{-1}$, and this is the case. The prevalence of this measure, however, justifies its separate treatment. Recalling that we are currently assuming the X data to be centered and scaled for unit length, we are considering $R^{-1} = (X^T X)^{-1}$. The diagonal elements of $R^{-1}$, the $r^{ii}$, are often called the variance inflation factors, $\mathrm{VIF}_i$ [Chatterjee and Price (1977)], and their diagnostic value follows from the relation
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the squared multiple correlation coefficient of $X_i$ regressed on the remaining explanatory variables. Clearly a high VIF indicates an $R_i^2$ near unity, and hence points to collinearity. This measure is therefore of some use as an overall indication of collinearity. Its weaknesses, like those of R, lie in its inability to distinguish among several coexisting near dependencies and in the lack of a meaningful boundary to distinguish between values of VIF that can be considered high and those that can be considered low. [Belsley]
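The relation between the diagonal of the inverse correlation matrix and $1/(1 - R_i^2)$ can be checked numerically. The sketch below, in Python with NumPy, centers the columns and scales them to unit length as assumed in the passage above, forms $R = X^T X$, and compares the diagonal of $R^{-1}$ with VIFs obtained from separate regressions. The synthetic data and the helper name `r_squared` are assumptions made for this example.

```python
import numpy as np

# Synthetic data (assumption for illustration): z2 is strongly correlated with z1.
rng = np.random.default_rng(1)
n = 500
z1 = rng.normal(size=n)
z2 = 0.9 * z1 + 0.3 * rng.normal(size=n)
z3 = rng.normal(size=n)                      # unrelated to z1 and z2
X = np.column_stack([z1, z2, z3])

# Center each column and scale it to unit length, so that X^T X is the
# correlation matrix R of the explanatory variables.
Xc = X - X.mean(axis=0)
Xs = Xc / np.linalg.norm(Xc, axis=0)
R = Xs.T @ Xs

# VIFs as the diagonal elements of the inverse correlation matrix.
vif_diag = np.diag(np.linalg.inv(R))

def r_squared(y, others):
    """R^2 of y regressed on `others` with an intercept (hypothetical helper)."""
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# VIFs from regressing each column on the remaining ones.
vif_aux = np.array([
    1.0 / (1.0 - r_squared(X[:, i], np.delete(X, i, axis=1)))
    for i in range(X.shape[1])
])

print("diag(R^-1):   ", np.round(vif_diag, 3))
print("1/(1 - R_i^2):", np.round(vif_aux, 3))
```

Both routes should agree up to floating-point error. Inverting R directly becomes numerically unreliable when the predictors are almost exactly dependent, which is one reason to keep the regression-based route available.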
References:
Robinson, C., and Schumacker, R. E.: Interaction effects: centering, variance inflation factor, and interpretation issues. Multiple Linear Regression Viewpoints 2009; 35(1)
Belsley, D. A.: Demeaning conditioning diagnostics through centering. The American Statistician 1984; 38: 73-82