How do you calculate the variance inflation factor (VIF) to check for multicollinearity in multivariate analysis?

Multicollinearity, which occurs when explanatory variables are strongly correlated or nearly linearly related to one another, causes serious problems in multivariate analysis, such as unstable regression equations. One indicator of multicollinearity is the variance inflation factor (VIF), for which a common threshold is 10: when the VIF of a variable is 10 or larger, multicollinearity is likely to have a strong effect, and that variable should be removed from the analysis. Typical symptoms of multicollinearity are:

  • The regression equation changes substantially when a small number of observations is added or removed
  • The regression equation changes substantially when it is applied to a different data set
  • The sign of a regression coefficient is opposite to what is expected from knowledge of the field
  • Although the coefficient of determination of the regression equation is high and the model fits well, the individual regression coefficients are not significant
  • The regression equation cannot be obtained at all
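The last symptom has a simple algebraic cause: least-squares coefficients solve the normal equations (XᵀX)b = Xᵀy, which have a unique solution only when the design matrix X has full column rank. A minimal sketch in Python with NumPy, using made-up data (the predictors `x1`, `x2`, `x3` are hypothetical), builds one predictor as an exact linear combination of two others and checks the rank:

```python
import numpy as np

# Hypothetical predictors; x3 is an exact linear combination of x1 and x2,
# i.e. perfect multicollinearity.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
x3 = 2.0 * x1 - x2

# Design matrix: intercept column plus the three predictors.
X = np.column_stack([np.ones_like(x1), x1, x2, x3])

# With one redundant column, X (and therefore X^T X) is rank-deficient,
# so the normal equations have no unique solution.
rank = np.linalg.matrix_rank(X)
print(f"rank = {rank}, number of coefficients = {X.shape[1]}")
```

The rank is 3 while four coefficients are requested, so no unique regression equation exists; with near (rather than exact) dependence the equation can be computed but becomes unstable, which is what the other symptoms describe.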

If you observe any of the phenomena listed above, suspect multicollinearity. To compute the VIF of a given variable, treat that variable as the dependent (objective) variable and all the other variables as explanatory variables, run a linear regression (spreadsheet software such as Excel is sufficient), obtain the coefficient of determination R² (the square of the multiple correlation coefficient), and compute VIF with the following formula.

\displaystyle VIF = \frac{1}{1 - R^2}
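The procedure above can be sketched in Python with NumPy as a stand-in for the spreadsheet. The function name `vif_for` and the sample data are hypothetical; the matrix `X` is assumed to hold only the explanatory variables (one per column), and each auxiliary regression includes an intercept.

```python
import numpy as np

def vif_for(X: np.ndarray, j: int) -> float:
    """VIF of column j of X: regress X[:, j] on the other columns plus an
    intercept, take R^2 of that fit, and return 1 / (1 - R^2)."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid                  # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination
    return 1.0 / (1.0 - r2)

# Hypothetical data: x2 is roughly twice x1, so the two are strongly correlated.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([x1, x2])

v1 = vif_for(X, 0)
v2 = vif_for(X, 1)
print(round(v1, 1), round(v2, 1))
```

With only two predictors, both auxiliary regressions have the same R² (the squared correlation between x1 and x2), so the two VIFs coincide; here both are far above the threshold of 10.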

VIF measures how much multicollinearity among the X's in a regression model degrades the precision with which the coefficients are estimated: VIFⱼ is the factor by which the variance of the estimated coefficient of Xⱼ is inflated compared with the case in which Xⱼ is uncorrelated with the other predictors. It is therefore a statistic for detecting possible multicollinearity among the predictor (explanatory) variables. VIF is computed as 1/(1 − R²) from a separate auxiliary regression for each predictor. For example, given 4 predictor variables, each predictor in turn is used as the dependent variable and regressed on the remaining three:
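\displaystyle X_1 = b_{10} + b_{12} X_2 + b_{13} X_3 + b_{14} X_4 + e_1
\displaystyle X_2 = b_{20} + b_{21} X_1 + b_{23} X_3 + b_{24} X_4 + e_2
\displaystyle X_3 = b_{30} + b_{31} X_1 + b_{32} X_2 + b_{34} X_4 + e_3
\displaystyle X_4 = b_{40} + b_{41} X_1 + b_{42} X_2 + b_{43} X_3 + e_4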
Each auxiliary regression returns an R² value and hence a VIF value, and the term to exclude from the model is chosen on the basis of these VIF values. If Xⱼ is highly correlated with the remaining predictors, its variance inflation factor will be very large; when Xⱼ is orthogonal to the remaining predictors, its variance inflation factor is exactly 1. A general rule is that the VIF should not exceed 10 (Belsley, Kuh, & Welsch, 1980).
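The auxiliary regressions, one per predictor, can be run in a loop. The Python/NumPy sketch below also shows one common way of acting on the VIF values, which is an assumption rather than a procedure prescribed by the text: repeatedly drop the predictor with the largest VIF until all remaining VIFs are below 10. The helper names `vifs` and `drop_high_vif` and the data are hypothetical; the data are built from mutually orthogonal ±1 columns so that the expected VIFs can be checked by hand.

```python
import numpy as np

def vifs(X: np.ndarray) -> np.ndarray:
    """VIF of every column of X, one auxiliary regression (with intercept) per column."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        yc = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (yc @ yc)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_high_vif(X: np.ndarray, names: list, threshold: float = 10.0):
    """Hypothetical helper: remove the predictor with the largest VIF, one at a
    time, until every remaining VIF is below `threshold`."""
    names = list(names)
    while X.shape[1] > 1:
        v = vifs(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Hypothetical data: h1, h2, h3, h4 are mutually orthogonal with mean zero;
# the fourth predictor is almost h1 + h2 (plus a small orthogonal perturbation).
h1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
h2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
h3 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
h4 = h2 * h3
X = np.column_stack([h1, h2, h3, h1 + h2 + 0.1 * h4])

print(vifs(X))  # about [101, 101, 1, 201]
X_kept, kept = drop_high_vif(X, ["x1", "x2", "x3", "x4"])
print(kept, vifs(X_kept))
```

x3 is orthogonal to everything else, so its VIF is 1; x4 has the largest VIF and is dropped first, after which the remaining predictors are orthogonal and all VIFs return to 1.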

Clearly the shortcomings just mentioned in regard to the use of R (the correlation matrix of the explanatory variables) as a diagnostic measure for collinearity would seem also to limit the usefulness of R⁻¹, and this is the case. The prevalence of this measure, however, justifies its separate treatment. Recalling that we are currently assuming the X data to be centered and scaled to unit length, we are considering R⁻¹ = (XᵀX)⁻¹. The diagonal elements of R⁻¹, the rⁱⁱ, are often called the variance inflation factors, VIFᵢ [Chatterjee and Price (1977)], and their diagnostic value follows from the relation
\displaystyle VIF_i = \frac{1}{1-R^2_i}
where R²ᵢ is the squared multiple correlation coefficient of Xᵢ regressed on the remaining explanatory variables. Clearly a high VIF indicates an R²ᵢ near unity, and hence points to collinearity. This measure is therefore of some use as an overall indication of collinearity. Its weaknesses, like those of R, lie in its inability to distinguish among several coexisting near dependencies and in the lack of a meaningful boundary to distinguish between values of VIF that can be considered high and those that can be considered low. [Belsley]
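The relation quoted above, that the diagonal of R⁻¹ equals 1/(1 − R²ᵢ) when the X data are centered and scaled to unit length, can be checked numerically. A minimal Python/NumPy sketch; the function names and the data are hypothetical and chosen only for illustration:

```python
import numpy as np

def vif_via_inverse_correlation(X: np.ndarray) -> np.ndarray:
    """Center each column, scale it to unit length, form R = Z^T Z
    (the correlation matrix) and return the diagonal of R^{-1}."""
    Z = X - X.mean(axis=0)
    Z = Z / np.linalg.norm(Z, axis=0)
    R = Z.T @ Z
    return np.diag(np.linalg.inv(R))

def vif_via_auxiliary_r2(X: np.ndarray) -> np.ndarray:
    """1 / (1 - R_i^2), where R_i^2 comes from regressing column i
    on the remaining columns plus an intercept."""
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        yc = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (yc @ yc)
        out[i] = 1.0 / (1.0 - r2)
    return out

# Hypothetical data: x3 is close to a linear combination of x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
x3 = x1 + 0.5 * x2 + np.array([0.3, -0.2, 0.1, 0.0, -0.1, 0.2, -0.3, 0.1])
X = np.column_stack([x1, x2, x3])

a = vif_via_inverse_correlation(X)
b = vif_via_auxiliary_r2(X)
print(a)
print(b)
```

Both routes give the same values up to rounding: the first needs a single matrix inversion, the second one regression per variable.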

References:
Cecil Robinson and Randall E. Schumacker: Interaction Effects: Centering, Variance Inflation Factor, and Interpretation Issues. Multiple Linear Regression Viewpoints, 2009, Vol. 35(1).
David A. Belsley: Demeaning Conditioning Diagnostics through Centering. The American Statistician, 1984, Vol. 38, pp. 73–82.

In SPSS, VIF can be obtained directly from an ordinary linear regression by checking Collinearity diagnostics under the Statistics options. Spreadsheet software works as well: regress one variable on all the other variables and compute VIF = 1/(1 − R²) from the resulting R².
