2013/01/25 | Improve Society

The strength of the linear relationship between $x$ and $y$ is measured by the correlation coefficient, also called the Pearson product-moment correlation coefficient, which is defined by the formula below. The correlation coefficient ranges from -1 to 1.

$\displaystyle r = \frac{\sum_{i=1}^n(x_i-\bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^n(x_i - \bar x)^2}\sqrt{\sum_{i=1}^n(y_i - \bar y)^2}}$

$\bar x$ and $\bar y$ are the means of $x$ and $y$, respectively. $i$ is the index of each sample (the running variable of the sums), and $n$ is the number of samples.
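As a minimal sketch of the formula above, assuming the paired observations are held in two plain Python lists, r could be computed as follows. The function name `pearson_r` and the sample data are illustrative and do not come from the original post.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson product-moment correlation coefficient."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

# Made-up illustrative data (an assumption, not from the post).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746
```

For these values the sum of cross-products is 6 and the sums of squares are 10 and 6, so r = 6 / sqrt(60).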

For two variables sampled at random from a population, the significance of the sample correlation coefficient r can be tested with a t-test. The t-statistic of r is calculated by the formula below and, under the null hypothesis, follows a t-distribution with n-2 degrees of freedom, where n is the number of samples. Writing $\rho$ for the correlation coefficient of the population, the null hypothesis is “$\rho = 0$”. If the t-statistic calculated from the number of samples (n) and the correlation coefficient (r) exceeds, in absolute value for a two-sided test, the critical value of the t-distribution at the significance level $\alpha$, the null hypothesis is rejected.

$\displaystyle t = r\sqrt{\frac{n - 2}{1 - r^2}}$
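The test can be sketched in standard-library Python as follows. This is only an illustration under stated assumptions: the helper names (`t_statistic`, `t_pdf`, `two_sided_p_value`), the Simpson's-rule integration of the t density, and the example values are mine, not the post's. Rejecting when the two-sided p-value is below $\alpha$ is equivalent to comparing |t| with the critical value.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for testing rho = 0."""
    if n <= 2:
        raise ValueError("n must exceed 2 so that n - 2 is positive")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] by Simpson's rule."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)

# Hypothetical example: r = 0.5 observed from n = 5 pairs, alpha = 0.05.
n, r_obs, alpha = 5, 0.5, 0.05
t = t_statistic(r_obs, n)
p = two_sided_p_value(t, n - 2)
print(t, p, "reject H0" if p < alpha else "do not reject H0")
```

A statistics library would normally supply the t-distribution directly; the numerical integration here only keeps the sketch free of dependencies, and it loses accuracy for very large |t|.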

The test of significance for this important null hypothesis H (ρ = 0) is equivalent to that for the null hypothesis H (β₁ = 0) or H (β₂ = 0). It now follows that if x and y have a joint bivariate normal distribution, then the test for the null hypothesis H (ρ = 0) is obtained by using the fact that if the null hypothesis under test is true, then
$\displaystyle F = \frac{(n-2)Z^2}{XY-Z^2} = \frac{(n-2)r^2}{1-r^2}$, where $X = \sum(x - \bar{x})^2$, $Y = \sum(y - \bar{y})^2$, $Z = \sum(x - \bar{x})(y - \bar{y})$, and $r^2 = \frac{Z^2}{XY}$,
has the F distribution with 1, n – 2 d.f. An equivalent test of significance for the null hypothesis is obtained by using the fact that if the null hypothesis is true, then
$\displaystyle t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
has “Student’s” distribution with n – 2 d.f.
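
The equivalence of the two tests can be checked directly from the definitions of X, Y, Z and r. This short derivation is my own working, not part of the quoted reference. Since $Z^2 = r^2 XY$,

$\displaystyle XY - Z^2 = XY\left(1 - \frac{Z^2}{XY}\right) = XY(1 - r^2)$

and therefore

$\displaystyle F = \frac{(n-2)Z^2}{XY - Z^2} = \frac{(n-2)\,r^2\,XY}{XY(1 - r^2)} = \frac{(n-2)r^2}{1-r^2} = \left(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)^2 = t^2$

The square of a variable with Student's distribution with n – 2 d.f. has the F distribution with 1, n – 2 d.f., so the F test and the two-sided t test reject in exactly the same cases.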

For any non-zero null hypothesis about ρ there is no parallelism between the correlation coefficient ρ and the regression coefficients β₁ and β₂. In fact, no exact test of significance is available for testing readily non-zero null hypothesis about ρ. Fisher has given an approximate method for such null hypothesis, but we do not consider this here.
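The passage does not describe Fisher's approximate method. Assuming it refers to the commonly taught Fisher z-transformation, in which atanh(r) is treated as approximately normal with mean atanh(ρ) and standard error 1/sqrt(n - 3), a test of a non-zero null hypothesis could be sketched as below. The function name `fisher_z_test` and the example values are hypothetical.

```python
import math

def fisher_z_test(r, n, rho0):
    """Approximate two-sided test of H0: rho = rho0 via Fisher's z-transformation."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the 1/sqrt(n - 3) standard error")
    if not (-1.0 < r < 1.0 and -1.0 < rho0 < 1.0):
        raise ValueError("r and rho0 must lie strictly between -1 and 1")
    # Difference of transformed values divided by the standard error 1/sqrt(n - 3).
    z_stat = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    # Two-sided p-value under the standard normal: P(|Z| >= |z_stat|).
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value

# Hypothetical example: r = 0.5 from n = 28 pairs, tested against rho0 = 0.3.
z_stat, p_value = fisher_z_test(0.5, 28, 0.3)
print(z_stat, p_value)
```

Because this relies on a normal approximation, it is better suited to moderate or large n than to very small samples.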

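I was previously asked about the significance test for the Pearson product-moment correlation coefficient r, and I am adding this note now that I have obtained a reference. I had hoped it would contain a proof, but it did not. It states that no exact test is readily available for non-zero null hypotheses about $\rho$, and that Fisher has given an approximate method for such hypotheses, with no further description. The relevant passage is the one quoted above.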

Pearson product-moment correlation coefficient (r) and t-test on it
