December 2012 | Improve Society

Student’s t-test

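To compare the means of two samples, choose the form of Student's t-test that fits your data: whether the samples are paired (dependent) or independent, whether the sample sizes are equal, and whether the variances are equal. The formula for each case follows.

The paired t-test statistic, where $\bar X_D$ and $SD$ are the mean and standard deviation of the paired differences, $n$ is the number of pairs and $\mu_0$ is the hypothesized mean difference (usually 0), is: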

$\displaystyle t = \frac{\bar X_D - \mu_0}{SD/\sqrt{n}} \cdot\cdot\cdot(1)$
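Formula (1) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `paired_t` and the measurements are made up for the example, and it assumes $SD$ is the sample standard deviation of the differences (denominator $n - 1$).

```python
import math
import statistics


def paired_t(before, after, mu0=0.0):
    """t-statistic of formula (1) for paired samples (hypothetical helper)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)   # mean of the paired differences
    sd_d = statistics.stdev(diffs)    # sample SD of the differences (n - 1)
    return (mean_d - mu0) / (sd_d / math.sqrt(n))


# Made-up measurements on the same five subjects
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [11.0, 14.0, 10.0, 13.0, 14.0]
t_paired = paired_t(before, after)
print(round(t_paired, 3))
```

The statistic is then compared with the t-distribution on $n - 1$ degrees of freedom.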

The independent two-sample t-test with unequal sample sizes and unequal variances, known as Welch's t-test, is:

$\displaystyle t = \frac{\bar X_1 - \bar X_2}{SD_p} \cdot\cdot\cdot(2) \vspace{0.2in}\\ SD_p = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$
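A small sketch of formula (2), assuming $SD_1$ and $SD_2$ are sample standard deviations. The name `welch_t` and the two groups are hypothetical, chosen only to show the arithmetic.

```python
import math
import statistics


def welch_t(x1, x2):
    """t-statistic of formula (2); no pooling of the two variances."""
    n1, n2 = len(x1), len(x2)
    var1 = statistics.variance(x1)   # SD_1 squared (n - 1 denominator)
    var2 = statistics.variance(x2)   # SD_2 squared
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (statistics.mean(x1) - statistics.mean(x2)) / se


# Made-up groups with different sizes and different spreads
group1 = [5.0, 7.0, 9.0]
group2 = [1.0, 2.0, 3.0, 4.0, 5.0]
t_welch = welch_t(group1, group2)
print(round(t_welch, 3))
```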

The independent two-sample t-test with equal sample sizes and equal variances is:

$\displaystyle t = \frac{\bar X_1 - \bar X_2}{SD_p \sqrt{\frac{2}{n}}} \cdot\cdot\cdot(3) \vspace{0.2in}\\SD_p = \sqrt{\frac{SD_1^2 + SD_2^2}{2}}$

The independent two-sample t-test with unequal sample sizes and equal variances is:

$\displaystyle t = \frac{\bar X_1 - \bar X_2}{SD_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \cdot\cdot\cdot(4) \vspace{0.2in}\\ SD_p = \sqrt{\frac{(n_1 -1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}}$
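Formulas (3) and (4) are closely related: with $n_1 = n_2 = n$, the pooled SD of (4) reduces to the simple average of the two variances used in (3). The sketch below illustrates this numerically. The function names and data are invented for the example, and sample standard deviations are assumed.

```python
import math
import statistics


def pooled_t(x1, x2):
    """t-statistic of formula (4): pooled SD, any sample sizes."""
    n1, n2 = len(x1), len(x2)
    ss1 = (n1 - 1) * statistics.variance(x1)
    ss2 = (n2 - 1) * statistics.variance(x2)
    sd_p = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    diff = statistics.mean(x1) - statistics.mean(x2)
    return diff / (sd_p * math.sqrt(1 / n1 + 1 / n2))


def equal_n_t(x1, x2):
    """t-statistic of formula (3): requires equal sample sizes."""
    n = len(x1)
    if len(x2) != n:
        raise ValueError("formula (3) needs equal sample sizes")
    sd_p = math.sqrt((statistics.variance(x1) + statistics.variance(x2)) / 2)
    diff = statistics.mean(x1) - statistics.mean(x2)
    return diff / (sd_p * math.sqrt(2 / n))


# Made-up groups of equal size
a = [5.0, 7.0, 9.0, 11.0]
b = [2.0, 4.0, 6.0, 4.0]
print(round(pooled_t(a, b), 3), round(equal_n_t(a, b), 3))
```

Both calls print the same value here, which is why (3) is usually treated as a special case of (4).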

Reference: Student's t-test

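Student t-test

Which form of the test of means known as Student's t-test to use depends on whether the samples are paired and on whether the sample sizes and variances are equal.

The paired t-test is obtained as follows.

$\displaystyle t = \frac{\bar X_D - \mu_0}{SD/\sqrt{n}} \cdot\cdot\cdot(1)$

For comparing the means of two independent groups with different sample sizes and different variances, the t-statistic is given by the following formula. This is also called Welch's test.

$\displaystyle t = \frac{\bar X_1 - \bar X_2}{SD_p} \cdot\cdot\cdot(2) \vspace{0.2in}\\ SD_p = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$

For comparing the means of two independent groups with equal sample sizes and equal variances, the t-statistic is given by the following formula.

$\displaystyle t = \frac{\bar X_1 - \bar X_2}{SD_p \sqrt{\frac{2}{n}}} \cdot\cdot\cdot(3) \vspace{0.2in}\\SD_p = \sqrt{\frac{SD_1^2 + SD_2^2}{2}}$

For comparing the means of two independent groups with different sample sizes and equal variances, the t-statistic is given by the following formula.

$\displaystyle t = \frac{\bar X_1 - \bar X_2}{SD_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \cdot\cdot\cdot(4) \vspace{0.2in}\\ SD_p = \sqrt{\frac{(n_1 -1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}}$

Reference: Student's t-test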

t-test on independent groups with unequal variance

The mean of a sample drawn from a normally distributed population is itself normally distributed. When the population standard deviation is unknown, it has to be estimated from the sample standard deviation, and the resulting t-statistic follows the t-distribution rather than the normal distribution.

$\displaystyle t = \frac{\bar X - \mu}{SD/\sqrt n} = \frac{\bar X - \mu}{SE} \vspace{0.2in}\\$

SD: standard deviation; SE: standard error
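The one-sample form above can be sketched as follows. The function name `one_sample_t`, the sample and the hypothesized mean are all made up for illustration, and $SD$ is assumed to be the sample standard deviation.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """t = (sample mean - mu) / SE, with SE = SD / sqrt(n)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    return (statistics.mean(sample) - mu) / se


# Made-up sample tested against a hypothesized population mean of 5.0
sample = [4.0, 5.0, 6.0, 7.0, 8.0]
t_one = one_sample_t(sample, mu=5.0)
print(round(t_one, 3))
```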

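To compare the means of two independent groups, you can calculate the t-statistic with the pooled-SD formula below. Note that pooling assumes the two groups share a common variance; when the variances are unequal, the standard error is instead $\sqrt{SD_1^2/n_1 + SD_2^2/n_2}$ (Welch's t-test).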

$\displaystyle t = \frac{(\bar X_2 - \bar X_1)}{SD_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(\bar X_2 - \bar X_1)}{SE}\vspace{0.2in}\\ SD_p = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 -1)SD_2^2}{n_1 + n_2 - 2}}$

$SD_p$: pooled SD

The null hypothesis is rejected when the t-statistic exceeds a critical value. In a one-sided test, the result is statistically significant when the statistic exceeds the value beyond which the area under the t-distribution curve is 0.05. In a two-sided test, it is significant when its absolute value exceeds the value beyond which the area under the curve is 0.025 in each tail. The critical value depends on the degrees of freedom.
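As a worked illustration with made-up numbers (not from the original post), take two groups of $n_1 = n_2 = 10$ with means 5.0 and 6.0 and $SD_1 = SD_2 = 1.0$, and assume the pooled form with $df = n_1 + n_2 - 2$:

$\displaystyle SD_p = \sqrt{\frac{9 \cdot 1.0^2 + 9 \cdot 1.0^2}{18}} = 1.0, \quad t = \frac{6.0 - 5.0}{1.0 \sqrt{\frac{1}{10} + \frac{1}{10}}} \approx 2.236, \quad df = 18$

For 18 degrees of freedom the two-sided 5% critical value is about 2.101 and the one-sided 5% value is about 1.734, so $t \approx 2.236$ would be significant under either test.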

Reference: t-distribution

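t-test on two independent groups with unequal variance

The distribution of sample means drawn from a normally distributed population is itself normal, but when the population standard deviation $\sigma$ is unknown it has to be estimated from the sample standard deviation. In that case the t-statistic follows the t-distribution rather than the normal distribution.

$\displaystyle t = \frac{\bar X - \mu}{SD/\sqrt n} = \frac{\bar X - \mu}{SE} \vspace{0.2in}\\$

For comparing the means of two independent groups, the t-statistic is obtained from the following formula, where $SD_p$ is the pooled SD (pooling assumes a common variance).

$\displaystyle t = \frac{(\bar X_2 - \bar X_1)}{SD_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(\bar X_2 - \bar X_1)}{SE}\vspace{0.2in}\\ SD_p = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 -1)SD_2^2}{n_1 + n_2 - 2}}$

The null hypothesis is rejected when the t-statistic exceeds a certain value on the t-distribution. In a one-sided test, the result is judged statistically significant if the statistic exceeds the point beyond which the area under the t-distribution curve is 0.05 or less; in a two-sided test, the point beyond which the area is 0.025 or less. The t value needed for significance changes with the degrees of freedom, that is, with the sample size. See the link below for details.

Reference: t-distribution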

How to perform a χ² (chi-square) test on a cross tabulation?

You can perform a $\chi^2$ test on a cross tabulation with the formula below. For each cell, subtract the expected value (E) from the observed value (O), square the difference, and divide by the expected value; then sum the results over all cells.

$\displaystyle\chi^2(df)=\sum\frac{(O-E)^2}{E}$

df: degree of freedom
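The sum over cells can be sketched in Python as below. The post does not say how E is obtained, so this sketch assumes the usual independence model, E = (row total × column total) / N, and $df$ = (rows − 1)(columns − 1). The function name `chi_square` and the counts are hypothetical.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    Expected counts are computed under independence of rows and columns,
    which is an assumption of this sketch.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count for cell (i, j)
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Made-up 2x2 table of counts
table = [[20, 10], [10, 20]]
stat, df = chi_square(table)
print(round(stat, 3), df)
```

The same function works for larger tables, since nothing in the sum is specific to two rows and two columns.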

The $\chi^2$ statistic follows the $\chi^2$ distribution. With 1 degree of freedom, the result is significant in a one-sided (upper-tail) test when $\chi^2$ exceeds 3.841 for p < 0.05, 6.635 for p < 0.01, and 10.828 for p < 0.001. In a two-tailed test, where the tail area is halved (0.025 and 0.005), the critical values are 5.024 for p < 0.05 and 7.879 for p < 0.01.

	TRUE	FALSE	Marginal total
POSITIVE	a	b	a + b
NEGATIVE	c	d	c + d
Marginal total	a + c	b + d	N

$\displaystyle \begin{array}{rcl}\chi^2&=&(ad-bc)^2\times\frac{N}{(a+b)(c+d)(a+c)(b+d)}\vspace{0.2in}\\\chi^2(Yates)&=&\left(|ad-bc|-\frac{N}{2}\right)^2\times\frac{N}{(a+b)(c+d)(a+c)(b+d)}\end{array}$
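The 2×2 shortcut can be sketched as below, using the cell names a, b, c, d from the table. The function name `chi_square_2x2` and the counts are invented for the example. The Yates branch uses the standard form, which subtracts $N/2$ from $|ad - bc|$ before squaring; this is equivalent to subtracting 0.5 from each $|O - E|$. Clamping the corrected difference at zero is a common safeguard added in this sketch, not something the post states.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for a 2x2 table via the (ad - bc) shortcut."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)   # product of the four marginal totals
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction; clamp so the correction never overshoots zero
        diff = max(diff - n / 2, 0.0)
    return diff ** 2 * n / denom


# Made-up counts: a=20, b=10, c=10, d=20
plain = chi_square_2x2(20, 10, 10, 20)
corrected = chi_square_2x2(20, 10, 10, 20, yates=True)
print(round(plain, 3), round(corrected, 3))
```

With these counts the corrected statistic is smaller than the uncorrected one, as expected from a correction that shrinks every deviation.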

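Performing a χ² test from a 2×2 table (cross tabulation)

For a 2×2 table the χ² statistic is usually obtained with the formula below, but a χ² test is also possible when the nominal variable or the outcome takes three or more values. The χ² statistic is the sum, over all cells, of the squared difference between the observed value O and the expected value E, divided by the expected value E.

$\displaystyle\chi^2(df)=\sum\frac{(O-E)^2}{E}$

df: degree of freedom

The χ² statistic follows the χ² distribution. With 1 degree of freedom, the χ² value at which p < 0.05 in a one-sided test is 3.841; for p < 0.01 it is 6.635, and for p < 0.001 it is 10.828. In a two-sided test the value for p < 0.05 is 5.024, and for p < 0.01 it is 7.879. The 2×2 table (cross tabulation) is laid out as shown below.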

	TRUE	FALSE	Marginal total
POSITIVE	a	b	a + b
NEGATIVE	c	d	c + d
Marginal total	a + c	b + d	N
