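How much data does my study need to collect to produce meaningful results?
Are the research resources available to me enough to collect data that can support the research hypothesis?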
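Null hypothesis | Alternative hypothesis |
---|---|
No effect | A detectable effect |
The hypothesized effect equals 0 on the measurement scale | The hypothesized effect is nonzero on the measurement scale |
The true effect is 0, but the conclusion claims it is nonzero | The true effect is nonzero, but the conclusion claims it is 0 |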
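Which hypothesis am I interested in?
I understand the effect I want to test, but have I assessed in advance the conditions under which it can be detected?
When power approaches its ceiling (100%), any statistical test comes out significant. Without thinking through the context of the null hypothesis, statistical significance is meaningless.
Judging whether a result is meaningful solely by whether the p value falls below the significance level ignores that the p value is continuous.
Both effect size and sample size affect the p value. Oversimplifying inevitably underestimates or overestimates what a p value means.
Statistical significance does not mean the expected effect truly exists. Learners who are not reminded of this can easily misuse statistical tests.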
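The joint influence of effect size and sample size on the p value can be seen in a short base R sketch. It assumes an equal-n, two-sample t test in which the observed standardized difference is d, so that t = d * sqrt(n / 2) on 2n - 2 degrees of freedom. The helper name `p_from_d` is made up for illustration.

```r
# Two-sided p value of an equal-n, two-sample t test whose observed
# standardized mean difference is d (hypothetical helper, base R only).
p_from_d <- function(d, n_per_group) {
  t_stat <- d * sqrt(n_per_group / 2)   # t = d / sqrt(1/n + 1/n)
  df <- 2 * n_per_group - 2
  2 * pt(-abs(t_stat), df = df)
}

# Same observed effect (d = 0.3), growing sample size per group
n_grid <- c(20, 50, 100, 200, 400)
p_by_n <- sapply(n_grid, function(n) p_from_d(0.3, n))
print(data.frame(n_per_group = n_grid, p = signif(p_by_n, 3)))

# Same sample size (n = 50 per group), growing observed effect
d_grid <- c(0.1, 0.3, 0.5, 0.8)
p_by_d <- sapply(d_grid, function(d) p_from_d(d, 50))
print(data.frame(d = d_grid, p = signif(p_by_d, 3)))
```

The same d = 0.3 is non-significant with 20 participants per group and clearly significant with 400, which is why a p value read without the effect size and sample size says little.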
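Type I error
The true effect is 0, but the analysis leads to the claim that it is nonzero.
\(\alpha = p(d \neq 0 \mid \theta = 0)\)
Type II error
The true effect is nonzero, but the analysis leads to the claim that it is 0.
\(\beta = p(d = 0 \mid \theta \neq 0)\)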
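Both error rates can be estimated by simulation. The sketch below is a minimal base R illustration, assuming normally distributed scores with unit variance, 64 participants per group, and a true effect of 0.5 for the alternative. These numbers and the helper name `reject_rate` are illustrative choices, not values from the examples in these notes.

```r
set.seed(1)

# Rejection rate of a two-sided, equal-variance two-sample t test at `alpha`
# when the true standardized effect is `theta` (hypothetical helper).
reject_rate <- function(theta, n_per_group, n_sims = 5000, alpha = 0.05) {
  p_values <- replicate(n_sims, {
    treatment <- rnorm(n_per_group, mean = theta)
    control <- rnorm(n_per_group, mean = 0)
    t.test(treatment, control, var.equal = TRUE)$p.value
  })
  mean(p_values < alpha)
}

# theta = 0: every rejection is a Type I error
alpha_hat <- reject_rate(theta = 0, n_per_group = 64)

# theta = 0.5: every non-rejection is a Type II error
beta_hat <- 1 - reject_rate(theta = 0.5, n_per_group = 64)

cat("Estimated Type I error rate: ", alpha_hat, "\n")
cat("Estimated Type II error rate:", beta_hat, "\n")
```

Under these assumptions the first estimate should land near the nominal .05 and the second near .20, up to simulation noise.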
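What is the sunk cost of confirming a result that is bound to hold, or of refuting an effect that does not actually exist?
When the effect size is small, what is gained by obtaining a statistically significant result?
Can real-world resources support an \(\frac{\alpha}{\beta}\) ratio that balances gains and losses?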
Cohen’s suggested balance ~ \(\frac{\alpha}{\beta} = \frac{1}{4}\)
\(\alpha < .05\), with \(1 - \beta\) of at least .80
“The notion that failure to find is less serious than finding something that is not there accords with the conventional scientific view” (Cohen, 1988)
Motivation for p-hacking
Claim | There is no effect (null=true) | There is an effect (null=false) |
---|---|---|
We claim no effect (ES=0) | Correct conclusion (\(1 - \alpha\)) | Type II error (\(\beta\)) |
We claim an effect (ES \(\neq\) 0) | Type I error (\(\alpha\)) | Correct conclusion (\(1 - \beta\)) |
Universe X | Universe Y |
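Researchers who take responsibility for their findings should be able to judge the most appropriate \(\frac{\alpha}{\beta}\).
Publication bias is very likely to cause \(\alpha\) to be underestimated.
Replication studies are usually required to reach \(1 - \beta\) of at least .90. See the registered report submission guidelines of journals such as Nature Human Behaviour and Collabra: Psychology.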
Prior power analysis | Post-hoc power analysis |
---|---|
A,P,E -> S; A,P,S -> E; P,E,S -> A | A,E,S -> P |
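Effect size indices can be converted into one another, or computed from known information.
Every research field has suitable tools for estimating APES (alpha, power, effect size, sample size); researchers should choose a tool according to their practical needs.
Before working through the examples, please download the assignment Rmd and confirm that the R packages effectsize and pwr are installed.
Use 1: Estimate the sample size needed to obtain a statistically significant result
Kirk (1996): In a trial of a treatment meant to slow intellectual decline in patients with Alzheimer's disease, 6 patients received the treatment and another 6 received a control treatment. After some time, the treated patients scored on average 13 points higher on an intelligence test than the control patients; the test statistic was t = 1.61, p = .14. How many participants are needed to obtain a result significant at .05 with power of .80?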
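The four APES quantities determine one another: fix any three and the fourth follows. The sketch below illustrates this with `stats::power.t.test` from base R, which solves for whichever argument is left as `NULL`. The values d = 0.5 and n = 64 per group are arbitrary illustrations and do not come from the examples in these notes.

```r
# A, P, E -> S: per-group sample size for alpha = .05, power = .80, d = 0.5
need_n <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)

# A, E, S -> P: power reached with 64 participants per group
got_power <- power.t.test(n = 64, delta = 0.5, sd = 1, sig.level = 0.05)

# A, P, S -> E: smallest effect detectable with 64 per group at power = .80
need_d <- power.t.test(n = 64, sd = 1, sig.level = 0.05, power = 0.80)

# P, E, S -> A: significance level implied by the other three
need_alpha <- power.t.test(n = 64, delta = 0.5, sd = 1,
                           sig.level = NULL, power = 0.80)

print(c(n = need_n$n, power = got_power$power,
        d = need_d$delta, alpha = need_alpha$sig.level))
```

With `sd = 1`, `delta` is on the same standardized scale as Cohen's d, so the results are comparable to those of a two-sample power calculation expressed in d.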
See the pwr::pwr.t.test documentation.
## This example is from Kirk(1996): A researcher tested the medication that might raise the IQ of people suffering from Alzheimer's disease.
## two tailed test
pwr::pwr.t.test(d=unlist(effectsize::t_to_d(1.61, 10))["d"],
power = .80,
type = "two.sample",
alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 16.15898
## d = 1.018253
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
## This example is from Kirk(1996): A researcher tested the medication that might raise the IQ of people suffering from Alzheimer's disease.
## one tailed test
pwr::pwr.t.test(d=unlist(effectsize::t_to_d(1.61, 10))["d"],
power = .80,
type = "two.sample",
alternative = "greater")
##
## Two-sample t test power calculation
##
## n = 12.66051
## d = 1.018253
## sig.level = 0.05
## power = 0.8
## alternative = greater
##
## NOTE: n is number in *each* group
Use 2: Estimate the power that the data at hand can achieve
pwr::pwr.t.test(n=6,
d=unlist(effectsize::t_to_d(1.61, 10))["d"],
type = "two.sample",
alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 6
## d = 1.018253
## sig.level = 0.05
## power = 0.3578953
## alternative = two.sided
##
## NOTE: n is number in *each* group
Use 3: Set a reasonable significance level
pwr::pwr.t.test(n=6,
d=unlist(effectsize::t_to_d(1.61, 10))["d"],
type = "two.sample",
sig.level = NULL,
power = .80,
alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 6
## d = 1.018253
## sig.level = 0.3679726
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
pwr::pwr.t.test(n=6,
d=unlist(effectsize::t_to_d(1.61, 10))["d"],
type = "two.sample",
sig.level = NULL,
power = .80,
alternative = "greater")
##
## Two-sample t test power calculation
##
## n = 6
## d = 1.018253
## sig.level = 0.1877483
## power = 0.8
## alternative = greater
##
## NOTE: n is number in *each* group
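Use 4: Design a high-powered replication study
Two studies of the correlation between the same pair of variables reported non-significant correlation coefficients of .20 and .24, with sample sizes of 78 and 63 respectively. What power did each study achieve? What sample size would a study need to reach power of .80?
The optimal sample size found by an a priori analysis does not guarantee that the study result will reach the chosen level of statistical significance.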
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 78
## r = 0.2
## sig.level = 0.05
## power = 0.4228927
## alternative = two.sided
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 63
## r = 0.24
## sig.level = 0.05
## power = 0.4796724
## alternative = two.sided
Assume that a successful replication should find r = .22.
size | power |
---|---|
70 | 0.45 |
80 | 0.51 |
90 | 0.55 |
100 | 0.60 |
110 | 0.64 |
120 | 0.68 |
130 | 0.72 |
140 | 0.75 |
150 | 0.78 |
160 | 0.80 |
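The two power values and the power-by-size table can be approximated in base R. The helper below is a sketch of the arctanh (Fisher z) approximation that the printed output names. Its name `power_r` and its exact form are assumptions made for illustration; the reported results themselves come from the pwr package.

```r
# Approximate power of a two-sided test of H0: rho = 0 using Fisher's z
# (arctanh) transformation. Hypothetical helper written for this sketch.
power_r <- function(n, r, sig_level = 0.05) {
  t_crit <- qt(sig_level / 2, df = n - 2, lower.tail = FALSE)
  r_crit <- sqrt(t_crit^2 / (t_crit^2 + n - 2))  # smallest significant |r|
  z_r <- atanh(r) + r / (2 * (n - 1))            # z of r plus a small-sample bias term
  z_crit <- atanh(r_crit)
  pnorm((z_r - z_crit) * sqrt(n - 3)) + pnorm((-z_r - z_crit) * sqrt(n - 3))
}

# Power of the two original studies
power_study_1 <- power_r(n = 78, r = 0.20)
power_study_2 <- power_r(n = 63, r = 0.24)
print(round(c(study_1 = power_study_1, study_2 = power_study_2), 3))

# Power curve for a replication that assumes r = .22
sizes <- seq(70, 160, by = 10)
power_curve <- data.frame(size = sizes,
                          power = round(power_r(sizes, r = 0.22), 2))
print(power_curve)
```

Both original studies fall below .50 power, and under the assumed r = .22 a replication needs roughly 160 participants to reach .80.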
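When prior literature is available for the study design, a meta-analysis can be used to estimate the likely effect size and assess the required sample size.
When no prior literature is available, a sensitivity analysis can be run to set a reasonable sample size.
Minimum sample size needed to reach the specified power of 80% in a two-tailed independent-samples comparison with \(\alpha = .05\).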
ES | Interpretation | N |
---|---|---|
0.1 | very small | 3142 |
0.2 | small | 786 |
0.3 | small | 350 |
0.4 | small | 198 |
0.5 | medium | 128 |
0.6 | medium | 90 |
0.7 | medium | 66 |
0.8 | large | 52 |
0.9 | large | 40 |
1.0 | large | 34 |
\[N = N_1 + N_2\]
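A table of this kind can be generated with a loop over effect sizes. The sketch below uses `stats::power.t.test` from base R as a stand-in, since the tool behind the table is not stated. The N column above appears to correspond to rounding the per-group n to the nearest integer before doubling; rounding up instead gives slightly larger totals in some rows.

```r
effect_sizes <- seq(0.1, 1.0, by = 0.1)

# Per-group n solved for alpha = .05, power = .80, two-sided, two-sample t test
n_per_group <- sapply(effect_sizes, function(d) {
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
               type = "two.sample", alternative = "two.sided")$n
})

sensitivity <- data.frame(
  ES = effect_sizes,
  N_rounded = 2 * round(n_per_group),    # nearest integer per group
  N_ceiling = 2 * ceiling(n_per_group)   # rounds up, so power never drops below .80
)
print(sensitivity)
```

When the goal is a guaranteed minimum power, the `N_ceiling` column is the more conservative choice.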
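Given the sample size that current resources allow, assess the effect size that can be detected at a specified power.
This is also called the smallest effect size of interest (SESOI).
Further reading: Anvari and Lakens (2021), Lakens (2022) #HW2
Simple regression example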
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 100
## r = 0.275866
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
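The output above solves for the correlation detectable with n = 100 at power .80. A base R sketch of the same calculation follows, assuming the arctanh approximation named in the output and using `uniroot` to search for r. The helper name `power_r` is hypothetical.

```r
# Approximate power of a two-sided test of H0: rho = 0 (Fisher z approximation).
# Hypothetical helper written for this sketch.
power_r <- function(n, r, sig_level = 0.05) {
  t_crit <- qt(sig_level / 2, df = n - 2, lower.tail = FALSE)
  r_crit <- sqrt(t_crit^2 / (t_crit^2 + n - 2))
  z_r <- atanh(r) + r / (2 * (n - 1))
  z_crit <- atanh(r_crit)
  pnorm((z_r - z_crit) * sqrt(n - 3)) + pnorm((-z_r - z_crit) * sqrt(n - 3))
}

# Find the r at which power equals .80 when n = 100
detectable_r <- uniroot(function(r) power_r(n = 100, r = r) - 0.80,
                        interval = c(0.01, 0.99), tol = 1e-8)$root

cat("Smallest detectable r:  ", round(detectable_r, 4), "\n")
cat("Corresponding r squared:", round(detectable_r^2, 4), "\n")
```

In a simple regression with one predictor, r squared is the proportion of variance explained, so the second line expresses the same threshold in regression terms.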
Multiple regression example
pwr::pwr.f2.test(u = 5, ## potential number of variables
v = (200-5-1), ## adjusted sample size
sig.level = .05^5,
power=.8)
##
## Multiple regression power calculation
##
## u = 5
## v = 194
## f2 = 0.2501398
## sig.level = 3.125e-07
## power = 0.8
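The f2 in the output is Cohen's \(f^2 = R^2 / (1 - R^2)\), so it can be converted back to the proportion of variance the five predictors would need to explain. The sketch below does that conversion in base R and then recomputes power from the noncentral F distribution. The noncentrality convention lambda = f2 * (u + v + 1) is an assumption of this sketch, not something stated in the notes.

```r
f2 <- 0.2501398        # detectable effect size reported in the output above
u <- 5                 # numerator degrees of freedom (number of predictors)
v <- 200 - 5 - 1       # denominator degrees of freedom
sig_level <- 0.05^5    # significance level used in the call above

# Convert Cohen's f2 to R squared
r_squared <- f2 / (1 + f2)
cat("R squared needed:", round(r_squared, 4), "\n")

# Recompute power from the noncentral F distribution
# (assumed convention: lambda = f2 * (u + v + 1))
lambda <- f2 * (u + v + 1)
f_crit <- qf(sig_level, u, v, lower.tail = FALSE)
power_check <- pf(f_crit, u, v, ncp = lambda, lower.tail = FALSE)
cat("Recomputed power:", round(power_check, 3), "\n")
```

Read this way, with 200 observations and this very strict significance level, the model must explain about 20% of the variance to be detected with power .80.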
Seed-teacher workshop participants: please download the enrichment guide to keep building on what you have learned.