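How much data does my study need to collect to obtain meaningful results?
Are the research resources available to me sufficient to collect data that can support my research hypothesis?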
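| Null hypothesis | Alternative hypothesis |
|---|---|
| No effect | A detectable effect |
| The hypothesized effect is 0 on the measurement scale | The hypothesized effect is nonzero on the measurement scale |
| The true effect is 0, but the conclusion claims it is nonzero (Type I error) | The true effect is nonzero, but the conclusion claims it is 0 (Type II error) |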
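Which hypothesis am I interested in?
I understand the effect I want to test, but have I evaluated in advance the conditions under which it can be detected?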

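When power approaches its ceiling (100%), any statistical test will come out significant. If the context of the null hypothesis has not been thought through, statistical significance is meaningless.
Deciding whether a result is meaningful by whether the p value falls below the significance level ignores the fact that the p value is continuous.
Both effect size and sample size affect the p value. Oversimplifying inevitably understates or overstates what a p value means.
Statistical significance does not mean that the expected effect truly exists. Learners who are not reminded of this can easily misuse statistical tests.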
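To make the joint role of effect size and sample size concrete, the sketch below holds the observed Cohen's d fixed and varies only the per-group n of a balanced two-sample t test. The helper name `p_from_d`, the value d = 0.5, and the sample sizes are illustrative assumptions, not taken from the slides.

```r
# Two-sided p value of a balanced two-sample t test, given an observed
# Cohen's d and n participants per group.
# `p_from_d` is a hypothetical helper name used only for this illustration.
p_from_d <- function(d, n) {
  t_stat <- d * sqrt(n / 2)   # t = d * sqrt(n1 * n2 / (n1 + n2)) with n1 = n2 = n
  df <- 2 * n - 2             # degrees of freedom of the pooled-variance t test
  2 * stats::pt(-abs(t_stat), df)
}

n_per_group <- c(10, 30, 100)
p_values <- sapply(n_per_group, function(n) p_from_d(d = 0.5, n = n))

# Same effect size, different sample sizes, different p values
data.frame(n_per_group, p = signif(p_values, 3))
```

The same d moves from clearly nonsignificant to clearly significant as n grows, which is why a p value cannot be read without the effect size and the sample size behind it.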
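Type I error
The true effect is 0, but the analysis leads to the claim that it is nonzero.
\(\alpha = P(d \neq 0 \mid \theta = 0)\)
Type II error
The true effect is nonzero, but the analysis leads to the claim that it is 0.
\(\beta = P(d = 0 \mid \theta \neq 0)\)
Here \(\theta\) is the true effect and \(d\) is the effect claimed from the analysis.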
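The two error rates can be approximated by simulation. The following is a minimal sketch, assuming normally distributed scores with unit variance, 20 participants per group, and a true standardized effect of 0.5 for the Type II case; the helper `reject_rate`, the seed, and all of these settings are assumptions chosen for illustration.

```r
set.seed(2024)  # arbitrary seed, used only to make the run reproducible

# Proportion of simulated two-sample t tests that reach p < alpha.
# `reject_rate` is a hypothetical helper; theta is the true standardized effect.
reject_rate <- function(theta, n = 20, alpha = .05, n_sim = 2000) {
  p <- replicate(n_sim, {
    x <- rnorm(n, mean = theta)  # group with the true effect
    y <- rnorm(n, mean = 0)      # comparison group
    t.test(x, y, var.equal = TRUE)$p.value
  })
  mean(p < alpha)
}

type1_rate <- reject_rate(theta = 0)        # claiming an effect when theta = 0
type2_rate <- 1 - reject_rate(theta = 0.5)  # missing an effect when theta = 0.5

c(type1 = type1_rate, type2 = type2_rate)
```

The Type I rate should land near the chosen alpha, while the Type II rate depends on the true effect and the sample size.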
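What is the sunk cost of confirming a result that is certain to work, or of refuting an effect that does not actually exist?
When the effect size is small, what is gained by obtaining a statistically significant result?
Can the available resources support an \(\frac{\alpha}{\beta}\) ratio that balances costs and benefits?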
Cohen’s suggested balance ~ \(\frac{\alpha}{\beta} = \frac{1}{4}\)
\(\alpha = .05\), \(1 - \beta\) at least .80 (\(\beta = .20\) gives \(\frac{.05}{.20} = \frac{1}{4}\))
“The notion that failure to find is less serious than finding something that is not there accords with the conventional scientific view” (Cohen, 1988)
Motivation for p-hacking
| | There is no effect (null=true) | There is an effect (null=false) |
|---|---|---|
| We claim no effect (ES=0) | Correct conclusion (\(1 - \alpha\)) | Type II error (\(\beta\)) |
| We claim an effect (ES \(\neq\) 0) | Type I error (\(\alpha\)) | Correct conclusion (\(1 - \beta\)) |
| | Universe X | Universe Y |
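Researchers who take responsibility for their results should be able to judge the most appropriate \(\frac{\alpha}{\beta}\).
Publication bias is very likely to lead to \(\alpha\) being underestimated.
Replication studies are usually required to reach \(1 - \beta\) of at least .90. See the Registered Report submission guidelines of journals such as Nature Human Behaviour and Collabra: Psychology.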
| A priori power analysis | Post-hoc power analysis |
|---|---|
| A, P, E -> S; A, P, S -> E; P, E, S -> A | A, E, S -> P |
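Effect size indices can be converted into one another, or computed from known information.
Every research field has suitable tools for estimating APES (alpha, power, effect size, sample size); researchers should choose a tool according to their practical needs.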
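As a small illustration of converting between effect size indices, the sketch below starts from a t statistic and derives Cohen's d and a correlation r with the effectsize package. The input values t = 1.61 with 10 error degrees of freedom are used here only as example numbers.

```r
# Requires the effectsize package.
t_value <- 1.61
df_error <- 10

# Cohen's d computed from the t statistic
d <- effectsize::t_to_d(t = t_value, df_error = df_error)$d

# Correlation r, obtained directly from t and indirectly from d
r_from_t <- effectsize::t_to_r(t = t_value, df_error = df_error)$r
r_from_d <- effectsize::d_to_r(d)

c(d = d, r_from_t = r_from_t, r_from_d = r_from_d)
```

Both routes to r should agree, which shows that the indices carry the same information on different scales.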
Before working through the examples, download the assignment Rmd and make sure the R packages effectsize and pwr are installed.
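Use 1: Estimate the sample size needed to obtain a statistically significant result
Kirk (1996): In a test of a treatment intended to slow intellectual decline in patients with Alzheimer's disease, 6 patients received the treatment and another 6 received a control treatment. After some time, the mean intelligence test score of the treated patients was 13 points higher than that of the control patients; the statistical test gave t = 1.61, p = .14. How many participants are needed to obtain a result significant at .05 with power of .80?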
See the pwr::pwr.t.test documentation.
## This example is from Kirk (1996): a researcher tested a medication that might raise the IQ of people suffering from Alzheimer's disease.
## Two-tailed test; d is converted from t = 1.61 with 10 error degrees of freedom
pwr::pwr.t.test(d = effectsize::t_to_d(1.61, df_error = 10)$d,
                power = .80,
                type = "two.sample",
                alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 16.15898
## d = 1.018253
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
## This example is from Kirk (1996): a researcher tested a medication that might raise the IQ of people suffering from Alzheimer's disease.
## One-tailed test; d is converted from t = 1.61 with 10 error degrees of freedom
pwr::pwr.t.test(d = effectsize::t_to_d(1.61, df_error = 10)$d,
                power = .80,
                type = "two.sample",
                alternative = "greater")
##
## Two-sample t test power calculation
##
## n = 12.66051
## d = 1.018253
## sig.level = 0.05
## power = 0.8
## alternative = greater
##
## NOTE: n is number in *each* group
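Because `pwr.t.test` returns a fractional n per group, the planned sample size has to be rounded up. The sketch below does this for the two-tailed case, assuming the effect size d = 1.018253 reported in the output above; the variable names are illustrative.

```r
# Requires the pwr package.
res <- pwr::pwr.t.test(d = 1.018253,
                       power = .80,
                       sig.level = .05,
                       type = "two.sample",
                       alternative = "two.sided")

# Round up, since a fraction of a participant cannot be recruited
n_per_group <- ceiling(res$n)
c(per_group = n_per_group, total = 2 * n_per_group)
```

Rounding up rather than to the nearest integer keeps the planned power at or above the target.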
Use 2: Estimate the power achieved with the data already in hand
## Power of the original design: 6 participants per group, d converted from t = 1.61
pwr::pwr.t.test(n = 6,
                d = effectsize::t_to_d(1.61, df_error = 10)$d,
                type = "two.sample",
                alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 6
## d = 1.018253
## sig.level = 0.05
## power = 0.3578953
## alternative = two.sided
##
## NOTE: n is number in *each* group
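The analytic power above can be cross-checked by simulation. This is a rough sketch, assuming normally distributed scores with equal variances and treating d = 1.018253 as the true effect; the seed and the number of simulated studies are arbitrary choices.

```r
set.seed(1996)  # arbitrary seed, used only to make the run reproducible

d <- 1.018253   # effect size taken from the output above, treated as the true effect
n <- 6          # participants per group
n_sim <- 5000   # number of simulated studies (arbitrary)

p <- replicate(n_sim, {
  treatment <- rnorm(n, mean = d)
  control <- rnorm(n, mean = 0)
  t.test(treatment, control, var.equal = TRUE)$p.value
})

# Share of simulated studies reaching p < .05, an estimate of power
simulated_power <- mean(p < .05)
simulated_power
```

The simulated proportion should sit close to the analytic value, with some Monte Carlo noise.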
Use 3: Set a reasonable significance level
## Solve for the significance level (sig.level = NULL), two-tailed
pwr::pwr.t.test(n = 6,
                d = effectsize::t_to_d(1.61, df_error = 10)$d,
                type = "two.sample",
                sig.level = NULL,
                power = .80,
                alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 6
## d = 1.018253
## sig.level = 0.3679726
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
## Solve for the significance level (sig.level = NULL), one-tailed
pwr::pwr.t.test(n = 6,
                d = effectsize::t_to_d(1.61, df_error = 10)$d,
                type = "two.sample",
                sig.level = NULL,
                power = .80,
                alternative = "greater")
##
## Two-sample t test power calculation
##
## n = 6
## d = 1.018253
## sig.level = 0.1877483
## power = 0.8
## alternative = greater
##
## NOTE: n is number in *each* group
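Use 4: Design a high-powered replication study
Two studies of the correlation between the same pair of variables reported nonsignificant correlation coefficients of .2 and .24, with sample sizes of 78 and 63 respectively. What power did each study achieve? How large a sample is needed to design a study that reaches power of .80?
The optimal sample size from an a priori analysis does not guarantee that the study will reach the specified statistical significance.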
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 78
## r = 0.2
## sig.level = 0.05
## power = 0.4228927
## alternative = two.sided
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 63
## r = 0.24
## sig.level = 0.05
## power = 0.4796724
## alternative = two.sided
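The calls that produced the two outputs above are not shown. A sketch along the following lines, assuming `pwr::pwr.r.test` with its default two-sided alternative, yields the same fields; the object names `study1` and `study2` are illustrative.

```r
# Requires the pwr package.
# Power of each original study, given its sample size and reported correlation
study1 <- pwr::pwr.r.test(n = 78, r = .20, sig.level = .05)
study2 <- pwr::pwr.r.test(n = 63, r = .24, sig.level = .05)

c(study1 = study1$power, study2 = study2$power)
```

Both studies had less than a 50% chance of detecting a correlation of the size they reported, so a nonsignificant result is unsurprising.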
Assume that a successful replication study should find r = .22.
| size | power |
|---|---|
| 70 | 0.45 |
| 80 | 0.51 |
| 90 | 0.55 |
| 100 | 0.60 |
| 110 | 0.64 |
| 120 | 0.68 |
| 130 | 0.72 |
| 140 | 0.75 |
| 150 | 0.78 |
| 160 | 0.80 |
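A table like the one above can be generated by looping over candidate sample sizes. The sketch below assumes r = .22, a two-sided test at .05, and `pwr::pwr.r.test`; the variable names are illustrative, and the last line solves directly for the n that reaches power of .80.

```r
# Requires the pwr package.
sizes <- seq(70, 160, by = 10)

# Power to detect r = .22 at each candidate sample size
power <- sapply(sizes, function(n) {
  pwr::pwr.r.test(n = n, r = .22, sig.level = .05)$power
})
power_table <- data.frame(size = sizes, power = round(power, 2))
power_table

# Solve for the sample size directly and round up to whole participants
n_needed <- ceiling(pwr::pwr.r.test(r = .22, power = .80, sig.level = .05)$n)
n_needed
```

Solving for n directly avoids scanning a grid, while the table shows how slowly power climbs as the sample grows.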
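When prior literature exists for the research design, a meta-analysis can be used to estimate the likely effect size and assess the required sample size.
When no prior literature exists for the research design, a sensitivity analysis can be run to set a reasonable sample size.
Minimum sample size needed to reach the specified power of 80% in a two-tailed independent-samples comparison with \(\alpha = .05\).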
| ES | Interpretation | N |
|---|---|---|
| 0.1 | very small | 3142 |
| 0.2 | small | 786 |
| 0.3 | small | 350 |
| 0.4 | small | 198 |
| 0.5 | medium | 128 |
| 0.6 | medium | 90 |
| 0.7 | medium | 66 |
| 0.8 | large | 52 |
| 0.9 | large | 40 |
| 1.0 | large | 34 |
\[N = N_1 + N_2\]
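The total sample sizes can be recomputed with `pwr::pwr.t.test`. In this sketch the per-group n is rounded up with `ceiling()` before doubling, which is an assumption about the rounding rule; totals may therefore differ from the table by a few participants for some effect sizes.

```r
# Requires the pwr package.
es <- seq(0.1, 1.0, by = 0.1)

# Fractional n per group needed for 80% power, two-tailed, alpha = .05
n_per_group <- sapply(es, function(d) {
  pwr::pwr.t.test(d = d,
                  power = .80,
                  sig.level = .05,
                  type = "two.sample",
                  alternative = "two.sided")$n
})

# N = N_1 + N_2 with equal group sizes, rounded up to whole participants
total_n <- 2 * ceiling(n_per_group)
data.frame(ES = es, N = total_n)
```

Halving the effect size roughly quadruples the required N, which is why small effects are so expensive to study.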
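Given the sample size that current resources can provide, assess the effect size that can be detected at the specified power.
This is also called the smallest effect size of interest (SESOI).
Further reading: Anvari and Lakens (2021), Lakens (2022) #HW2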
Simple regression example
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 100
## r = 0.275866
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
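The call behind this output is not shown. A sketch like the following, assuming `pwr::pwr.r.test` with r left unspecified so that it is solved for, gives the smallest correlation detectable with n = 100 at 80% power; squaring it gives the corresponding share of variance explained in a simple regression.

```r
# Requires the pwr package.
# Leave r unspecified so the function solves for the detectable correlation
detectable <- pwr::pwr.r.test(n = 100, power = .80, sig.level = .05)
detectable$r

# With one predictor, R squared equals the squared correlation
r_squared <- detectable$r^2
r_squared
```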
Multiple regression example
pwr::pwr.f2.test(u = 5,               ## numerator df: number of predictors
                 v = (200 - 5 - 1),   ## denominator df: n - u - 1, with n = 200
                 sig.level = .05^5,
                 power = .8)
##
## Multiple regression power calculation
##
## u = 5
## v = 194
## f2 = 0.2501398
## sig.level = 3.125e-07
## power = 0.8
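Cohen's f2 is easier to interpret once it is converted to the proportion of variance explained, using the standard relation \(R^2 = f^2 / (1 + f^2)\). The sketch below repeats the call above with the same settings and applies that conversion; the object names are illustrative.

```r
# Requires the pwr package.
fit <- pwr::pwr.f2.test(u = 5,
                        v = 200 - 5 - 1,
                        sig.level = .05^5,
                        power = .80)

# Convert the detectable f2 into the detectable R squared
r_squared <- fit$f2 / (1 + fit$f2)
c(f2 = fit$f2, r_squared = r_squared)
```

Under these settings the five predictors together would need to explain about a fifth of the variance to be detected with 80% power.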
Seed-teacher workshop participants: please download the empowerment guide to keep building on what you have learned.