May 6, 2024

*Some thoughts on effect sizes in psychology and education.*


The first question one should ask oneself when encountering a proposed new intervention, treatment, or therapy is: does it work? Are there controlled studies with significant findings? The second question is: how large is the effect, the so-called effect size?

Effect size is a standardized measure of the impact of an intervention. There are many slightly different ways to calculate it, but the most commonly used measure is Cohen's d, which expresses the impact of an intervention relative to the control group in standard deviation (SD) units. For example, if the pre-post improvement was 1.0 SD in the intervention group and 0.7 SD in the control group, the intervention is said to have an effect size of 0.3.
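A minimal sketch of the standardized-mean-difference computation (the two-sample Cohen's d with a pooled SD); the example in the text applies the same logic to pre-post gain scores. The sample data below are made up purely for illustration.

```python
import statistics

def cohens_d(treatment, control):
    """Cohen's d: difference in group means divided by the pooled SD."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = statistics.mean(treatment), statistics.mean(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Illustrative gain scores for two groups of five children
print(round(cohens_d([1, 2, 3, 4, 5], [0, 1, 2, 3, 4]), 2))  # 0.63
```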

In contrast to statistical significance, the effect size does not take the number of participants into account. An effect of 0.3 is probably not statistically significant (p > 0.05) with 20 participants in each group, but becomes significant when you get to around 200 in each group.
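The dependence on sample size can be checked with a quick sketch, using the normal approximation to the two-sample test (z = d·√(n/2), two-sided p from the normal tail):

```python
import math

def approx_p_two_sample(d, n_per_group):
    """Approximate two-sided p-value for an observed standardized
    difference d between two equal groups, via the normal
    approximation z = d * sqrt(n/2)."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))  # two-sided tail probability

print(round(approx_p_two_sample(0.3, 20), 3))   # 0.343 — not significant
print(round(approx_p_two_sample(0.3, 200), 4))  # 0.0027 — significant
```

The same effect size of 0.3 crosses the p < 0.05 threshold only once the groups grow large enough.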

So what is a good-enough effect? When is an intervention worth the time and effort? Cohen himself proposed that 0.2 is a small effect, 0.5 a medium effect, and 0.8 a large effect, and this has been echoed countless times. What did he base this recommendation on? Nothing! He admitted that he simply took the thresholds out of thin air in order to give a rule of thumb.

(John Hattie popularized the use of effect sizes to judge whether educational methods are worth their effort. However, Hattie conflated pre-post gains, the "test-retest effect," with controlled effect sizes. In the example above, Hattie's definition would give an effect of 1.0, while Cohen's d is 0.3. Hattie also mixed studies with proximal outcomes, i.e., teaching to the test, often learning specific facts, with generalized measures of abilities, often from standardized tests.)

*The critique and the problems*

The arbitrariness of Cohen's traditional criteria matters because effect sizes are sometimes used to make recommendations. For example, the What Works Clearinghouse recommends educational interventions only if they have shown an effect of >0.25. Effect sizes are also used as rhetorical bats in eternal academic struggles: it is often an effective argument if you can frame your opponent's pet variable as "only a small effect."

In recent publications, a more nuanced discussion about effect sizes is emerging. One highly readable article is by David Funder and Daniel Ozer^{1}. They point out some obvious, but often overlooked, aspects, such as whether an effect compounds. Small effect sizes matter if they can be repeated. Let's say a new method for studying allows a child to learn 0.1 SD more after a week of homework. A one-week randomized study would dismiss such a finding as too small to be relevant. Yet the effects on learning over time will be enormous, because the effect compounds over months and years.
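A deliberately naive toy model of this compounding argument. It assumes a 0.1 SD weekly advantage accumulating over a 36-week school year (an assumed figure), with an optional retention rate to allow for fade-out; treat it as an illustration of the argument, not a prediction.

```python
def cumulative_advantage(weekly_gain_sd, weeks, retention=1.0):
    """Total SD advantage if each week adds weekly_gain_sd and the
    accumulated advantage is retained at rate `retention` per week
    (retention=1.0 means no fade-out at all)."""
    total = 0.0
    for _ in range(weeks):
        total = total * retention + weekly_gain_sd
    return total

print(round(cumulative_advantage(0.1, 36), 2))        # 3.6 with full retention
print(round(cumulative_advantage(0.1, 36, 0.95), 2))  # 1.68 with 5% weekly fade-out
```

Even with substantial fade-out, an effect that a one-week trial would call negligible accumulates into something far larger than Cohen's "large" threshold of 0.8.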

Another way a small effect can compound is by having an impact on several aspects of life. For example, an improved focus during studying would impact many subject areas.

A third way that small effects matter is if they affect the population at large. For example, if digital media increases inattention by an effect size of 0.15, it does not increase the risk of ADHD very much for any single individual, but it would increase the number of individuals above a diagnostic cutoff (the top 5% of symptoms) by roughly 35%.
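The ~35% figure follows from shifting a normal distribution of symptoms by 0.15 SD and asking how many more people end up beyond the 95th percentile (z ≈ 1.645); the normality of the symptom distribution is an assumption of this sketch:

```python
import math

def tail_prob(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

cutoff = 1.645   # 95th percentile: top 5% of symptoms
shift = 0.15     # population-wide shift, in SD units
base = tail_prob(cutoff)
shifted = tail_prob(cutoff - shift)
print(round(shifted / base - 1, 2))  # 0.35 — about 35% more above the cutoff
```

A small shift in the mean translates into a large relative change in the tail, which is where diagnoses live.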

Another important question is: small compared to what? For interventions in the school setting, the sad truth is that effects on general ability, in for example reading or math, are often small. One meta-analysis of 141 RCTs found the average effect size to be 0.06^{2}. In their review of the field of psychology, Funder and Ozer concluded that even an effect of d ≈ 0.10 is of consequence if it is repeated.

The question of effect sizes and significance has come up in discussion related to both growth mindset interventions and working memory training. In a large study, the effect of growth mindset interventions was found to be 0.15 in low-performing students^{3}. This has been dismissed as being “only small” by critics. However, this is an example of an effect that is likely to compound by affecting many subject areas continuously over time.

For working memory training, Berger and colleagues^{4} found that 279 children who trained working memory, compared with 293 children who took regular classes (most often mathematics), improved in mathematics by an effect size of roughly 0.2. This was barely significant even in this large sample, which also explains why smaller studies often found non-significant effects on mathematics. However, when Berger followed up one year later, the effect of improved attention and working memory capacity had compounded over time, by impacting learning over the long run, and the benefit was now 0.4.

I think that the field of psychology and educational interventions has been naïve in terms of effect sizes. Researchers (including me) have expected large effects and significance with too small sample sizes. The field is now maturing and we are seeing more large-scale studies and a more nuanced discussion about effect sizes.

**References**

1. Funder DC, Ozer DJ. Evaluating Effect Size in Psychological Research: Sense and Nonsense. *Advances in Methods and Practices in Psychological Science.* 2019;2:156-168.

2. Lortie-Forgues H, Inglis M. Rigorous Large-Scale Educational RCTs Are Often Uninformative: Should We Be Concerned? *Educational Researcher.* 2019;48(3):158-166.

3. Yeager DS, Hanselman P, Walton GM, et al. A national experiment reveals where a growth mindset improves achievement. *Nature.* 2019;573(7774):364-369.

4. Berger EM, Fehr E, Hermes H, Schunk D, Winkel K. The impact of working memory training on children's cognitive and non-cognitive skills. *Journal of Political Economy.* In press.

Professor of Cognitive Neuroscience
