Commentary: “Is Working Memory Training Effective? A Meta-Analytic Review”

In the Developmental Psychology article “Is working memory training effective? A meta-analytic review” (May 2012), authors Monica Melby-Lervåg and Charles Hulme discussed the results of a meta-analysis of working memory (WM) training studies, including children and adults with various clinical presentations.

The broad study inclusion criteria required use of a working memory intervention, a pre-test/post-test design, a training group and a control group (untreated or treated) regardless of whether participants were randomly assigned, and standardized tests of nonverbal ability, verbal ability, attention, decoding or arithmetic. Based on these criteria, 30 independent group comparisons from 23 published studies were analyzed. Due to the variability across studies, the authors included age, training dose, design type, control type, learner status, and intervention type as moderator variables. The impact of WM training was assessed by comparing pre-test/post-test effect sizes between working memory training groups and control groups.
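To make the comparison of pre-test/post-test effect sizes concrete, the sketch below computes one common form of standardized mean difference for a design with a training group and a control group: the difference in mean gains divided by the pooled pre-test standard deviation. This formula is an assumption chosen for illustration; the commentary does not state the exact estimator used in the meta-analysis, and the group statistics below are hypothetical, not taken from any reviewed study.

```python
from math import sqrt


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))


def pre_post_control_d(train, control):
    """Difference in mean gains, standardized by the pooled pre-test SD.

    Each argument is a dict with keys pre_mean, post_mean, pre_sd, n.
    Assumed estimator for illustration only.
    """
    gain_train = train["post_mean"] - train["pre_mean"]
    gain_control = control["post_mean"] - control["pre_mean"]
    sd = pooled_sd(train["pre_sd"], train["n"], control["pre_sd"], control["n"])
    return (gain_train - gain_control) / sd


# Hypothetical group statistics (not from any study in the review).
training = {"pre_mean": 10.0, "post_mean": 13.0, "pre_sd": 4.0, "n": 20}
control = {"pre_mean": 10.0, "post_mean": 11.0, "pre_sd": 4.0, "n": 20}

d = pre_post_control_d(training, control)
print(round(d, 2))  # 0.5
```

Subtracting the control group's gain is what lets this kind of effect size separate training effects from practice or maturation, which is why the type of control group matters for interpretation.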

The authors make certain valid points in this paper:

  • WM is important and domain general
  • In referring to WM as “one of the most influential theoretical constructs in cognitive psychology”, the authors recognize in their literature review the importance of WM as a domain general cognitive function: “Evidence from large-scale latent variable studies with both children (Alloway et al., 2006) and adults (Kane et al., 2004) supports the conclusion that working memory capacity is best thought of as predominantly a domain-general capacity (though specific working memory tasks may show small degrees of modality specificity in their storage demands)” (pp. 1-2).

  • WM is impaired in diverse clinical populations
  • The authors recognize the research citing WM deficits as a “potential explanation for a variety of developmental cognitive disorders” including reading and math disabilities, ADHD, and specific language impairment. They further posit that WM deficits seem to be inadequate as the sole explanation for such a wide variety of disorders. Unfortunately, they contradict this well-made point by then ignoring the important differences between groups, combining samples with developmental disorders and typically developing samples in their analysis.

  • Randomized, placebo controlled studies are needed
  • In addressing the methodological issues for evaluating WM training research, the authors point out the obvious and previously discussed (Shipstead et al., 2010) necessity of using randomized, placebo controlled study designs. Such study designs are vital if researchers are to elucidate whether WM training effects can be attributed to the training program and to rule out the impact of variables such as expectancy or maturation. However, despite their call for studies with sound methodology, the authors included in their own analysis studies that used placebo and non-placebo controls, as well as both randomized and non-randomized samples, and failed to give adequate weight to those with more rigorous designs.

    In examining the methods and interpretations put forth in this meta-analysis, one should consider some notable limitations:

  • Combination of a diverse group of sample populations, ignoring the distinction between individuals with and without WM deficits.
  • The sample populations evaluated in this meta-analysis were extremely heterogeneous, including studies with participants ranging from under 10 years to over 51 years old. Effect sizes from diverse groups were also combined, including typically developing children, adolescents, young adults and aging adults, children with ADHD, children with dyslexia, children with low WM, and children with low IQ (55–85). The most glaring weakness was that the authors lumped together groups with severe WM impairments and groups with typical WM abilities. A more useful analysis would have considered the particular subgroups and how they may or may not have been differentially impacted by training. Based on Cogmed research and clinical experience, the impact of Cogmed Working Memory Training depends on the distinction between individuals with and without WM deficits. Improving WM through training results in larger effects, both on tests and in real world changes, for individuals constrained by their WM capacity.

  • Comparison of heterogeneous research methods
  • Various training programs and protocols were included in the meta-analysis as “working memory training”, such as updating training (Dahlin et al., 2008), simple and complex span training (Klingberg et al., 2002; 2005), and strategy training while listening to stories (St. Clair-Thompson et al., 2011). These training programs differ not only in their basic approaches but also in their intensity and the adaptive algorithms applied. The exact design and timing of the stimuli in each program also vary considerably.

    In an attempt to account for the diversity in the sampled programs, the authors included moderator variables such as “training dose”, but they defined this variable arbitrarily, separating studies with 8 or fewer hours of training from studies with more than 8 hours without stating a basis for this cutoff. Although the authors are free to define their criteria as they please, this does not make the outcomes meaningful. One could argue that heterogeneity could, in theory, be addressed through statistical analysis; however, the number of observations was too small, with most outcome analyses including 3 effect sizes (Tables 3 & 4). Regardless of whether the outcomes were significant or non-significant, the inclusion of a widely heterogeneous sample of training methods renders this analysis misleading.
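    The concern about too few observations can be illustrated with a minimal sketch of two standard heterogeneity statistics, Cochran's Q and I², computed for three effect sizes. The effect sizes and sampling variances below are hypothetical and are not taken from Tables 3 & 4 of the meta-analysis; whether the authors reported these particular statistics is not stated in this commentary.

```python
def heterogeneity(effects, variances):
    """Return the fixed-effect pooled estimate, Cochran's Q, its df, and I² (%)."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    # I² is truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, df, i2


# Hypothetical effect sizes that differ considerably across three studies.
effects = [0.1, 0.4, 0.7]
variances = [0.1, 0.1, 0.1]

pooled, q, df, i2 = heterogeneity(effects, variances)
print(round(pooled, 2), round(q, 2), df, round(i2, 1))  # 0.4 1.8 2 0.0
```

    In this hypothetical case the effects range from 0.1 to 0.7, yet Q is below its 2 degrees of freedom and I² is 0%. With so few effect sizes, such statistics have little ability to detect real differences between studies, so an apparently homogeneous result says little about whether the underlying programs are comparable.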

  • The cognitive measures used to assess change were combined into arbitrary categories
  • Across the original studies included in this meta-analysis, a variety of cognitive measures were used to assess nonverbal ability, verbal ability, attention, reading, arithmetic, verbal WM, and visuo-spatial WM. This inconsistency in the types of assessment used is a notable limitation to the interpretation of the extant WM training literature. As a means of dealing with this limitation, the authors decided to define each cognitive measure broadly so as to be able to include the various measures used in the original studies. Unfortunately, arbitrarily defining a cognitive measure does not mean that the definition is useful or that analysis of that measure can be readily interpreted. For example, measures of “attention” were defined as “measures that aimed to tap the participant’s ability to concentrate selectively on one aspect of a task while ignoring others”. The Stroop task, generally regarded as a measure of inhibitory ability, was noted as one of the measures used in the original articles and included as an “attention” measure.

    A non-significant effect in this analysis was taken as evidence of a lack of effect of “working memory training” on “attention”. However, a lack of effect on the Stroop task cannot be interpreted as a lack of effect on “attention”, and results from the Stroop task should not be pooled with other measures that do not include an inhibitory component. The results on the impact of training on “attention” as defined in this analysis are thus not possible to interpret.

  • Failure to include a relevant measure of reading
  • The authors used decoding as the measure of reading and included only measures from the original studies that assessed accuracy or fluency of word or non-word reading. However, one could argue that more relevant to WM capacity is one’s reading comprehension ability: the process of reading sentences, holding them in mind, and integrating the information to uncover the meaning. This process relies to a much greater extent on one’s ability to simultaneously process and store information over the short term. The use of decoding as a measure of transfer from WM training is thus misleading.

  • Failure to include analysis of behavior rating data & failure to differentiate clearly between developmental disorders
  • Interestingly, the authors did not include an evaluation of effect sizes for behavioral rating data that are in fact available in some of the reviewed articles (e.g., Klingberg et al., 2005). They establish, based on their analysis, that the working memory training programs “yield reliable short-term improvements on both verbal and nonverbal working memory tasks”. However, they state that the “pattern of near-transfer effects in the absence of more general effects on cognitive performance (such as attention or nonverbal ability) or measures of scholastic attainment (reading or arithmetic ability) suggests that working memory training procedures cannot, based on the evidence to date, be recommended as suitable treatments for developmental disorders (such as ADHD or dyslexia)”. Most notable is the argument that WM training is not effective for ADHD, which rests on findings that WM training does not transfer to measures that are not a basis for ADHD diagnosis. In order to establish whether training benefited individuals with ADHD, it is vital that behavioral symptoms be analyzed, as these, rather than nonverbal ability, are the basis for diagnosis. Another notable problem with the authors’ conclusion is the sweeping statement that WM training is not suitable for developmental disorders, a category that includes both ADHD and dyslexia. Because these are two conditions with different etiologies, the statement that WM training is inappropriate for both further indicates that the authors do not accept WM training as an intervention aimed specifically at WM deficits.

  • The inclusion and exclusion criteria for studies in this meta-analysis were applied inconsistently
  • It is unclear why certain studies were included in this meta-analysis and other highly cited studies with significant effects were omitted. The arbitrary nature of inclusion is reflected in a rather random selection of unpublished studies (e.g., Shavelson et al., 2008).

    The authors also made some sweeping statements that misrepresent the WM training programs included and particularly misrepresent Cogmed:

  • General conclusions of analysis should not be applied to Cogmed
  • It is important to recognize that this meta-analysis is not a direct critique of Cogmed Working Memory Training. Of the 30 group comparisons included in this paper, 8 were listed as “CogMed”, and in only 4 of the 8 was Cogmed applied in a manner consistent with its clinical application, that is, with school children and adults with WM deficits. Two comparisons by Thorell et al. (2008) and one by Bergman-Nutley et al. (2011) were studies of typically developing preschool children, and Shavelson et al. (2008) included typically developing middle school students. Thus, only 4 of 30 group comparisons used in this study were relevant to the clinical and commercial use of Cogmed, and solid conclusions about Cogmed cannot reasonably be drawn from this analysis.

  • Failure to recognize the use of non-trained tasks for assessment of transfer after training
  • The authors incorrectly held that, “…the most negative interpretation would be that the changes documented on near-transfer measures reflect very low-level changes in things like familiarity with specific tasks or even familiarity with being tested on a computer”. In offering this interpretation, the authors did not disclose that many WM training studies in this review, including Cogmed studies, measured transfer with assessments that differed from the trained tasks in presentation and response mode (non-trained tasks) so as to control for the possibility that improvements in WM were task specific. Thus, at least for the Cogmed studies, this argument reflects the failure of the authors to differentiate between training programs and to clearly report study methods.

  • False report that WM training programs (including Cogmed) do not rest on task analysis or rationale
  • The authors state in this review that “…current working memory training programs do not appear to be based on any clear theory of the processes involved or any clear task analysis”. This argument is, at the least, not true for programs that include simple and complex span and dual n-back tasks. For example, tasks included in Cogmed Working Memory Training have their basis in widely used and well researched WM tasks such as the span board and digit span.

  • False report of WM training program (particularly Cogmed) “claims”
  • It is vital that the claims that support each WM training intervention are made carefully and that they be scrutinized. Unfortunately, some commercially available programs are marketed aggressively, and the wide ranging and explicit claims of “Jungle Memory” are cited by the authors for good reason. However, the authors report incorrectly that Cogmed makes the claim that improving WM improves IQ. What is troubling is that the authors are not actually quoting a Cogmed source but rather an outside media article that was quoted verbatim on the Cogmed website and listed obscurely. The selection of this quote calls into question the intent of the authors, as the claims that Cogmed does make are numerous and readily available on the website.

  • Concluding remarks and future directions
  • This review brings the field of evidence-based cognitive training, and in particular WM training, to the forefront. It is encouraging that the field now merits this degree of scrutiny and that the growth in research and clinical implementation has moved WM training beyond just a promising possibility. WM training deserves thoughtful review, and it is natural that the methods be challenged and scrutinized by skeptical researchers.

    Unfortunately, a discerning review of the current literature is a difficult undertaking and can lead to questionable analysis and flawed conclusions as found in the Melby-Lervåg and Hulme article. The authors used this review to tell us what we already know to be true: “Current training programs yield reliable, short-term improvement on both verbal and non-verbal working memory tasks”. However, they failed to recognize the key differences between training programs and the serious limitations inherent in comparing these programs.

    Importantly, the key learning from this meta-analysis is that researchers must build on the first generation of cognitive training literature by addressing the current limitations and by asking the right questions. WM training works, but what regimens will lead to generalization and long-term gains? What are the underlying neural mechanisms of training gains? What are the best training conditions and for whom is training most useful? It is in pursuing answers for these provocative questions that the field will advance and that greater evidence will emerge that will allow for sounder meta-analysis and conclusions.