Table of Contents

1 The impact of UTOS on the conceptual framework for note-taking

In Note 3 in the manuscript, we noted that our UTOS moderators specifically apply to paths a and c, but not to path b (see Figure 1). Our study includes ten such moderators (i.e., orthographic script distance, region, note-taking option, note-taking type, material type, measure, input type, learners’ proficiency, learning target, and time). Some moderators might directly affect learners’ note-taking behavior when they are exposed to the L2 input. For example, learners’ L1-L2 orthographic distance may affect the ease with which they can understand the input (Zhang & Zhang, 2020), which in turn may affect their ability to take notes. Similarly, the region where a study was conducted might influence learners’ note-taking perceptions and habits (Siegel & Kusumoto, 2022). Note-taking option (i.e., whether learners are required or merely allowed to take notes), note-taking instruction (i.e., whether learners are provided with any note-taking instruction), and note-taking type can affect the effectiveness of note taking to a certain degree, given their ability to engage or redirect students’ attention to various aspects of the input (Siegel, 2021).

Other moderators might also affect note taking. For instance, a learner with a higher proficiency level might more easily identify relevant information in the input and might be more motivated to take notes, thereby enhancing the efficiency and effectiveness of the note-taking process. Likewise, the mode in which input is presented to learners, whether written or aural, might influence how they take notes. The nature of the material itself might also affect learners’ note-taking behavior: academic input, which might be more complex and in-depth than non-academic input, might pose a challenge for note taking (Jin & Webb, 2023). The effect of note taking may also vary depending on the measure type. Measuring learning outcomes via recognition tests (e.g., multiple-choice items) or recall tests (e.g., writing the meaning of a given word or the L2 word that corresponds to a given meaning) may require different depths of processing, which in turn can influence (i.e., moderate) the effect of note taking. Another moderator, the learning outcome, might also affect note taking, because learning outcomes guide learners on what to focus on when receiving input. For instance, when the learning outcome is reading comprehension, the notes might be broader in scope (e.g., targeting the content), whereas when the learning outcome is vocabulary learning, the notes might be narrower in scope (e.g., targeting the keywords). Finally, the moderator time (i.e., outcome measurement timing) was added to this meta-analysis to differentiate between learners’ pre- versus post-treatment learning outcomes and thereby to measure the possible gains (i.e., the difference between pre- and post-tests) from note taking as a learning aid.

As can be seen, all of our substantive UTOS variables can potentially moderate the act of note taking (path a) and/or the processing of input (path c) and thus, by definition, do not apply to path b directly. Finally, as noted in the manuscript, our M moderators, which by themselves “do not necessarily merit an interpretation [were all] adjusted for in the background” (Norouzian & Bui, 2024, p. 16), so that the impact of the substantive UTOS variables can be more clearly examined.

Figure 1. Theoretical framework for note taking


2 Description of the reasons for excluding moderators from Jin & Webb (2023)

As noted in the manuscript, we included 10 substantive (UTOS) and 3 additional methodological (M) moderators in our study. The following table provides a detailed description of the considerations involved in excluding certain substantive moderators from Jin & Webb (2023). Please see the methodology section in the manuscript for the full description of our moderators.

3 Raw data and initial analyses

The execution of these initial analyses may be time-consuming. Unless otherwise needed, we suggest that readers instead run the analyses in the next section, which use the saved results of these initial analyses. Additionally, to better understand the variables involved in the initial analyses (e.g., those used for estimating Hedges’ g effect sizes), the next table provides a list of their names and definitions. (Click on Code on the bottom right for the reproducible code used in each section.)
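For reference, the effect sizes computed by escalc("SMD", ...) below are bias-corrected standardized mean differences (Hedges’ g); in standard form, \(g = \left(1 - \frac{3}{4(n_T + n_C - 2) - 1}\right)\frac{m_T - m_C}{s_{pooled}}\), where \(m_T, m_C, n_T, n_C\) are the treatment and control means and sample sizes and \(s_{pooled}\) is the pooled standard deviation.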

# We use the software introduced by Norouzian & Bui (2024)
source("https://t.ly/olaQ0")

# We also use the following R package for choosing the best candidate model
library(bbmle)

# Raw coding sheet with merged first row and lots of empty cells
dat <- read.csv("https://t.ly/i5aYY", na=c(NA,"","NA","NULL"))

# Remove the merged first row but use the first row to rename the column names
dat2 <- setNames(dat[-1,], dat[1,])

# Remove any accidental spaces or empty rows or columns
dat3 <- full_clean(dat2)

# Make sure each column's data type is correctly recorded
dat4 <- type.convert(dat3, as.is=TRUE)

# Compute effect sizes
dat5 <- escalc("SMD", m1i = mT, m2i = mC, sd1i = sdT, sd2i = sdC, 
               n1i = nT, n2i = nC, data = dat4, var.names = c("g", "v_g"))


# Adjust for assignment by intact classes
dat6 <- group_by(dat5, study) %>% 
  
  mutate(
    
    g2 = ifelse(assign_type=="class", g_cluster(g, n_class, Nt, Nc), g),
    
    v_g2 = ifelse(assign_type=="class", g_vi_cluster(g, n_class, Nt, Nc), v_g),
    
    SE_egger =  sqrt((nT + nC) / (nT * nC)),
    
    time = recode(time, "pretest" = "baseline"),
    
    region = recode(region, "Asia" = "East Asia") # reviewer requested changing Asia to East Asia

    
  ) %>% ungroup() %>% 
  
  mutate(effect = row_number())


# How many effects and studies
dat6 %>%
  group_by(study) %>%
  summarise(n_gi = n()) %>%
  summarise(
    `No. of Studies` = n(), 
    `No. of Effects` = sum(n_gi)
  ) %>% ungroup()


# What is the distribution of effects
ggplot(dat6) + aes(g2) + geom_density()

# Quite skewed to the right, looks like we have some large effects
# even though we have only 57 effects from 27 studies

# What are the two largest effects?
two_largest <- tail(sort(dat6$g2),2)
# [1]  6.89757 10.45655 -- extremely large, many times larger than 
# mean(dat6$g2), which is ~0.9!


# These two large effects also exceed 3*SD from the mean (Lipsey & Wilson, 2001)
two_largest > with(dat6, c(`3SDfromMean`= mean(g2)+3*sd(g2)))
# [1] TRUE  TRUE

# Let's inspect the impact of these two extreme effects on a 
# basic 3-level model

# Reintroduce naturally occurring dependence before removing 2 largest effects
Vs <- with(dat6, impute_covariance_matrix(v_g2, study, r=.5,
                                          subgroup = sample_id))


# 3-level Additive symmetry model
m1 = rma.mv(g2 ~ time + study_length+no_treat+true_experiment, Vs, 
            random = ~1|study/effect, data = dat6,
            dfs = "contain")


# Removing 2 largest effect sizes to measure their impact on m1
dat7 <- filter(dat6, !g2 %in% two_largest)


# Reintroduce naturally occurring dependence AFTER removing 2 largest effects
Vs_af <- with(dat7, impute_covariance_matrix(v_g2, study, r=.5,
                                          subgroup = sample_id))


# m1 model before removing 2 largest effects
m_before <- m1

# m1 model AFTER removing 2 largest effects
m_after <- update(m_before, data=dat7, V=Vs_af)

# Measuring the CIs width of pre- post effects for models BEFORE (_bf) & AFTER (_af)
(t_bf = type.convert(post_rma(m_before, ~ time)$table, as.is = TRUE))
(t_af = type.convert(post_rma(m_after,  ~ time)$table, as.is = TRUE))

(t_bf_ci_widths = t_bf$Upper - t_bf$Lower)
(t_af_ci_widths = t_af$Upper - t_af$Lower)

# The %reduction in the width of CIs due to removing two outliers
paste0(round((t_bf_ci_widths - t_af_ci_widths)/t_bf_ci_widths*100),"%")
# [1] "52%" "46%" "57%"


# Vast improvement in precision (CIs narrower by up to 57%) due to removing two outliers!

# Continue to model selection without the two outlying effects using dat7
# Let's run 5 more models in addition to m_after and choose:


####################
# Model selection
####################

# 3-level Additive symmetry model
m1 <- m_after

# Estimation checks out, passes!
profile(m1)


# Homogeneous Auto-regressive model
m2 = rma.mv(g2 ~ time + study_length+no_treat+true_experiment, Vs_af, 
            random = list(~time|study, ~1|effect), struct = "AR", 
            data = dat7,
            dfs = "contain")

# Estimation checks out, passes!
profile(m2)

# Heterogeneous auto-regressive model
m3 = rma.mv(g2 ~ time + study_length+no_treat+true_experiment, Vs_af, 
            random = list(~time|study, ~1|effect), struct="HAR", 
            data = dat7,
            dfs = "contain")

# Estimation doesn't check out, exclude this model!
profile(m3)

# Heterogeneous compound symmetry model
m4 = rma.mv(g2 ~ time + study_length+no_treat+true_experiment, Vs_af, 
            random = list(~time|study, ~1|effect), struct = "HCS", 
            data = dat7,
            dfs = "contain")


# Estimation doesn't check out, exclude this model!
profile(m4)

# Homogeneous compound symmetry model
m5 = rma.mv(g2 ~ time + study_length+no_treat+true_experiment, Vs_af, 
            random = list(~time|study, ~1|effect), struct = "CS", 
            data = dat7,
            dfs = "contain")

# Estimation doesn't check out, exclude this model!
profile(m5)


# 4-level Additive symmetry model
m6 = rma.mv(g2 ~ time + study_length+no_treat+true_experiment, Vs_af, 
            random = ~1|study/time/effect, data = dat7,
            dfs = "contain")

# Estimation checks out, passes!
profile(m6)


# Run a weighted comparison between the above 'checked out' models
AICctab(m1, m2, m6, weights=TRUE, base=TRUE)

# m1 wins!! We'll use a 3-level additive symmetry model.


# Q: Is this overall longitudinal model sensitive to the amount of naturally occurring dependence?

# Time effects:
p2 <- post_rma(m1, ~time)

# Sensitivity analysis:
sense_rma(p2, var_name = "v_g2")

# A: Not really except in the case of posttest2 effects which are excluded from interpretation due to their extremely limited number (see next part).

# How many studies and effects for each meta-analytic model do we have?
moderators <-  c(
            "time",
            "treat_grp",
            "outcome",
            "measure",
            "input_mode",
            "material_type",
            "note_option",
            "prof",
            "script_distance",
            "region")


# A list of time and moderators interacting with time
LIST <- c("time", map(moderators[-1], c, "time"))


# Count # of studies and effects at each time
setNames(map(LIST, ~effect_count(dat7, study, !!!syms(.), show0=FALSE, arrange_by="time", na.rm=TRUE)), moderators)

# post-test2 effects (m=4) are from 3 studies! Exclude from interpretations.

########################################
# Model fitting after initial steps
########################################

# Fit all moderator models using a function

fit_model <- function(pred="none", 
                      V = Vs_af, data = dat7, 
                      method_vars = c("study_length","no_treat",
                                      "true_experiment")){
  
  overall <- pred=="none"
  
  time_case <- if(pred!="time")"* time" else " "
  
  form <- as.formula(paste("g2 ~", if(overall) "" else paste(paste(pred, time_case),"+"), 
                           paste(setdiff(method_vars,pred), 
                                 collapse = "+")))
  
  m <- rma.mv(form, V = V, 
              random = ~1|study/effect, data = data,
              dfs = "contain")
  
  m0 <- update.rma(m, yi = g2 ~ 1)
  
  form_post_rma <- if(overall) ~1 else as.formula(paste("~",pred, time_case))
  
  ems <- post_rma(m, form_post_rma)
  
  form_plot <- if(overall) ~1 else as.formula(paste(if(pred=="time") "~" else paste(pred,"~"), "time"))
  
  
  # Human-readable legend title for each moderator
  legend_t <- if (overall) "Overall Effect" else
    switch(pred,
           time            = "Time",
           measure         = "Measure Type",
           test_type       = "Test Type",
           prof            = "Proficiency",
           study_setting   = "Study Setting",
           lang_context    = "Language Context",
           treat_grp       = "Note-Taking Type",
           region          = "Region",
           input_mode      = "Input Mode",
           note_option     = "Note-Taking Option",
           note_instruct   = "Note-Taking Instruction",
           script_distance = "L1-L2 Orthographic Distance",
           age_group       = "Age Group",
           material_type   = "Material Type",
           str_to_title(pred))
  
  
  plot <-  plot_rma(m, form_plot, xlab = if(!overall) "Time" else NULL, ylab="Effect Size (Hedges' g)", dodge=.25) +
    labs(color = legend_t) + theme_test() + 
    scale_color_manual(values = c("black","red", "blue", "green3", "purple",
                                  "orange3", "pink3", "red4"))
  
  
    R2 <- R2_rma(m, null_model = m0, model_names = legend_t)
  
  list(model = m, ems = ems, plot = plot, R2 = R2)
}



# Fit all moderator models:
out <- setNames(map(moderators, fit_model), moderators)

# Save them and share them with readers
saveRDS(out, "np.rds")

4 Definition of the additional variables beyond moderators

5 Display of data and preliminary descriptive analyses

As noted above, this section uses the saved results of the previous section (there is no need to run the R code in the previous section unless desired). Readers are encouraged to run the following R code themselves. Once again, click on Code on the bottom right for the reproducible code used in each section.

# We use the software package introduced by Norouzian & Bui (2024)
source("https://raw.githubusercontent.com/rnorouzian/i/master/3m.r")


library(knitr)
library(flextable)
library(kableExtra)
library(rmarkdown)

opts_chunk$set(message=FALSE, warning=FALSE, fig.align="center")


# data after outlier removal from previous initial analyses
dat7 <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/dat_after_processing.csv")

6 Distribution Summary of Effect Sizes

g <- dat7 %>%
  group_by(study) %>%
  summarise(n_gi = n()) %>%
  summarise(
    `No. of Studies` = n(),
    `No. of Effects` = sum(n_gi),
    `Min. Effects in Study` = min(n_gi),
    `Max. Effects in Study` = max(n_gi),
    `Median Effects in Study` = median(n_gi)
  ) %>% ungroup()


flextable(g) %>%
  autofit() %>% set_caption("Distribution Summary of Effect Sizes") %>% fontsize(size = 11, part = "all") %>%
  line_spacing(space = .6, part = "all")

7 Publication Bias

7.1 Funnel plot at study level

Figure 3 displays the individual effect size estimates aggregated at the study level. The dotted triangle indicates the boundaries for statistical significance on either side of the null effect (i.e., 0, meaning no study-level effect exists in reality). As can be seen, six study-level aggregate effect sizes are statistically significant and positive in direction, constituting ~22% of the total number of study-level aggregate effect sizes in our meta-analysis. Furthermore, half of these study-level aggregate effect sizes, including the largest of them, are from the “Less Visible Literature” (Hopewell, Clarke, & Mallett, 2005). Arguably, such evidence does not indicate a tendency for the note-taking literature to favor studies that, as a whole, have found positive and statistically significant effects of note-taking. Thus, this form of publication bias at the study level seems less likely.
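As a quick illustration (a minimal sketch, not part of the original analysis pipeline), the significance boundaries drawn in the contour-enhanced funnel plot below can be traced directly: at the 5% level, they are the effects lying exactly 1.96 standard errors away from the null on either side.

# Illustrative sketch only (hypothetical grid of standard errors, not study data):
# the p < .05 contour in a funnel plot is the set of effects exactly
# qnorm(.975) standard errors away from the null effect of 0.
se_grid <- seq(0, 1, by = 0.01)
lower_contour <- 0 - qnorm(.975) * se_grid
upper_contour <- 0 + qnorm(.975) * se_grid
# A study-level effect falling outside (lower_contour, upper_contour) at its
# own standard error is individually significant at the 5% level.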

###############################
# 3M publication bias detection
###############################

# Naturally existing dependence from previous section
Vs = with(dat7, impute_covariance_matrix(v_g2, study, r=.5,
                                        subgroup = sample_id))

# Magnitude of within-study correlations
rho <- 0.5

# Aggregate effects at Study level (level 3)
data_agg_study <- 
  dat7 %>% 
  escalc(data = ., yi = g2, vi = v_g2) %>% 
  aggregate.escalc(cluster = study, rho = rho, weighted = FALSE)

# Contour plot at study level
with(data_agg_study,
     contour_funnel(x = g2, 
                    vi = v_g2, sig = FALSE,
                    xlab = "Study-Level Effect Sizes",
                    col = ifelse(gray=="yes","red","blue"),
                    bg = ifelse(gray=="yes","red","blue")))


legend("topright", c("Less Visible","Mainstream"), title = "Literature", pch = 19,
       col = c("red","blue"), title.font = 2, cex = .8)
box()
Figure 3. Contour-Enhanced Funnel Plot of Study-Level Effects


# This time get the tabular counts of study-level effects that are sig.
g <- with(data_agg_study,
          contour_funnel(x = g2, 
                         vi = v_g2, sig = TRUE))

flextable(g) %>%
  autofit() %>% set_caption("Statistically significant study level effects") %>% fontsize(size = 11, part = "all") %>%
  line_spacing(space = .6, part = "all")

7.2 Funnel plot at effect size level

Figure 4 displays the studies’ individual effect size estimates. As before, the dotted triangle indicates the boundaries for statistical significance on either side of the null effect (i.e., 0, meaning no individual effect exists in reality). As can be seen, seventeen effect size estimates are statistically significant and positive in direction, constituting ~31% of the total number of effect sizes in our meta-analysis. On the other hand, two effect size estimates are statistically significant and negative in direction, constituting ~3% of the total number of effect sizes. Furthermore, ~30% of these effect estimates, including the largest of them, are from the “Less Visible Literature”.

Arguably, the comparison at the effect size level could suggest an imbalance in the note-taking literature in favor of positive and statistically significant effects. However, given the lack of such an imbalance at the study level and the presence of multiple positive and statistically significant effects in the less visible literature, the trend seen at the effect size level may reflect a largely natural process rather than one driven, for the most part, by publication policies regarding which note-taking studies are, and are not, published.
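As a rough cross-check of the counts above (a minimal sketch, assuming dat7 with the g2 and v_g2 columns loaded earlier; contour_funnel() itself may apply a slightly different rule), the individually significant effects can be tallied by direction with a simple Wald criterion:

# Minimal cross-check sketch (assumes dat7 from above): Wald z for each effect
z <- with(dat7, g2 / sqrt(v_g2))
c(`positive & sig.` = sum(z >  qnorm(.975)),   # expected to be ~31% of effects
  `negative & sig.` = sum(z < -qnorm(.975)))   # expected to be ~3% of effects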

# Contour plot at effect size level
with(dat7,
      contour_funnel(x = g2, 
                     vi = v_g2, sig = FALSE,
                     col = ifelse(gray=="yes","red","blue"),
                     bg = ifelse(gray=="yes","red","blue")))

legend("topright", c("Less Visible","Mainstream"), title = "Literature", pch = 19,
       col = c("red","blue"), title.font = 2, cex = .8)
box()
Figure 4. Contour-Enhanced Funnel Plot of Individual Effects


# This time get the tabular counts of individual effects that are sig.
g <- with(dat7,
          contour_funnel(x = g2, 
                         vi = v_g2, sig = TRUE))

flextable(g) %>%
  autofit() %>% set_caption("Statistically significant individual effects") %>% fontsize(size = 11, part = "all") %>%
  line_spacing(space = .6, part = "all")

7.3 Egger’s test

We also conducted an Egger’s test (Egger, Smith, Schneider, & Minder, 1997) of funnel plot asymmetry. Using this test, we examined the extent to which the standard error (as a measure of precision) of the effect sizes collected from the note-taking literature is related to the magnitude of those effect sizes. If this relationship and/or the model’s intercept estimate, the latter sometimes referred to as the precision-effect test (PET) estimate, reaches statistical significance, that could suggest asymmetry (and potentially publication bias) in the funnel plot of effect sizes.
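In schematic form (notation ours), the multilevel Egger model fitted below is \(g_{ij} = a + b\,\mathrm{SE}_{ij} + u_j + u_{ij} + e_{ij}\), where \(u_j\) and \(u_{ij}\) are the study- and effect-level random intercepts and \(e_{ij}\) is the sampling error; \(b\) captures the precision-effect size relationship and \(a\) is the PET (intercept) estimate.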

In our case, given that the p-values for the Egger’s test of the relationship in question (b = 0.427, p = 0.784; 95% CI [-2.680, 3.533]) and for its intercept estimate (a = 0.390, p = 0.370; 95% CI [-0.491, 1.271]) are both larger than 0.05, we concluded that our funnel plot is sufficiently symmetric and that the likelihood of publication bias in the collected sample of note-taking studies is small, with the caveat that the b estimate has a relatively wide CI.

# Egger's test using the same naturally and statistically occurring dependence
ff = rma.mv(g2 ~ SE_egger, V = Vs, 
            random = ~1|study/effect, data = dat7,
            dfs = "contain")

g <- results_rma(ff, drop_rows = 3:7, drop_cols = 9:10, tidy = TRUE)

flextable(dplyr::select(g, -Df)) %>%
  autofit() %>% set_caption("Egger's Test Results") %>% fontsize(size = 11, part = "all") %>%
  line_spacing(space = .6, part = "all")

8 Results of analyses

In this section, we present the results of our analyses in two parts. In the first part, we present the synthesized effects at each time point. As mentioned in the manuscript, results based on a limited number of effects (M) and/or studies (K) should be ignored due to their unreliable nature.

Also presented in the first part is the \(R^2\) test of heterogeneity. \(R^2\) indicates the percentage change in the total heterogeneity (between- and within-study) in the true effects of note taking when moving from a model without any MUTOS moderators (a null model) to a model that includes a set of MUTOS moderators of interest.
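For example, using the Time model values reported in Section 8.1.1 (with heterogeneity expressed in SD units), \(R^2 \approx (0.661 - 0.438)/0.661 \times 100\% \approx 33.7\%\), which corresponds, up to rounding of the displayed estimates, to the tabled value of 33.658%.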

While necessary, the results presented in the first part may not by themselves immediately translate into evidence-based recommendations. This is because the descriptive results (synthesized average effects) and the associated inferential results (CIs and p-values) simply denote how large the effect is at each measurement occasion and whether that effect is reliably different from 0 at that point in time.

In the second part, we compare the changes that occurred in learners’ performance from one measurement occasion (baseline) to another (post-test) to measure the potential learning “gains” that might have resulted from note-taking treatments, taking into account the methodological differences that distinguish the studies to varying degrees (see Table 2 in the manuscript for more details on the moderators).

Because the second part allows us to measure the gains from note taking across more than one occasion, its results (i.e., synthesized average effects and their associated CIs and p-values) more immediately translate into evidence-based recommendations. To further facilitate such recommendations, in the second part we also report the minimum expected benefit of note taking, expressed in the universal metric of percentages.
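Concretely, each gain is a contrast between two time-point estimates from the same model. For instance, using the Time model estimates reported in Section 8.1.1, Gain1 (post-test 1 - baseline) \(= 0.713 - (-0.198) = 0.911\), which matches the gain estimate reported in Section 8.2.1.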

8.1 Effects at each time

table_names <-  
  c("Time",
    "Note-Taking Type",
    "Outcome",
    "Measure Type",
    "Input Mode",
    "Material Type",
    "Optional Note-Taking",
    "Proficiency",
    "L1-L2 Orthographic Differences",
    "Region")


# Fitted moderator models stored from the previous section
results <- setNames(readRDS(url("https://github.com/fpqq/w/raw/main/np.rds")), table_names)



moderators_abb_names <-  c(
  "time",
  "treat_grp",
  "outcome",
  "measure",
  "input_mode",
  "material_type",
  "note_option",
  "prof",
  "script_distance",
  "region")


# A list of time and moderators interacting with time
LIST <- c("time", map(moderators_abb_names[-1], c, "time"))


# Count # of studies and effects at each time
effect_no <- setNames(map(LIST, ~effect_count(dat7, study, !!!syms(.), show0=FALSE, na.rm=TRUE, arrange_by="time")), table_names)
rs <- results

invisible(lapply(table_names, \(i){

 
  cat(paste0("\n\n### ", i, "\n"))

      
g <- rs[[i]]$ems


g3 <- rs[[i]]$plot

g4 <- rs[[i]]$R2

if(i!="Overall") print(g3)


print(kable(dplyr::select(cbind(g$table, dplyr::select(effect_no[[i]], `n study`, `n effect`)), -Df) %>% rename(K=`n study`, M=`n effect`),format = "simple", table.attr = "style='width:40%;'",
            caption = paste("3M results for",tolower(i),"categories")) %>%
  kable_styling(bootstrap_options = "bordered",
                full_width = TRUE, font_size = 9.5))


print(kable(g4 %>% rename(`Total Heterogeneity`=`Sigma(total)`, `Between-study Heterogeneity`=`Sigma(study)`, `Within-study Heterogeneity`=`Sigma(effect)`), format = "simple", table.attr = "style='width:40%;'",
            caption = paste("R2 test of heterogeneity for",tolower(i))) %>%
    add_footnote(c("Heterogeneity is in SD unit.",paste("The *p-value* indicates the statistical significance of the MUTOS moderators in the",i, "model *collectively*."))) %>% 
  kable_styling(bootstrap_options = "bordered",
                full_width = TRUE, font_size = 9.5))
  
}))

8.1.1 Time

3M results for time categories
time Mean SE Lower Upper t p-value Sig. K M
baseline -0.198 0.157 -0.528 0.132 -1.261 0.223 16 18
posttest1 0.713 0.124 0.452 0.973 5.747 0.000 *** 25 33
posttest2 0.358 0.272 -0.214 0.930 1.314 0.205 3 4
R2 test of heterogeneity for time
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Time 0.438 0.253 0.358 0.001 33.658%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Time model collectively.

8.1.2 Note-Taking Type

3M results for note-taking type categories
treat_grp time Mean SE Lower Upper t p-value Sig. K M
1 conventional baseline 0.166 0.300 -0.503 0.834 0.552 0.593 3 4
2 framework notes baseline -0.428 0.285 -1.063 0.206 -1.504 0.163 7 7
3 note-taking instruction baseline -0.133 0.300 -0.802 0.536 -0.444 0.667 5 5
4 vocabulary notebook baseline -0.664 0.540 -1.867 0.538 -1.231 0.246 1 2
5 conventional posttest1 0.441 0.229 -0.070 0.952 1.925 0.083 . 7 12
6 framework notes posttest1 0.868 0.246 0.320 1.417 3.527 0.005 ** 9 9
7 note-taking instruction posttest1 0.844 0.265 0.254 1.434 3.189 0.010 ** 7 7
8 vocabulary notebook posttest1 1.187 0.417 0.259 2.115 2.849 0.017 * 3 5
9 conventional posttest2 0.376 0.348 -0.398 1.150 1.082 0.305 1 2
11 note-taking instruction posttest2 0.438 0.550 -0.787 1.663 0.797 0.444 1 1
12 vocabulary notebook posttest2 0.455 0.726 -1.162 2.072 0.627 0.545 1 1
R2 test of heterogeneity for note-taking type
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Note-Taking Type 0.384 0.250 0.291 0.014 41.937%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Note-Taking Type model collectively.

8.1.3 Outcome

3M results for outcome categories
outcome time Mean SE Lower Upper t p-value Sig. K M
1 listening baseline -0.269 0.326 -0.987 0.450 -0.823 0.428 6 6
2 miscellaneous baseline -0.134 0.368 -0.943 0.676 -0.364 0.723 3 3
3 reading baseline 0.238 0.476 -0.810 1.286 0.500 0.627 3 3
4 vocabulary baseline -0.181 0.259 -0.751 0.388 -0.701 0.498 5 6
5 listening posttest1 0.453 0.206 0.000 0.905 2.202 0.050 * 9 14
6 miscellaneous posttest1 0.845 0.341 0.096 1.595 2.482 0.030 * 4 4
7 reading posttest1 1.241 0.395 0.372 2.109 3.144 0.009 ** 5 5
8 vocabulary posttest1 0.766 0.218 0.286 1.246 3.510 0.005 ** 8 10
10 miscellaneous posttest2 0.507 0.493 -0.578 1.593 1.028 0.326 1 1
12 vocabulary posttest2 0.355 0.348 -0.412 1.122 1.018 0.331 3 3
R2 test of heterogeneity for outcome
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Outcome 0.456 0.166 0.425 0.055 30.929%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Outcome model collectively.

8.1.4 Measure Type

3M results for measure type categories
measure time Mean SE Lower Upper t p-value Sig. K M
1 miscellaneous baseline -0.012 0.815 -1.772 1.748 -0.015 0.988 3 3
2 recall baseline 0.008 0.271 -0.577 0.594 0.031 0.976 5 5
3 recognition baseline -0.292 0.213 -0.753 0.169 -1.367 0.195 10 10
4 miscellaneous posttest1 0.470 0.817 -1.295 2.234 0.575 0.575 3 3
5 recall posttest1 1.017 0.250 0.476 1.558 4.063 0.001 ** 7 8
6 recognition posttest1 0.598 0.148 0.278 0.919 4.034 0.001 ** 17 21
8 recall posttest2 0.440 0.500 -0.641 1.521 0.880 0.395 1 1
9 recognition posttest2 0.389 0.349 -0.365 1.142 1.115 0.285 3 3
R2 test of heterogeneity for measure type
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Measure Type 0.458 0.133 0.438 0.043 30.651%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Measure Type model collectively.

8.1.5 Input Mode

3M results for input mode categories
input_mode time Mean SE Lower Upper t p-value Sig. K M
listening baseline -0.050 0.271 -0.641 0.541 -0.185 0.856 7 8
miscellaneous baseline -0.490 0.354 -1.260 0.281 -1.384 0.192 5 6
reading baseline 0.111 0.380 -0.717 0.939 0.292 0.775 4 4
listening posttest1 0.555 0.214 0.089 1.022 2.592 0.024 * 10 16
miscellaneous posttest1 0.809 0.329 0.093 1.525 2.462 0.030 * 7 9
reading posttest1 0.947 0.281 0.335 1.558 3.371 0.006 ** 8 8
listening posttest2 0.385 0.383 -0.449 1.219 1.005 0.335 1 2
miscellaneous posttest2 0.229 0.758 -1.424 1.881 0.302 0.768 1 1
reading posttest2 0.530 0.603 -0.784 1.845 0.879 0.397 1 1
R2 test of heterogeneity for input mode
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Input Mode 0.454 0.255 0.376 0.024 31.298%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Input Mode model collectively.

8.1.6 Material Type

3M results for material type categories
material_type time Mean SE Lower Upper t p-value Sig. K M
academic baseline -0.462 0.189 -0.865 -0.058 -2.440 0.028 * 13 14
non-academic baseline 0.402 0.291 -0.217 1.022 1.384 0.187 3 4
academic posttest1 0.661 0.138 0.366 0.955 4.777 0.000 *** 19 25
non-academic posttest1 0.896 0.230 0.405 1.387 3.892 0.001 ** 6 8
academic posttest2 0.203 0.443 -0.740 1.147 0.460 0.652 2 2
non-academic posttest2 0.692 0.351 -0.057 1.441 1.970 0.068 . 1 2
R2 test of heterogeneity for material type
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Material Type 0.397 0.185 0.351 0.003 39.952%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Material Type model collectively.

8.1.7 Optional Note-Taking

3M results for optional note-taking categories
note_option time Mean SE Lower Upper t p-value Sig. K M
allowed baseline 0.274 0.264 -0.291 0.840 1.040 0.316 5 6
required baseline -0.515 0.207 -0.959 -0.071 -2.488 0.026 * 10 11
allowed posttest1 0.731 0.224 0.250 1.212 3.258 0.006 ** 8 11
required posttest1 0.782 0.165 0.428 1.135 4.742 0.000 *** 16 20
allowed posttest2 0.567 0.345 -0.172 1.306 1.645 0.122 1 2
required posttest2 0.260 0.431 -0.664 1.184 0.603 0.556 2 2
R2 test of heterogeneity for optional note-taking
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Note-Taking Option 0.409 0.273 0.305 0.002 38.023%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Optional Note-Taking model collectively.

8.1.8 Proficiency

3M results for proficiency categories
prof time Mean SE Lower Upper t p-value Sig. K M
beginner to lower intermediate baseline -0.719 0.306 -1.386 -0.052 -2.350 0.037 * 3 4
high intermediate to advanced baseline 0.425 0.369 -0.380 1.229 1.150 0.273 2 3
intermediate baseline -0.134 0.221 -0.615 0.346 -0.609 0.554 11 11
beginner to lower intermediate posttest1 0.736 0.227 0.241 1.230 3.241 0.007 ** 7 9
high intermediate to advanced posttest1 1.037 0.317 0.346 1.729 3.268 0.007 ** 4 5
intermediate posttest1 0.624 0.178 0.236 1.012 3.508 0.004 ** 14 19
beginner to lower intermediate posttest2 0.292 0.729 -1.295 1.880 0.401 0.695 1 1
high intermediate to advanced posttest2 0.758 0.396 -0.106 1.621 1.912 0.080 . 1 2
intermediate posttest2 0.297 0.558 -0.920 1.513 0.532 0.605 1 1
R2 test of heterogeneity for proficiency
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Proficiency 0.441 0.283 0.337 0.013 33.299%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Proficiency model collectively.

8.1.9 L1-L2 Orthographic Differences

3M results for l1-l2 orthographic differences categories
script_distance time Mean SE Lower Upper t p-value Sig. K M
greater baseline -0.251 0.184 -0.641 0.139 -1.363 0.192 13 14
shorter baseline -0.061 0.378 -0.862 0.741 -0.160 0.875 3 4
greater posttest1 0.568 0.151 0.249 0.888 3.769 0.002 ** 20 26
shorter posttest1 1.250 0.299 0.616 1.884 4.177 0.001 *** 5 7
greater posttest2 0.283 0.280 -0.309 0.876 1.014 0.326 3 4
R2 test of heterogeneity for l1-l2 orthographic differences
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
L1-L2 Orthographic Distance 0.473 0.314 0.354 0.002 28.416%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the L1-L2 Orthographic Differences model collectively.

8.1.10 Region

3M results for region categories
region time Mean SE Lower Upper t p-value Sig. K M
1 East Asia baseline -0.001 0.224 -0.483 0.480 -0.006 0.995 6 7
3 Middle East baseline -0.192 0.211 -0.646 0.261 -0.910 0.378 10 11
4 East Asia posttest1 0.786 0.226 0.300 1.272 3.470 0.004 ** 6 7
5 Europe/North America posttest1 0.248 0.268 -0.327 0.822 0.924 0.371 4 8
6 Middle East posttest1 0.885 0.171 0.518 1.252 5.173 0.000 *** 15 18
7 East Asia posttest2 0.465 0.343 -0.271 1.201 1.355 0.197 1 2
9 Middle East posttest2 0.371 0.455 -0.605 1.348 0.815 0.429 2 2
R2 test of heterogeneity for region
Model Total Heterogeneity Between-study Heterogeneity Within-study Heterogeneity p-value R2
No (M)UTOS 0.661 0.157 0.642
Region 0.395 0.107 0.380 0.010 40.203%

Note: a Heterogeneity is in SD unit. b The p-value indicates the statistical significance of the MUTOS moderators in the Region model collectively.

8.2 Learning Gains

rs <- results

invisible(lapply(table_names, \(i){

 
  cat(paste0("\n\n### ", i, "\n"))

      
g <- rs[[i]]$ems

# Effects
gains <- if(i=="Time") contrast_rma(g, list("Gain1(post-test 1 - baseline)" =c(2,-1))) else contrast_rma(g, brief = TRUE)


print(kable(dplyr::select(gains$table, -Df),format = "simple", table.attr = "style='width:40%;'",
            caption = paste("Learning gains for", tolower(i))) %>%
  kable_styling(bootstrap_options = "bordered",
                full_width = TRUE, font_size = 9.5))



if(i!="Time") gain_dif <- contrast_rma(g, gain_dif = TRUE, brief = TRUE, gain_dif_type = "same")

if(i!="Time") print(kable(dplyr::select(gain_dif$table, -Df),format = "simple", table.attr = "style='width:40%;'",
            caption = paste("Differences in learning gains for", tolower(i))) %>%
  kable_styling(bootstrap_options = "bordered",
                full_width = TRUE, font_size = 9.5))


# Percentages
gain_prob <- prob_rma(gains, gain=TRUE, target_effect=.2)

print(kable(gain_prob,format = "simple", table.attr = "style='width:40%;'",
            caption = paste("Minimum learning gain percentage for", tolower(i))) %>%
  kable_styling(bootstrap_options = "bordered",
                full_width = TRUE, font_size = 9.5))
  
}))

8.2.1 Time

Learning gains for time
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(post-test 1 - baseline) 0.911 0.167 0.561 1.261 5.465 0.000 ***
Minimum learning gain percentage for time
Term Target_Effect Probability Min Max
Gain1(post-test 1 - baseline) 0.2 or larger 79.97% 63.39% 94.98%

8.2.2 Note-Taking Type

Learning gains for note-taking type
Contrast Estimate SE Lower Upper t p-value Sig.
1 Gain1(conventional) 0.275 0.251 -0.284 0.835 1.096 0.299
2 Gain2(conventional) 0.210 0.315 -0.492 0.913 0.667 0.520
3 Gain1(framework notes) 1.297 0.278 0.677 1.917 4.659 0.001 ***
5 Gain1(note-taking instruction) 0.977 0.287 0.338 1.616 3.407 0.007 **
6 Gain2(note-taking instruction) 0.571 0.541 -0.634 1.776 1.056 0.316
7 Gain1(vocabulary notebook) 1.852 0.468 0.809 2.894 3.958 0.003 **
8 Gain2(vocabulary notebook) 1.119 0.832 -0.733 2.972 1.346 0.208
Differences in learning gains for note-taking type
Contrast Estimate SE Lower Upper t p-value Sig.
1 Gain1(conventional) - Gain1(framework notes) -1.021 0.373 -1.852 -0.190 -2.738 0.021 *
2 Gain1(conventional) - Gain1(note-taking instruction) -0.702 0.382 -1.553 0.149 -1.838 0.096 .
3 Gain1(conventional) - Gain1(vocabulary notebook) -1.576 0.532 -2.761 -0.391 -2.962 0.014 *
5 Gain2(conventional) - Gain2(note-taking instruction) -0.361 0.627 -1.757 1.036 -0.576 0.577
6 Gain2(conventional) - Gain2(vocabulary notebook) -0.909 0.889 -2.890 1.072 -1.023 0.331
7 Gain1(framework notes) - Gain1(note-taking instruction) 0.319 0.398 -0.568 1.207 0.802 0.441
8 Gain1(framework notes) - Gain1(vocabulary notebook) -0.555 0.539 -1.757 0.647 -1.029 0.328
11 Gain1(note-taking instruction) - Gain1(vocabulary notebook) -0.874 0.546 -2.090 0.341 -1.603 0.140
12 Gain2(note-taking instruction) - Gain2(vocabulary notebook) -0.548 1.000 -2.777 1.680 -0.548 0.596
Minimum learning gain percentage for note-taking type
Term Target_Effect Probability Min Max
Gain1(conventional) 0.2 or larger 53.92% 31.80% 91.65%
Gain2(conventional) 0.2 or larger 50.52% 24.93% 93.96%
Gain1(framework notes) 0.2 or larger 92.48% 67.96% 99.99%
Gain1(note-taking instruction) 0.2 or larger 84.58% 55.37% 99.90%
Gain2(note-taking instruction) 0.2 or larger 68.66% 20.74% 99.97%
Gain1(vocabulary notebook) 0.2 or larger 98.48% 72.43% 100.00%
Gain2(vocabulary notebook) 0.2 or larger 88.58% 18.08% 100.00%

8.2.3 Outcome

Learning gains for outcome
Contrast Estimate SE Lower Upper t p-value Sig.
1 Gain1(listening) 0.721 0.354 -0.058 1.501 2.036 0.067 .
3 Gain1(miscellaneous) 0.979 0.444 0.001 1.957 2.204 0.050 *
4 Gain2(miscellaneous) 0.641 0.585 -0.647 1.929 1.095 0.297
5 Gain1(reading) 1.002 0.481 -0.056 2.061 2.085 0.061 .
7 Gain1(vocabulary) 0.947 0.281 0.328 1.567 3.366 0.006 **
8 Gain2(vocabulary) 0.536 0.399 -0.342 1.414 1.343 0.206
Differences in learning gains for outcome
Contrast Estimate SE Lower Upper t p-value Sig.
1 Gain1(listening) - Gain1(miscellaneous) -0.258 0.564 -1.500 0.983 -0.458 0.656
2 Gain1(listening) - Gain1(reading) -0.281 0.601 -1.604 1.041 -0.468 0.649
3 Gain1(listening) - Gain1(vocabulary) -0.226 0.450 -1.216 0.763 -0.503 0.625
7 Gain1(miscellaneous) - Gain1(reading) -0.023 0.656 -1.467 1.420 -0.035 0.972
8 Gain1(miscellaneous) - Gain1(vocabulary) 0.032 0.524 -1.122 1.186 0.061 0.953
10 Gain2(miscellaneous) - Gain2(vocabulary) 0.105 0.698 -1.433 1.642 0.150 0.883
11 Gain1(reading) - Gain1(vocabulary) 0.055 0.558 -1.173 1.283 0.099 0.923
Minimum learning gain percentage for outcome
Term Target_Effect Probability Min Max
Gain1(listening) 0.2 or larger 71.40% 41.08% 96.60%
Gain1(miscellaneous) 0.2 or larger 80.09% 43.09% 99.31%
Gain2(miscellaneous) 0.2 or larger 68.38% 22.94% 99.23%
Gain1(reading) 0.2 or larger 80.78% 41.14% 99.55%
Gain1(vocabulary) 0.2 or larger 79.11% 54.46% 97.24%
Gain2(vocabulary) 0.2 or larger 64.22% 31.78% 95.57%

8.2.4 Measure Type

Learning gains for measure type
Contrast Estimate SE Lower Upper t p-value Sig.
1 Gain1(miscellaneous) 0.482 0.894 -1.450 2.414 0.539 0.599
3 Gain1(recall) 1.009 0.335 0.286 1.732 3.014 0.010 **
4 Gain2(recall) 0.432 0.553 -0.762 1.626 0.781 0.449
5 Gain1(recognition) 0.890 0.240 0.372 1.408 3.712 0.003 **
6 Gain2(recognition) 0.681 0.390 -0.161 1.522 1.747 0.104
Differences in learning gains for measure type
Contrast Estimate SE Lower Upper t p-value Sig.
1 Gain1(miscellaneous) - Gain1(recall) -0.527 0.955 -2.590 1.536 -0.552 0.590
2 Gain1(miscellaneous) - Gain1(recognition) -0.408 0.926 -2.409 1.592 -0.441 0.666
5 Gain1(recall) - Gain1(recognition) 0.119 0.406 -0.759 0.996 0.292 0.775
6 Gain2(recall) - Gain2(recognition) -0.249 0.666 -1.687 1.190 -0.374 0.715
Minimum learning gain percentage for measure type
Term Target_Effect Probability Min Max
Gain1(miscellaneous) 0.2 or larger 61.83% 7.54% 99.87%
Gain1(recall) 0.2 or larger 80.62% 52.98% 98.14%
Gain2(recall) 0.2 or larger 59.78% 20.11% 97.38%
Gain1(recognition) 0.2 or larger 76.94% 55.95% 94.98%
Gain2(recognition) 0.2 or larger 69.63% 37.66% 96.39%

8.2.5 Input Mode

Learning gains for input mode
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(listening) 0.606 0.260 0.039 1.173 2.327 0.038 *
Gain2(listening) 0.435 0.371 -0.374 1.244 1.172 0.264
Gain1(miscellaneous) 1.298 0.291 0.665 1.932 4.467 0.001 ***
Gain2(miscellaneous) 0.718 0.792 -1.008 2.444 0.907 0.382
Gain1(reading) 0.836 0.366 0.039 1.632 2.285 0.041 *
Gain2(reading) 0.419 0.623 -0.939 1.777 0.673 0.514
Differences in learning gains for input mode
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(listening) - Gain1(miscellaneous) -0.693 0.388 -1.539 0.154 -1.783 0.100 .
Gain1(listening) - Gain1(reading) -0.230 0.447 -1.204 0.744 -0.514 0.617
Gain2(listening) - Gain2(miscellaneous) -0.283 0.870 -2.179 1.613 -0.325 0.750
Gain2(listening) - Gain2(reading) 0.016 0.727 -1.568 1.600 0.022 0.983
Gain1(miscellaneous) - Gain1(reading) 0.463 0.467 -0.554 1.479 0.992 0.341
Gain2(miscellaneous) - Gain2(reading) 0.299 1.017 -1.916 2.514 0.294 0.774
Minimum learning gain percentage for input mode
Term Target_Effect Probability Min Max
Gain1(listening) 0.2 or larger 68.02% 44.19% 93.73%
Gain2(listening) 0.2 or larger 60.68% 30.12% 94.99%
Gain1(miscellaneous) 0.2 or larger 89.74% 66.35% 99.68%
Gain2(miscellaneous) 0.2 or larger 72.49% 13.64% 99.98%
Gain1(reading) 0.2 or larger 76.84% 44.19% 98.79%
Gain2(reading) 0.2 or larger 59.97% 15.06% 99.35%

8.2.6 Material Type

Learning gains for material type
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(academic) 1.122 0.201 0.694 1.550 5.591 0.000 ***
Gain2(academic) 0.665 0.459 -0.312 1.642 1.450 0.168
Gain1(non-academic) 0.494 0.287 -0.117 1.104 1.722 0.106
Gain2(non-academic) 0.289 0.363 -0.484 1.063 0.798 0.437
Differences in learning gains for material type
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(academic) - Gain1(non-academic) 0.629 0.352 -0.121 1.378 1.788 0.094 .
Gain2(academic) - Gain2(non-academic) 0.376 0.586 -0.872 1.624 0.641 0.531
Minimum learning gain percentage for material type
Term Target_Effect Probability Min Max
Gain1(academic) 0.2 or larger 86.45% 67.87% 98.87%
Gain2(academic) 0.2 or larger 71.06% 31.52% 99.26%
Gain1(non-academic) 0.2 or larger 63.72% 38.29% 93.66%
Gain2(non-academic) 0.2 or larger 54.23% 26.02% 92.75%

8.2.7 Optional Note-Taking

Learning gains for optional note-taking
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(allowed) 0.457 0.235 -0.047 0.960 1.946 0.072 .
Gain2(allowed) 0.293 0.319 -0.391 0.977 0.918 0.374
Gain1(required) 1.297 0.203 0.861 1.733 6.385 0.000 ***
Gain2(required) 0.775 0.442 -0.173 1.722 1.754 0.101
Differences in learning gains for optional note-taking
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(allowed) - Gain1(required) -0.840 0.308 -1.501 -0.179 -2.726 0.016 *
Gain2(allowed) - Gain2(required) -0.482 0.544 -1.650 0.685 -0.886 0.391
Minimum learning gain percentage for optional note-taking
Term Target_Effect Probability Min Max
Gain1(allowed) 0.2 or larger 62.89% 40.40% 92.68%
Gain2(allowed) 0.2 or larger 54.74% 28.04% 93.12%
Gain1(required) 0.2 or larger 91.98% 74.23% 99.83%
Gain2(required) 0.2 or larger 76.91% 35.68% 99.82%

8.2.8 Proficiency

Learning gains for proficiency
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(beginner to lower intermediate) 1.455 0.294 0.814 2.096 4.944 0.000 ***
Gain2(beginner to lower intermediate) 1.011 0.777 -0.681 2.704 1.302 0.217
Gain1(high intermediate to advanced) 0.613 0.326 -0.098 1.324 1.877 0.085 .
Gain2(high intermediate to advanced) 0.333 0.363 -0.458 1.125 0.917 0.377
Gain1(intermediate) 0.758 0.241 0.233 1.284 3.143 0.008 **
Gain2(intermediate) 0.431 0.567 -0.804 1.666 0.760 0.462
Differences in learning gains for proficiency
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(beginner to lower intermediate) - Gain1(high intermediate to advanced) 0.842 0.440 -0.117 1.802 1.912 0.080 .
Gain1(beginner to lower intermediate) - Gain1(intermediate) 0.697 0.376 -0.124 1.517 1.851 0.089 .
Gain2(beginner to lower intermediate) - Gain2(high intermediate to advanced) 0.678 0.856 -1.187 2.543 0.792 0.443
Gain2(beginner to lower intermediate) - Gain2(intermediate) 0.580 0.969 -1.530 2.691 0.599 0.560
Gain1(high intermediate to advanced) - Gain1(intermediate) -0.146 0.406 -1.030 0.739 -0.359 0.726
Gain2(high intermediate to advanced) - Gain2(intermediate) -0.098 0.674 -1.567 1.371 -0.145 0.887
Minimum learning gain percentage for proficiency
Term Target_Effect Probability Min Max
Gain1(beginner to lower intermediate) 0.2 or larger 93.67% 71.86% 99.95%
Gain2(beginner to lower intermediate) 0.2 or larger 83.83% 20.32% 100.00%
Gain1(high intermediate to advanced) 0.2 or larger 69.24% 38.94% 97.41%
Gain2(high intermediate to advanced) 0.2 or larger 56.43% 26.76% 94.53%
Gain1(intermediate) 0.2 or larger 75.15% 51.24% 96.97%
Gain2(intermediate) 0.2 or larger 61.07% 17.21% 99.44%

8.2.9 L1-L2 Orthographic Differences

Learning gains for l1-l2 orthographic differences
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(greater) 0.819 0.187 0.423 1.216 4.379 0.000 ***
Gain2(greater) 0.534 0.287 -0.075 1.143 1.860 0.081 .
Gain1(shorter) 1.311 0.365 0.537 2.084 3.592 0.002 **
Differences in learning gains for l1-l2 orthographic differences
Contrast Estimate SE Lower Upper t p-value Sig.
Gain1(greater) - Gain1(shorter) -0.492 0.408 -1.357 0.373 -1.205 0.246
Minimum learning gain percentage for l1-l2 orthographic differences
Term Target_Effect Probability Min Max
Gain1(greater) 0.2 or larger 76.91% 58.35% 94.52%
Gain2(greater) 0.2 or larger 65.43% 39.74% 93.12%
Gain1(shorter) 0.2 or larger 90.67% 62.50% 99.85%

8.2.10 Region

Learning gains for region
Contrast Estimate SE Lower Upper t p-value Sig.
1 Gain1(East Asia) 0.787 0.272 0.204 1.370 2.895 0.012 *
2 Gain2(East Asia) 0.466 0.374 -0.337 1.270 1.246 0.233
5 Gain1(Middle East) 1.077 0.230 0.584 1.571 4.683 0.000 ***
6 Gain2(Middle East) 0.563 0.477 -0.461 1.588 1.180 0.258
Differences in learning gains for region
Contrast Estimate SE Lower Upper t p-value Sig.
2 Gain1(East Asia) - Gain1(Middle East) -0.290 0.356 -1.054 0.474 -0.815 0.429
4 Gain2(East Asia) - Gain2(Middle East) -0.097 0.609 -1.403 1.209 -0.159 0.876
Minimum learning gain percentage for region
Term Target_Effect Probability Min Max
Gain1(East Asia) 0.2 or larger 74.96% 50.15% 96.08%
Gain2(East Asia) 0.2 or larger 61.98% 31.03% 94.62%
Gain1(Middle East) 0.2 or larger 84.27% 63.83% 98.04%
Gain2(Middle East) 0.2 or larger 66.14% 27.11% 98.16%

9 Included Studies

The following provides the studies (k = 27) that were included in the meta-analysis.