---
editor:
markdown:
wrap: 72
---
```{r setup, include=FALSE}
source("R_setup.R")
```
# Analysis of Variance (ANOVA) and Moderation {#sec-anova}
> Key concepts: eta-squared, between-groups variance / explained variance, within-groups
> variance / unexplained variance, *F* test on analysis of variance model, signal to noise ratio, pairwise
> comparisons, post-hoc tests, one-way analysis of variance, two-way
> analysis of variance, balanced design, main effects, moderation,
> interaction effect.
### Summary {.unnumbered}
Imagine you are a health communication researcher. Your goal is to
identify the most effective way of persuading people to accept an
important new vaccine once it becomes available. Based on theory, you
created three versions of a communication campaign aiming to increase
vaccine acceptance: The first version of the campaign uses
**autonomy-supportive language**, that is, it addresses people in a way
that respects their freedom to choose, rather than attempting to
pressure or force them to accept the vaccine. The second version of the
campaign uses **controlling language**, that is, it attempts to pressure
or command people to accept the vaccine, for instance by using threats
or guilt appeals. The third version of the campaign uses **neutral**
language** that is neither explicitly autonomy-supportive nor
controlling. Version 3 is meant to serve as a control condition. *Which
communication strategy is most effective?*
You suspect that the answer to this question might depend on the
characteristics of the person who is exposed to the campaign, such as
their **health literacy**. Health literacy is the extent to which one is
skilled in finding, understanding, and using health information to make
good decisions about their health. *Are those with higher health
literacy more likely to be persuaded by a different communication
strategy than those with low health literacy?*
To find out, you and your team ran an experiment, randomly assigning
participants who vary in health literacy (high vs. low) to be exposed to
one of the three campaigns. Vaccine acceptance was measured as the main
dependent variable after exposure to a randomly assigned campaign.
To identify the most effective campaign, we first need to compare the
outcome scores (average vaccine acceptance) across more than two groups
(participants who saw the neutral campaign, the autonomy-supportive
campaign or the controlling campaign). To this end, we use analysis of
variance. The null hypothesis tested in analysis of variance states that
all groups have the same average outcome score in the population.
This null hypothesis is similar to the one we test in an
independent-samples *t* test for two groups. With three or more groups,
we must use the variance of the group means (between-groups variance) to
test the null hypothesis. If the between-groups variance is zero, all
group means are equal.
In addition to between-groups variance, we have to take into account the
variance of outcome scores within groups (within-groups variance).
Within-groups variance is related to the fact that we may obtain
different group means even if we draw random samples from populations
with the same means. The ratio of between-groups variance over
within-groups variance gives us the *F* test statistic, which has an *F*
distribution.
Differences in average outcome scores for groups on one independent
variable (usually called *factor* in analysis of variance) are called a
main effect. A main effect represents an overall or average effect of a
factor. If we have only one factor in our model, for instance, the
language used in a vaccination campaign, we apply a one-way analysis of
variance. With two factors, we have a two-way analysis of variance, and
so on.
In our example, we are interested in a second independent variable,
namely health literacy. With two or more factors, we can have
interaction effects in addition to main effects. An interaction effect
is the joint effect of two or more factors on the dependent variable. An
interaction effect is best understood as different effects of one factor
across different groups on another factor. For example, autonomy
supportive language may increase vaccine acceptance for people with high
health literacy but controlling language might work best for people with
low health literacy.
The phenomenon that a variable can have different effects for different
groups on another variable is called moderation. We usually think of one
factor as the predictor (or independent variable) and the other factor
as the moderator. The moderator (e.g., health literacy) changes the
effect of the predictor (e.g., campaign style) on the dependent variable
(e.g., vaccine acceptance).
## Different Means for Three or More Groups
Communication scientists have shown that campaign messages are more
likely to be effective if they are designed based on theoretical
insights [@fishbeinRoleTheoryDeveloping2006]. For instance, Protection
Motivation Theory [@rogersProtectionMotivationTheory1975] suggests that
controlling language including threats can be persuasive by motivating
people to take action to avoid harm, and Self-Determination Theory
[@deciIntrinsicMotivationSelfDetermination2013] supports the idea that
autonomy-supportive language exerts impact by fostering a sense of
choice and personal motivation.
Imagine that we want to test if using theory to design language used in
a campaign makes a difference to people's vaccine acceptance. We will be
using either autonomy supportive language or controlling language in the
theory-based campaigns, and we will include a campaign with neutral
language as a control condition.
Let us design an experiment to investigate the effects of the language
used in a campaign (campaign style). We sample a number of people
(participants) and then randomly assign each participant to the campaign
with autonomy supportive language, the campaign with controlling
language, or the campaign with neutral language (control group). Our
independent variable *campaign style* is a factor with three
experimental conditions (autonomy supportive, controlling, neutral).
Our dependent variable is vaccine acceptance, a numeric scale from 1
("Would definitely refuse to get the vaccine") to 10 ("Would definitely
get the vaccine as soon as possible"). We will compare the average
outcome scores among groups. If groups with autonomy supportive or
controlling language have a systematically higher average vaccine
acceptance than the group with neutral language, we conclude that using
theory to design campaign language has a positive effect on vaccine
acceptance.
In statistical terminology, we have a categorical independent variable
(or: factor) and a numerical dependent variable. In experiments, we
usually have a very limited set of treatment levels, so our independent
variable is categorical. Analysis of variance was developed for this
kind of data [@RefWorks:3955], so it is widely used in the context of
experiments.
### Mean differences as effects {#sec-anova-meandiffs}
@fig-anova-means shows the vaccine acceptance scores for twelve
participants in our experiment. Four participants saw a campaign with
autonomy supportive language, four saw a campaign with controlling
language, and four saw a campaign with neutral language.
::: {#fig-anova-means}
```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/anova-means/" width="100%" height="490px" style="border:none;">
</iframe>
```
How do group means relate to effect size?
:::
A group's average score on the dependent variable represents the group's
score level. The group averages in @fig-anova-means tell us for which
campaign style the average vaccine acceptance is higher and for which
campaign style it is lower.
Random assignment of participants to experimental groups (here: which
campaign is shown) creates groups that are, in theory, equal on all
imaginable characteristics except the experimental treatment(s)
administered by the researcher. Participants who saw a campaign with
autonomy supportive language should have more or less the same average
age, knowledge, and so on as participants who saw a campaign with
controlling or neutral language. After all, each experimental group is
just a random sample of participants.
If random assignment was done successfully, differences between group
means can only be caused by the experimental treatment (we will discuss
this in more detail in @sec-confounder). Mean differences are said to
represent the *effect* of experimental treatment in analysis of
variance.
Analysis of variance was developed for the analysis of randomized
experiments, where effects can be interpreted as causal effects. Note,
however, that analysis of variance can also be applied to
non-experimental data. Although mean differences are still called
effects in the latter type of analysis, these do not have to be causal
effects.
In analysis of variance, then, we are simply interested in differences
between group means. The conclusion for a sample is easy: Which groups
have higher average scores on the dependent variable and which groups
have lower scores? A means plot, such as @fig-anova-meansplot, aids
interpretation and helps communicate results to the reader. On
average, participants who saw a campaign with autonomy supportive or
controlling language have higher vaccine acceptance than participants
who saw a campaign with neutral language. In other words, our two
theory-based campaigns were more effective than the neutral control
condition.
```{r}
#| label: fig-anova-meansplot
#| fig-cap: "A means plot showing that average vaccine acceptance is higher in the controlling and autonomy supportive language conditions than in the neutral language condition (Error bars = 95% Confidence Intervals). As a reading instruction, effects of language condition are represented by arrows and dashed lines."
#| echo: false
#| warning: false
source('data/simulate_vaccine.R')
sim_data=simulate_data(Neutral_avg=3,Autonomy_avg=6,Control_avg=7)
# Define your color palette
brewercolors <- brewer.pal(n = 8, name = "Set1") %>% setNames(c("Red", "Blue", "Green", "Purple", "Orange", "Yellow", "Brown", "Pink"))
# Summarize the simulated data
d <- sim_data %>%
group_by(lang_cond) %>%
summarise(
accept_av = mean(vacc_acceptance),
se = sd(vacc_acceptance) / sqrt(n()),
lower = accept_av - qt(0.975, df = n()-1) * se,
upper = accept_av + qt(0.975, df = n()-1) * se,
.groups = "drop"
) %>%
mutate(
const = "line_group", # for line connection
lang_cond=factor(lang_cond, levels=c('Neutral','Autonomy','Control'))
)
baseline <- d$accept_av[d$lang_cond == "Neutral"]
control <- d$accept_av[d$lang_cond == "Control"]
autonomy <- d$accept_av[d$lang_cond == "Autonomy"]
ggplot(data.frame(d), aes(lang_cond, accept_av)) +
geom_point(size = 3, color=brewercolors["Blue"]) +
geom_line(aes(group = const), size = 1, color=brewercolors["Blue"]) +
geom_segment(aes(x = 1, xend = 3.1, y = baseline, yend = baseline),
linetype = "dashed", color = "darkgrey") +
geom_segment(aes(x = 2.1, xend = 2.1, y = baseline, yend = autonomy),
color = "darkgrey",
arrow = arrow(length = unit(2,"mm"),
# ends = "both",
type = "closed")) +
geom_segment(aes(x = 1.7, xend = 2.3, y = autonomy, yend = autonomy),
linetype = "dashed", color = "darkgrey") +
geom_label(aes(x = 1.8, y = (baseline + autonomy)/2,
label = "Autonomy effect",
hjust = 0), color = "darkgrey",fill='white'
) +
geom_segment(aes(x = 3.1, xend = 3.1, y = baseline, yend = control),
color = "darkgrey",
arrow = arrow(length = unit(2,"mm"),
# ends = "both",
type = "closed")) +
geom_segment(aes(x = 2.7, xend = 3.2, y = control, yend = control),
linetype = "dashed", color = "darkgrey") +
geom_label(aes(x = 2.8, y = (baseline + control)/2,
label = "Control effect",
hjust = 0), color = "darkgrey",fill='white'
) +
theme_general() +
scale_y_continuous(limits = c(1, 10), breaks = c(1, 5, 10)) + labs(x = "Language Condition", y = "Average Vaccine Acceptance") +
# Error bars
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1, color = brewercolors["Blue"])
rm(d)
```
Effect size in an analysis of variance refers to the overall differences
between group means. We use eta^2^ as effect size, which gives the
proportion of variance in the dependent variable (acceptance of
vaccination information) explained or predicted by the group variable
(experimental condition).
This proportion is informative and precise. If you want to classify the
effect size in more general terms, take the square root of eta^2^ to
obtain *eta*. As a measure of association, *eta* can be interpreted with
the following rules of thumb:
- 0 – 0.1: no to weak effect
- 0.1 – 0.3: weak effect
- 0.3 – 0.5: medium-sized / moderate effect
- 0.5 – 0.8: strong effect
- 0.8 – 0.99: very strong effect
### Between-groups variance and within-groups variance {#sec-between-variance}
For a better understanding of eta^2^ and the statistical test of an
analysis of variance model, we have to compare the individual scores to
the group averages and to the overall average. @fig-anova-between adds
overall average acceptance of vaccination information to the plot
(horizontal black line) with participants' scores and average
experimental group scores (coloured horizontal lines).
::: {#fig-anova-between}
```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/anova-between/" width="100%" height="490px" style="border:none;">
</iframe>
```
Which part of score differences tells us about the differences between
groups?
:::
Let us assume that we have measured vaccine acceptance for a sample of
12 participants in our study as depicted in @fig-anova-between. Once we
have our data, we first have a look at the percentage of variance that
is explained, eta^2^. What does it mean if we say that a percentage of
the variance is explained when we interpret eta^2^?
The variance that we want to explain consists of the differences between
the scores of the participants on the dependent variable and the overall
or grand mean of all outcome scores. Remember that a variance measures
deviations from the mean. The dotted black arrows in @fig-anova-between
express the distances between outcome scores and the grand average.
Squaring, summing, and averaging these distances over all observations
gives us the total variance in outcome scores.
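This computation can be sketched in R. The twelve acceptance scores below are made up for illustration (they are not the simulated data behind the figures):

```r
# Hypothetical vaccine acceptance scores for 12 participants
accept <- c(3, 4, 2, 3, 6, 7, 5, 6, 7, 8, 6, 7)

grand_mean <- mean(accept)                  # the overall (grand) mean
ss_total   <- sum((accept - grand_mean)^2)  # squared distances, summed

# Averaging the summed squares (over n - 1) gives the sample variance
var_total <- ss_total / (length(accept) - 1)
isTRUE(all.equal(var_total, var(accept)))   # TRUE
```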
The goal of our experiment is to explain why some of our participants'
vaccine acceptance is far above the grand mean (horizontal black line
in @fig-anova-between) while others score a lot lower. We hypothesized
that participants are influenced by the campaign style (language used)
that they have seen. If a certain campaign style has a positive effect,
the average acceptance should be higher for participants confronted with
this campaign style.
If we know the group to which a participant belongs---which language was
used in the campaign they saw---we can use the average outcome score for
the group as the predicted outcome for each group member---their vaccine
acceptance due to the language used in the campaign they saw. The
predicted group scores are represented by the coloured horizontal lines
for group means in @fig-anova-between.
Now what part of the variance in outcome scores (dotted black arrows in
@fig-anova-between) is explained by the experimental treatment? If we
use the experimental treatment as predictor of vaccine acceptance, we
predict that a participant's acceptance equals their group average
(horizontal coloured line) instead of the overall average (horizontal
black line), which we use if we do not take into account the
participant's experimental treatment.
So the difference between the overall average and the group average is
what we predict and explain by the experimental treatment. This
difference is represented by the solid black arrows in
@fig-anova-between. The variance of the predicted scores is obtained if
we average the squared sizes of the solid black arrows for all
participants. This variance is called the *between-groups variance*.
Playing with the group means in @fig-anova-between, you may have noticed
that eta^2^ is high if there are large differences between group means.
In this situation we have high between-groups variance---large black
arrows---so we can predict a lot of the variation in outcome scores
between participants.
In contrast, small differences between group averages allow us to
predict only a small part of the variation in outcome scores. If all
group means are equal, we can predict none of the variation in outcome
scores because the between-groups variance is zero. As we will see in
@sec-anova-model, zero between-groups variance is central to the null
hypothesis in analysis of variance.
The experimental treatment predicts that a participant's vaccine
acceptance equals the average acceptance of the participant's group. It
cannot predict or explain that a participant's vaccine acceptance score
is slightly different from their group mean (the red double-sided arrows
in @fig-anova-between). *Within-groups variance* in outcome scores is
what we cannot predict with our experimental treatment; it is prediction
error. In some SPSS output, it is therefore labeled as "Error".
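The decomposition into between-groups and within-groups variation can be sketched in R with made-up scores for the three campaign styles (four participants per group):

```r
# Hypothetical scores: four participants per campaign style
scores <- c(3, 4, 2, 3, 6, 7, 5, 6, 7, 8, 6, 7)
group  <- factor(rep(c("Neutral", "Autonomy", "Control"), each = 4))

grand_mean  <- mean(scores)
group_means <- ave(scores, group)  # each participant's group mean

ss_total   <- sum((scores - grand_mean)^2)       # dotted black arrows
ss_between <- sum((group_means - grand_mean)^2)  # solid black arrows
ss_within  <- sum((scores - group_means)^2)      # red double-sided arrows

# The split is exact (up to floating point):
isTRUE(all.equal(ss_between + ss_within, ss_total))  # TRUE

eta_squared <- ss_between / ss_total  # proportion of variance explained
```

For these toy scores eta^2^ is about .85: most of the variation in outcome scores lies between the groups rather than within them.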
### *F* test on the model {#sec-anova-model}
Average group scores tell us whether the experimental treatment has
effects within the sample (@sec-anova-meandiffs). If the group who saw a
campaign with controlling language has higher average acceptance of
vaccination information than the group who saw a campaign with neutral
language, we conclude that using controlling language makes a difference
in the sample. But how about the population?
If we want to test whether the difference that we find in the sample
also applies to the population, we use the null hypothesis that all
average outcome scores are equal in the population from which the
samples were drawn. In our example, the null hypothesis states that
vaccine acceptance of people in the population does not differ between
those exposed to a campaign with autonomy supportive language, one with
controlling language, or a campaign with neutral language.
We use the variance in group means as the number that expresses the
differences between group means. If all groups have the same average
outcome score, the between-groups variance is zero. The larger the
differences, the larger the between-groups variance (see
@sec-between-variance).
We cannot just use the between-groups variance as the test statistic
because we have to take into account chance differences between sample
means. Even if we draw different samples from the same population, the
sample means will be different because we draw samples at random. These
sample mean differences are due to chance, they do not reflect true
differences between groups in the population.
We have to correct for chance differences and this is done by taking the
ratio of between-groups variance over within-groups variance. This ratio
gives us the relative size of observed differences between group means
over group mean differences that we expect by chance.
Our test statistic, then, is the ratio of two variances: between-groups
variance and within-groups variance. The *F* distribution approximates
the sampling distribution of the ratio of two variances, so we can use
this probability distribution to test the significance of the group mean
differences we observe in our sample.
Long story short: We test the null hypothesis that all groups have the
same population means in an analysis of variance. But behind the scenes,
we actually test between-groups variance against within-groups variance.
That is why it is called analysis of variance.
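A sketch of this ratio in R, again with made-up scores for three groups of four participants; the built-in `aov()` function reproduces the hand computation:

```r
scores <- c(3, 4, 2, 3, 6, 7, 5, 6, 7, 8, 6, 7)  # hypothetical data
group  <- factor(rep(c("Neutral", "Autonomy", "Control"), each = 4))
k <- nlevels(group)   # number of groups
n <- length(scores)   # number of observations

group_means <- ave(scores, group)
ss_between  <- sum((group_means - mean(scores))^2)
ss_within   <- sum((scores - group_means)^2)

# F is between-groups variance over within-groups variance,
# each sum of squares divided by its degrees of freedom
F_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_val  <- pf(F_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

summary(aov(scores ~ group))  # same F and p value
```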
### Assumptions for the *F* test in analysis of variance {#sec-anova-assumpt}
There are two important assumptions that we must make if we use the *F*
distribution in analysis of variance: (1) independent samples and (2)
homogeneous population variances.
#### Independent samples
The first assumption is that the groups can be regarded as independent
samples. As in an independent-samples *t* test, it must be possible *in
principle* to draw a separate sample for each group in the analysis.
Because this is a matter of principle instead of how we actually draw
the sample, we have to argue that the assumption is reasonable. We
cannot check the assumption against the data.
Here is an example of an argument that we can make. In an experiment, we
usually draw one sample of participants and, as a next step, we assign
participants randomly to one of the experimental conditions. We could
have easily drawn a separate sample for each experimental group. For
example, we first draw a participant for the first condition: seeing an
autonomy supportive campaign. Next, we draw a participant for the second
condition, e.g., the controlling campaign. The two draws are
independent: whomever we have drawn for the autonomy supportive
condition is irrelevant to whom we draw for the controlling condition.
Therefore, draws are independent and the samples can be regarded as
independent.
Situations where samples cannot be regarded as independent are the same
as in the case of dependent/paired-samples *t* tests (see
@sec-dependentsamples). For example, samples of first and second
observations in a repeated measurement design should not be regarded as
independent samples. Some analysis of variance models can handle
repeated measurements but we do not discuss them here.
#### Homogeneous population variances
The *F* test on the null hypothesis of no effect (the nil) in analysis
of variance assumes that the groups are drawn from the same population.
This implies that they have the same average score on the dependent
variable in the population as well as the same variance of outcome
scores. The null hypothesis tests the equality of population means but
we must assume that the groups have equal dependent variable variances
in the population.
We can use a statistical test to decide whether or not the population
variances are equal (homogeneous). This is Levene's *F* test, which is
also used in combination with independent samples *t* tests. The test's
null hypothesis is that the population variances of the groups are
equal. If we do *not* reject the null hypothesis, we decide that the
assumption of equal population variances is plausible.
The assumption of equal population variances is less important if the
group samples are more or less equal in size (a balanced design, see
@sec-balanced). As a rule of thumb, groups count as equal in size if the
largest group is less than 10% larger than the smallest group. If this
is the case, we do not need to worry about the assumption of homogeneous
population variances.
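Levene's test is commonly run with `leveneTest()` from the car package. If that package is not available, the same idea can be sketched in base R: a one-way analysis of variance on the absolute deviations from the group medians (median centering is the Brown-Forsythe variant, which is also `car::leveneTest()`'s default). The scores below are made up for illustration:

```r
scores <- c(3, 4, 2, 3, 6, 7, 5, 6, 7, 8, 6, 7)  # hypothetical data
group  <- factor(rep(c("Neutral", "Autonomy", "Control"), each = 4))

# Absolute deviation of each score from its group median
abs_dev <- abs(scores - ave(scores, group, FUN = median))

# Null hypothesis: equal population variances across groups
summary(aov(abs_dev ~ group))
# With the car package installed: car::leveneTest(scores ~ group)
```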
### Which groups have different average scores?
Analysis of variance tests the null hypothesis of equal population means
but it does not yield confidence intervals for group means. It does not
always tell us which groups score significantly higher or lower.
::: {#fig-anova-posthoc}
```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/anova-posthoc/" width="100%" height="485px" style="border:none;">
</iframe>
```
Which groups have different average outcome scores in the population?
The *p* values belong to independent-samples *t* tests on the means of
two groups.
:::
If the *F* test is statistically significant, we reject the null
hypothesis that all groups have the same population mean on the
dependent variable. In our current example, we reject the null
hypothesis that average vaccine acceptance is equal for people who saw a
campaign with autonomy supportive, controlling or neutral language. In
other words, we *reject* the null hypothesis that the campaign style (or
language used) does *not* matter to vaccine acceptance.
#### Pairwise comparisons as post-hoc tests
With a statistically significant *F* test for the analysis of variance
model, several questions remain to be answered. Does a controlling
campaign style increase or decrease the acceptance of vaccination
information? Are both campaign styles equally effective? The *F* test
does not provide answers to these questions. We have to compare groups
one by one to see which condition (campaign style) is associated with a
higher level of vaccine acceptance.
In a pairwise comparison, we have two groups, for instance, participants
confronted with an autonomy supportive campaign and participants who saw
a campaign with neutral language. We want to compare the two groups on a
numeric dependent variable, namely their vaccine acceptance. An
independent-samples *t* test is appropriate here.
With three groups, we can make three pairs: autonomy supportive versus
controlling, autonomy supportive versus neutral, and controlling versus
neutral. We have to execute three *t* tests on the same data. We already
know that there are most likely differences in average scores, so the
*t* tests are executed after the fact, in Latin *post hoc*. Hence the
name *post-hoc tests*.
Applying more than one test to the same data increases the probability
of finding at least one statistically significant difference even if
there are no differences at all in the population. @sec-cap-chance
discussed this phenomenon as capitalization on chance and it offered a
way to correct for this problem, namely Bonferroni correction. We ought
to apply this correction to the independent-samples *t* tests that we
execute if the analysis of variance *F* test is statistically
significant.
The Bonferroni correction divides the significance level by the number
of tests that we do. In our example, we do three *t* tests on pairs of
groups, so we divide the significance level of five per cent by three.
The resulting significance level for each *t* test is .0167. If a *t*
test's *p* value is below .0167, we reject the null hypothesis, but we
do not reject it otherwise.
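In R, the base function `pairwise.t.test()` runs these comparisons and adjusts the *p* values directly; with the Bonferroni method, the reported *p* values are multiplied by the number of tests, which is equivalent to dividing the significance level. A sketch with made-up scores:

```r
scores <- c(3, 4, 2, 3, 6, 7, 5, 6, 7, 8, 6, 7)  # hypothetical data
group  <- factor(rep(c("Neutral", "Autonomy", "Control"), each = 4))

# Three pairwise t tests with Bonferroni-adjusted p values
pairwise.t.test(scores, group, p.adjust.method = "bonferroni")
```

Note that `pairwise.t.test()` pools the standard deviation across all groups by default; set `pool.sd = FALSE` to run separate two-sample tests instead.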
#### Two steps in analysis of variance
Analysis of variance, then, consists of two steps. In the first step, we
test the general null hypothesis that all groups have equal average
scores on the dependent variable in the population. If we cannot reject
this null hypothesis, we have too little evidence to conclude that there
are differences between the groups. Our analysis of variance stops here,
although it is recommended to report the confidence intervals of the
group means to inform the reader. Perhaps our sample was just too small
to reject the null hypothesis.
If the *F* test is statistically significant, we proceed to the second
step. Here, we apply independent-samples *t* tests with Bonferroni
correction to each pair of groups to see which groups have significantly
different means. In our example, we would compare the autonomy
supportive and controlling groups to the group with neutral language to
see if a strong campaign style increases acceptance of vaccination
information, and, if so, how much. In addition, we would compare the
autonomy supportive and controlling groups to see if one campaign style
is more effective than the other.
#### Contradictory results
It may happen that the *F* test on the model is statistically
significant but none of the post-hoc tests is statistically significant.
This mainly happens when the *p* value of the *F* test is near .05.
Perhaps the correction for capitalization on chance is too strong; this
is known to be the case with the Bonferroni correction. Alternatively,
the sample can be too small for the post-hoc test. Note that we have
fewer observations in a post-hoc test than in the *F* test because we
only look at two of the groups.
This situation illustrates the limitations of null hypothesis
significance tests (@sec-critical-discussion). Remember that the 5 per
cent significance level remains an arbitrary boundary and statistical
significance depends a lot on sample size. So do not panic if the *F*
and *t* tests have contradictory results.
A statistically significant *F* test tells us that we may be quite
confident that at least two group means are different in the population.
If none of the post-hoc *t* tests is statistically significant, we
should note that it is difficult to pinpoint the differences.
Nevertheless, we should report the sample means of the groups (and their
standard deviations) as well as the confidence intervals of their
differences as reported in the post-hoc test. The two groups that have
most different sample means are most likely to have different population
means.
## One-Way Analysis of Variance in SPSS {#sec-onewaySPSS}
### Instructions
In the video below we walk you through conducting and interpreting a
one-way analysis of variance in SPSS. We will continue using our
vaccination campaign example. The research question discussed in the
video is 'Does the style of language used in a campaign affect vaccine
acceptance among people seeing the campaign?'. The data set includes
the variable *lang_cond* (language condition), which is a categorical
variable. Participants watch a campaign with either autonomy
supportive, controlling, or neutral language. You are
researching whether the language style used in the campaign has an
effect on people's vaccine acceptance. Vaccine acceptance has been
measured before (*accept_pre*) and after (*accept_post*) watching the
campaign.
::: {#vid-SPSS1way}
{{< video https://youtu.be/ADkZ650mhEg width="100%" height="315" >}}
Execute one-way analysis of variance in SPSS.
:::
```{r, echo=FALSE, eval=FALSE}
# Execute one-way analysis of variance in SPSS with Analyze > Compare Means > One-Way ANOVA {or use general Linear Model also for one-way? (MCRS probably uses Compare Means because it only disucsses one-way ANOVA) - between & within variance not labeled as such!).
# Goal: association as level differences between three or more groups: Does the language style matter to the level of vaccine acceptance?
# Example: vaccine.sav, outcome is vaccine acceptance (post), predictor (grouping variable) is lang_cond (0, 1).
# Technique: one-way ANOVA.
# SPSS menu: Compare Means ; post hoc: Bonferroni ; options: descriptives, homogeneity of variance test, means plot.
# Paste & Run.
# Check assumptions: F test homogeneous population variances or groups of equal size ; post-hoc t tests: each group more than 30 observations or normally distributed.
# Interpret output: F test on the null hypothesis that all groups have equal population means - point out between groups sum of squares? ; post-hoc t test for pairwise comparison of (population) means ; test result and significance, confidence interval.
```
When we want to research the effect of a categorical variable (campaign
style) on a numerical variable (accept_post), we use a one-way analysis
of variance (see the test selection table in @sec-test-selection). You can
get the one-way ANOVA window by clicking
`Analyze > Compare Means > One-Way ANOVA`. The dependent variable is the
vaccine acceptance after watching the campaign, so this variable should
be added to the `Dependent List:`. The independent variable, also
called a factor in ANOVA, is the campaign style (i.e., the type of
language used), which should be added to `Factor:`. Before running the
ANOVA, we select some additional options. Select `Post Hoc...` and
select the `Bonferroni` correction for multiple testing. Under
`Options...` we select `Descriptive` to obtain the group means and
`Homogeneity of variance test` for Levene's *F*-test to check
assumptions. You would only select bootstrapping if you cannot use the
theoretical approximation for the *F*-distribution (revisit @sec-probmodels if
needed). You can select `Estimate effect size for overall tests` if you
want to obtain eta^2^. Click `Paste` and run the command from your
syntax.
In addition to the SPSS analysis, it is always a good idea to visualize
the results of your analysis. You can do this by creating a means plot
with error bars to clearly show the differences between the groups.
@vid-SPSS1way-visualization below shows you how to create a means plot
in SPSS.
::: {#vid-SPSS1way-visualization}
{{< video https://youtu.be/zVrjqicFVR0 width="100%" height="315" >}}
Visualizing the results of a one-way analysis of variance in SPSS.
:::
Once the output appears, we can start interpreting our results.
@vid-SPSS1way-interpretation runs us through the interpretation of the
output. The first table, `Descriptives`, provides the group means. Here
we can immediately see that vaccine acceptance is lower for those who
viewed the campaign with neutral language. In this table we can also
check the group sizes, which are fairly equal (45, 49, 49). The
`Test of Homogeneity of Variances` is not significant, so the
conditions for the *F* test are met on both fronts (roughly equal
group sizes and equal population variances).
::: {#vid-SPSS1way-interpretation}
{{< video https://youtu.be/JDqJbRIiBgk width="100%" height="315" >}}
Interpreting the output of a one-way analysis of variance in SPSS.
:::
To interpret the actual test results, we study the `ANOVA` table. It
shows the between-groups and within-groups results; you find the
*F* value in the `Between Groups` row, with a *p* value below .05
indicating a significant result. The table below the ANOVA table,
`ANOVA Effect Sizes`, provides insight into the size of the effect,
that is, the size of the difference between the groups. Eta-squared is
the effect size that we report for ANOVA. As reported in the video, all
versions above SPSS v26 show the *partial* eta-squared, meaning that
you need to calculate eta-squared by hand. We will walk you through
this process later in this chapter.
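The formula itself only needs the sums of squares from the `ANOVA`
table: eta^2^ = SS~between~ / SS~total~. A sketch with made-up sums of
squares (in Python):

```python
# Made-up sums of squares, as they would appear in an SPSS ANOVA table.
ss_between = 120.0  # Between Groups sum of squares
ss_within = 480.0   # Within Groups sum of squares

# Eta-squared: proportion of total variance explained by the factor.
ss_total = ss_between + ss_within
eta_squared = ss_between / ss_total

print(eta_squared)  # 0.2
```

Here the factor would account for 20 per cent of the variance in the
dependent variable.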
As noted, we have found a significant *F* value, indicating that there
is a significant difference between the groups. We have a factor with
three groups (Autonomy Supportive, Controlling, Neutral), so we need
further analysis to tell which group(s) differ. For this we inspect the
`Post Hoc Tests`, in the table `Multiple Comparisons`. Here you can
see the results of several *t* tests (as you might remember from
@sec-test-selection, we can compare two groups on a numerical variable
with a *t* test). The *p* values in this table are corrected for
capitalization on chance due to multiple testing (that is why we
selected `Bonferroni` earlier). These results show us that the groups
*Neutral* and *Autonomy Supportive* differ significantly from each
other, and so do the groups *Neutral* and *Controlling* (i.e., both
comparisons have a *p* value below .05 and the confidence interval does
not include zero). The difference between *Autonomy Supportive* and
*Controlling* is not significant (i.e., the *p* value is larger than .05
and the confidence interval includes zero). It can also help to take a
look at the `Means Plots`, which visualize the effects.
## Different Means for Two Factors
The participants in the experiment do not only differ because they see
different campaign styles. In addition, personal characteristics could
impact how participants perceive each campaign. In our example, we are
particularly interested in participants' health literacy, categorizing
the participants in two groups: low and high health literacy. We can
then ask the question: Does the effect of the language used in campaigns
on vaccine acceptance differ between people with low and those with high
health literacy?
::: {#fig-anova-twoway}
```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/anova-twoway/" width="100%" height="400px" style="border:none;">
</iframe>
```
How do group means inform us about (main) effects in analysis of
variance?
:::
In the preceding section, we have looked at the effect of a single
factor on acceptance of vaccination information, namely, the language
used in a campaign to which participants are exposed. Thus, we take into
account two variables: one independent variable and one dependent
variable. This is an example of *bivariate analysis*.
Usually, however, we expect an outcome to depend on more than one
variable. Vaccine acceptance does not only depend on the language used
in a campaign. It is easy to think of more factors, such as a person's
previous experience with vaccines, their personal health, their beliefs,
and so on.
It is straightforward to include more factors in an analysis of
variance. These can be additional experimental treatments in the context
of an experiment as well as participant characteristics that are not
manipulated by the researcher. For example, we may hypothesize that
people with high health literacy are generally more accepting of
vaccines than people with low health literacy.
### Two-way analysis of variance {#sec-anova2way}
If we use one factor, the analysis is called one-way analysis of
variance. With two factors, it is called two-way analysis of variance,
and with three factors... well, you probably already guessed that name.
An analysis of variance with two or more factors can also be called a
multi-way or factorial analysis of variance.
A two-way analysis of variance using a factor with three levels, for
instance, exposure to three different campaign styles, and a second
factor with two levels, for example, low versus high health literacy, is
called a 3x2 (say: three by two) factorial design.
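The name of the design directly gives the number of subgroups: crossing
every level of the first factor with every level of the second yields
3 x 2 = 6 cells. A quick illustration in Python:

```python
campaign_styles = ["Autonomy Supportive", "Controlling", "Neutral"]
health_literacy = ["low", "high"]

# Every combination of factor levels is one subgroup (cell).
cells = [(style, literacy)
         for style in campaign_styles
         for literacy in health_literacy]

print(len(cells))  # 6 subgroups in a 3x2 factorial design
```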
### Balanced design {#sec-balanced}
In analysis of variance with two or more factors, we aim to have a
*balanced design*. A design is balanced when every combination of
factor levels contains the same number of observations, that is, when
each group being compared in the analysis has the same number of data
points (in our case: participants). In an experiment, we can ensure a
balanced design by assigning the same number of participants to each
combination of levels on all factors. In other words, a factorial
design is balanced if we have the same number of observations in each
subgroup. A subgroup contains the participants that have the same level
on both factors, just like a cell in a contingency table.
Balanced designs are important because they lead to more robust results.
In addition, a balanced design ensures that the factors are
statistically independent. Statistical independence here means that the
factors are unrelated: knowing a participant's level on one factor
provides no information about that participant's level on the other
factor. We aim for statistical independence of the factors in analysis
of variance.
```{r}
#| label: tbl-anova-balanced
#| tbl-cap: "Number of observations per subgroup in a balanced 3x2 factorial design."
#| echo: false
#| warning: false
# Table for a balanced 3x2 factorial design.
# Create data.
df <- data.frame(`High Health Literacy` = rep(2, 3), `Low Health Literacy`= rep(2, 3))
row.names(df) <- c("Autonomy", "Control", "Neutral")
# Display table.
knitr::kable(df, booktabs = TRUE) %>%
kable_styling(font_size = 12, full_width = F, position = "float_right",
latex_options = c("HOLD_position"))
# Cleanup.
rm(df)
```
@tbl-anova-balanced shows an example of a balanced 3x2 factorial design.
Each subgroup (cell) contains two participants (cases). Equal
distributions of frequencies across columns or across rows indicate a
balanced design. In the example, we see a balanced design with equal
distributions across columns (and rows). This means that the factors are
statistically independent.
In practice, it may not always be possible to have exactly the same
number of observations for each subgroup. A participant may drop out
from the experiment, a measurement may go wrong, and so on. If the
numbers of observations are more or less the same for all subgroups, the
factors are nearly independent, which is okay. We can use the same rule
of thumb for a balanced design as for the conditions of an *F* test in
analysis of variance: if the smallest subgroup is at most ten per cent
smaller than the largest subgroup, we call the factorial design
balanced.
An example: if @tbl-anova-balanced read 10-9-9 in both columns, the
largest subgroup would consist of 10 participants. Ten per cent of the
largest group (10) is one participant. The smallest group (9) differs
by no more than 1 participant from the largest group, so the design
would still be balanced. If one subgroup consisted of eight
participants, the difference of two participants would exceed ten per
cent, and the design would be unbalanced.
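The rule of thumb is easy to automate. A small helper in Python (the
function name is ours):

```python
def is_balanced(subgroup_sizes):
    """Rule of thumb: the design counts as balanced if the smallest
    subgroup is at most ten per cent smaller than the largest one."""
    smallest, largest = min(subgroup_sizes), max(subgroup_sizes)
    return (largest - smallest) <= 0.10 * largest

print(is_balanced([10, 9, 9, 10, 9, 9]))  # True: 9 is within 10% of 10
print(is_balanced([10, 9, 8, 10, 9, 8]))  # False: 8 falls more than 10% below 10
```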
A balanced design is desired but not necessary. Unbalanced designs can
be analyzed but estimation is more complicated (a problem for the
computer, not for us) and the assumption of equal population variances
for all groups (Levene's *F* test) is more important (a problem for us,
not for the computer) because we do not have equal group sizes. Note
that the requirement of equal group sizes applies to the *subgroups* in
a two-way analysis of variance. With a balanced design, we ensure that
we have the same number of observations in all subgroups, so we are on
the safe side.
### Main effects in two-way analysis of variance {#sec-maineffects}
A two-way analysis of variance tests the effects of both factors on the
dependent variable in one go. It tests the null hypothesis that people
exposed to an autonomy supportive campaign have the same average vaccine
acceptance in the population as people exposed to a controlling campaign
and as those who are exposed to a neutral campaign. It also tests the
null hypothesis that people with low health literacy and people with
high health literacy have the same average vaccine acceptance in the
population.
```{r}
#| label: fig-anova-meansplot2
#| fig-cap: "Means plots for the main effects of language condition and health literacy on vaccine acceptance (Error bars = 95% Confidence Intervals). As a reading instruction, effects of language conditions and having high health literacy are represented by arrows and dashed lines."
#| echo: false
#| warning: false
sim_data <- simulate_data(AutonomyHigh_avg=6.5,AutonomyLow_avg=5,
ControlHigh_avg=8.5,ControlLow_avg=7,
NeutralHigh_avg=4.5,NeutralLow_avg=3)
d <- sim_data %>%
group_by(lang_cond) %>%
summarise(
accept_av = mean(vacc_acceptance),
se = sd(vacc_acceptance) / sqrt(n()),
lower = accept_av - qt(0.975, df = n()-1) * se,
upper = accept_av + qt(0.975, df = n()-1) * se,
.groups = "drop"
) %>%
mutate(
const = "line_group", # for line connection
lang_cond=factor(lang_cond, levels=c('Neutral','Autonomy','Control'))
) %>% arrange(lang_cond)
library(ggplot2)
# Plot for main effect language condition.
ggplot(d, aes(lang_cond, accept_av)) +
geom_point(size = 3, color=brewercolors["Blue"]) +
geom_line(aes(group = const), size = 1, color=brewercolors["Blue"]) +
geom_segment(aes(x = 1, xend = 3.1, y = accept_av[[1]], yend = accept_av[[1]]),
linetype = "dashed", color = "darkgrey") +
geom_segment(aes(x = 2.1, xend = 2.1, y = accept_av[[1]], yend = (accept_av[[2]] - 0.1)),
color = "darkgrey",
arrow = arrow(length = unit(2,"mm"),
# ends = "both",
type = "closed")) +
geom_segment(aes(x = 1.7, xend = 2.3, y = accept_av[[2]], yend = accept_av[[2]]),
linetype = "dashed", color = "darkgrey")+
geom_label(aes(x = 1.8, y = (accept_av[[1]] + accept_av[[2]])/2,
label = "Autonomy effect",
hjust = 0), color = "darkgrey",fill='white'
) +
geom_segment(aes(x = 3.1, xend = 3.1, y = accept_av[[1]], yend = (accept_av[[3]] - 0.1)),
color = "darkgrey",
arrow = arrow(length = unit(2,"mm"),
# ends = "both",
type = "closed")) +
geom_segment(aes(x = 2.7, xend = 3.3, y = accept_av[[3]], yend = accept_av[[3]]),
linetype = "dashed", color = "darkgrey")+
geom_label(aes(x = 2.8, y = (accept_av[[1]] + accept_av[[3]])/2,
label = "Control effect",
hjust = 0), color = "darkgrey",fill='white'
) +
theme_general() +
theme(text = element_text(size = 18)) +
scale_y_continuous(limits = c(1, 10), breaks = c(1, 5, 10)) +
labs(x = "Language Condition", y = "Average Vaccine Acceptance")+
# Error bars
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1, color = brewercolors["Blue"])
# Plot for main effect health literacy.
d=sim_data %>% group_by(health_literacy) %>%
summarise(accept_av = mean(vacc_acceptance),
se = sd(vacc_acceptance) / sqrt(n()),
lower = accept_av - qt(0.975, df = n()-1) * se,
upper = accept_av + qt(0.975, df = n()-1) * se,
.groups = "drop") %>%
mutate(
const = "line_group", # for line connection
health_literacy = factor(health_literacy,levels=c('low','high'))
)%>% arrange(health_literacy)
ggplot(d,aes(health_literacy, accept_av)) +
geom_point(size = 3, color=brewercolors["Blue"]) +
geom_line(aes(group = const), size = 1, color=brewercolors["Blue"]) +
geom_segment(aes(x = 1, xend = 2.1, y = accept_av[[1]], yend = accept_av[[1]]),
linetype = "dashed", color = "darkgrey") +
geom_segment(aes(x = 1.8, xend = 2.2, y = accept_av[[2]], yend = accept_av[[2]]),
linetype = "dashed", color = "darkgrey") +
geom_segment(aes(x = 2.1, xend = 2.1, y = accept_av[[1]], yend = (accept_av[[2]] - 0.1)),
color = "darkgrey",
arrow = arrow(length = unit(2,"mm"),
# ends = "both",
type = "closed")) +
geom_label(aes(x = 1.7, y = (accept_av[[1]] + accept_av[[2]])/2-0.2,
label = "Effect of high health literacy",
hjust = 0), color = "darkgrey",fill='white'
) +
theme_general() +
theme(text = element_text(size = 18)) +
scale_y_continuous(limits = c(1, 10), breaks = c(1, 5, 10)) +
labs(x = "Health Literacy", y = "Average Vaccine Acceptance")+
# Error bars
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1, color = brewercolors["Blue"])
rm(d)
```
The tested effects are main effects because they represent the effect of
one factor. They express an overall or average difference between the
mean scores of the groups on the dependent variable. The main effect of
the campaign style factor shows the mean differences for campaign groups
if we do not distinguish between low and high health literacy. Likewise,
the main effect for health literacy shows the average difference in
vaccine acceptance between those with low and those with high health
literacy without taking into account the language used in the campaign
to which they were exposed.
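In a balanced design, each main effect can be read off from the
marginal means, which are simple averages of the cell means over the
other factor. A sketch in Python, using the cell means underlying the
simulated data in @fig-anova-meansplot2:

```python
# Cell means used to simulate the data in the figure:
# (language condition, health literacy) -> average vaccine acceptance.
cell_means = {
    ("Neutral",  "low"): 3.0, ("Neutral",  "high"): 4.5,
    ("Autonomy", "low"): 5.0, ("Autonomy", "high"): 6.5,
    ("Control",  "low"): 7.0, ("Control",  "high"): 8.5,
}

def marginal_mean(level, position):
    """Average the cell means over the other factor. This simple
    average equals the marginal mean only when the design is
    balanced (equal cell sizes)."""
    values = [m for cell, m in cell_means.items() if cell[position] == level]
    return sum(values) / len(values)

# Main effect of health literacy: difference between marginal means.
print(marginal_mean("high", 1))                            # 6.5
print(marginal_mean("low", 1))                             # 5.0
print(marginal_mean("high", 1) - marginal_mean("low", 1))  # 1.5
```

With unequal cell sizes, the cell means would have to be weighted by
the number of observations per cell, which is why balanced designs make
the two analyses agree.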
We could have used two separate one-way analyses of variance to test the
same effects. Moreover, we could have tested the difference between low
and high health literacy with an independent-samples *t* test. The
results would have been the same (if the design is balanced). But there
is an important advantage to using a two-way analysis of variance, to
which we turn in the next section.
## Moderation: Group-Level Differences that Depend on Context {#sec-moderationanova}
In the preceding section, we have analyzed the effects both of campaign
style and health literacy on vaccine acceptance. The two main effects
isolate the influence of campaign style on acceptance of vaccination
information from the effect of health literacy and the other way around.
But does campaign style always have the same effect? Even if there is a
main effect of campaign style on vaccine acceptance across all
participants, it is possible that this effect differs when we are
zooming in on specific subgroups of our sample, for instance comparing
people with high and those with low health literacy. In other words, do
people with high and people with low health literacy differ in their
response to different campaign styles? For instance, people with high
health literacy might feel confident in their ability to find and use
health information in order to make their own, informed decisions about
health behaviors like vaccination. These individuals might prefer
autonomy-supportive over controlling communication styles. Individuals
with low health literacy might not show this preference for
autonomy-supportive language as they might feel less confident to use
that autonomy to make complicated decisions on their own. Thus, we could
expect that the effect of campaign style differs between those with high
and low health literacy.
If the effect of a factor is different for different groups on another
factor, the first factor's effect is *moderated* by the second factor.
The phenomenon that effects are moderated is called *moderation*. Both
factors are independent variables. To distinguish between them, we will
henceforth refer to them as the predictor (here: campaign style) and the
moderator (here: health literacy).
With moderation, the factors have a combined effect. The context (the
group's score on the moderator) alters the effect of the predictor on
the dependent variable. The conceptual diagram for moderation expresses the
effect of the moderator on the effect of the predictor as an arrow
pointing at another arrow. @fig-anova-diagram shows the conceptual
diagram for participant's health literacy moderating the effect of
campaign style on vaccine acceptance.
```{r}
#| label: fig-anova-diagram
#| fig-cap: "Conceptual diagram of moderation."
#| echo: false
#| warning: false
library(ggplot2)
# Create coordinates for the variable names.
variables <- data.frame(x = c(0.3, 0.5, 0.7),
y = c(.1, .3, .1),
label = c("Language Condition", "Health Literacy", "Vaccine Acceptance"))
ggplot(variables, aes(x, y)) +
geom_segment(aes(x = x[1], y = y[1], xend = x[3] - 0.05, yend = y[1]), arrow = arrow(length = unit(0.04, "npc"), type = "closed")) +
geom_segment(aes(x = x[2], y = y[2], xend = x[2], yend = y[1]), arrow = arrow(length = unit(0.04, "npc"), type = "closed")) +
geom_label(aes(label=label)) +
coord_cartesian(xlim = c(0.2, 0.8), ylim = c(0, 0.4)) +
theme_void()