Multiple Comparisons and Post-Hoc Tests
2026-01-01
CRD Model:
\(y_{ij} = \mu + \tau_i + \epsilon_{ij}\) or \(y_{ij} = \mu_i + \epsilon_{ij}\)
Where: \(\mu\) = overall mean, \(\tau_i\) = effect of treatment \(i\), \(\mu_i = \mu + \tau_i\) = mean of treatment \(i\), and \(\epsilon_{ij} \sim N(0, \sigma^2)\) = random error
Key assumption: Experimental units are homogeneous (no blocking needed)
Significant F-test means:
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_t \text{ is rejected} \]
At least one mean differs, but which ones?
Two approaches: planned contrasts (specified before the experiment) and post-hoc tests (exploratory comparisons made after a significant F-test)
Before interpreting ANOVA, always check assumptions:
Laboratory CRD: Effect of orifice diameter on radon release
library(tidyverse) # pivot_longer(), mutate(), dplyr::select()

# Load radon data from GitHub
radon_url <- "https://raw.githubusercontent.com/Jaisai0611/CRD-example/main/EX%203.21.csv"
radon_data <- read.csv(radon_url) |>
  pivot_longer(
    cols = starts_with("Radon"),
    names_to = "sample",
    values_to = "radon_pct"
  ) |>
  mutate(
    orifice = factor(Orifice.Diameter),
    response = radon_pct
  ) |>
  dplyr::select(orifice, sample, response)
# Summary statistics by orifice size
radon_data |>
  group_by(orifice) |>
  summarize(
    n = n(),
    mean = mean(response),
    sd = sd(response),
    .groups = "drop"
  ) |>
  knitr::kable(digits = 1)

| orifice | n | mean | sd |
|---|---|---|---|
| 0.37 | 4 | 82.8 | 2.1 |
| 0.51 | 4 | 77.0 | 2.3 |
| 0.71 | 4 | 75.0 | 1.8 |
| 1.02 | 4 | 71.8 | 3.3 |
| 1.4 | 4 | 65.0 | 3.6 |
| 1.99 | 4 | 62.8 | 2.8 |
ggplot(radon_data, aes(x = orifice, y = response, fill = orifice)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.15, alpha = 0.5, size = 2.5) +
  labs(
    title = "Effect of Orifice Diameter on Radon Release",
    x = "Orifice Diameter (mm)",
    y = "Radon Released (%)",
    fill = "Diameter"
  ) +
  scale_fill_viridis_d(option = "plasma") +
  theme_minimal() +
  theme(
    text = element_text(size = 14),
    legend.position = "none"
  )

Figure 2: Radon release percentage by orifice diameter
Fit one-way ANOVA model
ANOVA table
Model coefficients
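The code for these folded chunks is not shown in the rendered slides; a sketch of what they would contain (the object name `model_radon` is an assumption, and `radon_data` comes from the loading chunk above):

```r
# Fit one-way ANOVA model (sketch; chunk code is folded in the slides)
model_radon <- lm(response ~ orifice, data = radon_data)

# ANOVA table
anova(model_radon)

# Model coefficients (summary output shown below)
summary(model_radon)
```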
#>
#> Call:
#> lm(formula = response ~ orifice, data = radon_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.75 -2.00 0.25 2.00 4.00
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 82.750 1.355 61.057 < 2e-16 ***
#> orifice0.51 -5.750 1.917 -3.000 0.007685 **
#> orifice0.71 -7.750 1.917 -4.043 0.000762 ***
#> orifice1.02 -11.000 1.917 -5.739 1.93e-05 ***
#> orifice1.4 -17.750 1.917 -9.261 2.87e-08 ***
#> orifice1.99 -20.000 1.917 -10.435 4.62e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.711 on 18 degrees of freedom
#> Multiple R-squared: 0.8955, Adjusted R-squared: 0.8665
#> F-statistic: 30.85 on 5 and 18 DF, p-value: 3.16e-08
Figure 3: Diagnostic plots for radon release CRD
Calculate estimated marginal means (EMMs)
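The EMM table below would come from a call like this (a sketch; assumes a fitted model object such as `model_radon <- lm(response ~ orifice, data = radon_data)`):

```r
library(emmeans)

# Estimated marginal means for each orifice diameter, with 95% CIs
emm_radon <- emmeans(model_radon, ~orifice)
emm_radon
```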
#> orifice emmean SE df lower.CL upper.CL
#> 0.37 82.8 1.36 18 79.9 85.6
#> 0.51 77.0 1.36 18 74.2 79.8
#> 0.71 75.0 1.36 18 72.2 77.8
#> 1.02 71.8 1.36 18 68.9 74.6
#> 1.4 65.0 1.36 18 62.2 67.8
#> 1.99 62.8 1.36 18 59.9 65.6
#>
#> Confidence level used: 0.95
Display with confidence intervals
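The Tukey output below matches base R's `TukeyHSD()` applied to an `aov` fit (a sketch, consistent with the `Fit:` line in the output):

```r
# All pairwise comparisons with Tukey's HSD (base R)
aov_radon <- aov(response ~ orifice, data = radon_data)
TukeyHSD(aov_radon, conf.level = 0.95)
```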
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = response ~ orifice, data = radon_data)
#>
#> $orifice
#> diff lwr upr p adj
#> 0.51-0.37 -5.75 -11.841234 0.3412336 0.0707511
#> 0.71-0.37 -7.75 -13.841234 -1.6587664 0.0084181
#> 1.02-0.37 -11.00 -17.091234 -4.9087664 0.0002404
#> 1.4-0.37 -17.75 -23.841234 -11.6587664 0.0000004
#> 1.99-0.37 -20.00 -26.091234 -13.9087664 0.0000001
#> 0.71-0.51 -2.00 -8.091234 4.0912336 0.8968057
#> 1.02-0.51 -5.25 -11.341234 0.8412336 0.1153360
#> 1.4-0.51 -12.00 -18.091234 -5.9087664 0.0000841
#> 1.99-0.51 -14.25 -20.341234 -8.1587664 0.0000089
#> 1.02-0.71 -3.25 -9.341234 2.8412336 0.5513482
#> 1.4-0.71 -10.00 -16.091234 -3.9087664 0.0007059
#> 1.99-0.71 -12.25 -18.341234 -6.1587664 0.0000650
#> 1.4-1.02 -6.75 -12.841234 -0.6587664 0.0249971
#> 1.99-1.02 -9.00 -15.091234 -2.9087664 0.0021152
#> 1.99-1.4 -2.25 -8.341234 3.8412336 0.8432736
Figure 4: Estimated marginal means with 95% confidence intervals
Scientific questions can be translated into contrasts.
Example with 4 NPK application methods:
Research questions → Contrasts:
Treatments: (1) Control (no NPK), (2) NPK plowed in January, (3) NPK broadcast in January, (4) NPK broadcast in April
Contrasts:
| Contrast | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| \(C_1\): Control vs NPK | +3 | -1 | -1 | -1 |
| \(C_2\): Jan vs Apr | 0 | +1 | +1 | -2 |
| \(C_3\): Plow vs Broadcast | 0 | +1 | -1 | 0 |
Rules for contrast coefficients:

- Coefficients within a contrast must sum to zero: \(\sum_i c_i = 0\)
- Two contrasts are orthogonal if their cross-products sum to zero: \(\sum_i c_i d_i = 0\)
- With \(t\) treatments, at most \(t - 1\) mutually orthogonal contrasts exist
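The coefficient vectors in the table above can be checked directly in R: a valid contrast's coefficients sum to zero, and two contrasts are orthogonal when their cross-products sum to zero.

```r
C1 <- c(3, -1, -1, -1) # Control vs NPK
C2 <- c(0, 1, 1, -2)   # Jan vs Apr
C3 <- c(0, 1, -1, 0)   # Plow vs Broadcast

sapply(list(C1, C2, C3), sum)             # all zero -> valid contrasts
c(sum(C1 * C2), sum(C1 * C3), sum(C2 * C3)) # all zero -> mutually orthogonal
```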
# Create simulated but realistic NPK application data
set.seed(2026)
npk_data <- data.frame(
  treatment = factor(
    rep(c("Control", "Jan_Plow", "Jan_Broadcast", "Apr_Broadcast"), each = 8),
    levels = c("Control", "Jan_Plow", "Jan_Broadcast", "Apr_Broadcast")
  ),
  block = factor(rep(1:8, 4)),
  yield = c(
    rnorm(8, 45, 5), # Control: lower yield
    rnorm(8, 62, 5), # Jan plowed: best
    rnorm(8, 58, 5), # Jan broadcast: good
    rnorm(8, 55, 6)  # Apr broadcast: moderate
  )
)
# Summary by treatment
npk_data |>
  group_by(treatment) |>
  summarize(
    n = n(),
    mean = mean(yield),
    sd = sd(yield),
    .groups = "drop"
  ) |>
  knitr::kable(digits = 1)

| treatment | n | mean | sd |
|---|---|---|---|
| Control | 8 | 41.6 | 4.7 |
| Jan_Plow | 8 | 60.0 | 5.4 |
| Jan_Broadcast | 8 | 58.2 | 4.5 |
| Apr_Broadcast | 8 | 59.3 | 5.6 |
ggplot(npk_data, aes(x = treatment, y = yield, fill = treatment)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.15, alpha = 0.5, size = 2.5) +
  labs(
    title = "Effect of NPK Application Method on Crop Yield",
    x = "Treatment",
    y = "Yield (bushels/acre)",
    fill = "Treatment"
  ) +
  scale_fill_viridis_d(option = "plasma") +
  theme_minimal() +
  theme(
    text = element_text(size = 14),
    axis.text.x = element_text(angle = 30, hjust = 1),
    legend.position = "none"
  )

Figure 5: Crop yield by NPK application method
library(emmeans) # emmeans(), contrast()

# Fit CRD model
model_npk <- lm(yield ~ treatment, data = npk_data)

# Get estimated marginal means
emm_npk <- emmeans(model_npk, ~treatment)

# Define contrasts as a list
contrast_list <- list(
  "Control vs NPK" = c(3, -1, -1, -1) / 3, # Control minus the NPK average
  "Jan vs Apr" = c(0, 1, 1, -2) / 2,       # January average minus April
  "Plow vs Broadcast" = c(0, 1, -1, 0)     # Simple comparison
)

# Test contrasts
contrast_results <- contrast(emm_npk, contrast_list)
print(contrast_results)

#> contrast estimate SE df t.ratio p.value
#> Control vs NPK -17.580 2.07 28 -8.475 <.0001
#> Jan vs Apr -0.205 2.20 28 -0.093 0.9265
#> Plow vs Broadcast 1.847 2.54 28 0.727 0.4734
#> contrast estimate SE df lower.CL upper.CL
#> Control vs NPK -17.580 2.07 28 -21.83 -13.33
#> Jan vs Apr -0.205 2.20 28 -4.71 4.30
#> Plow vs Broadcast 1.847 2.54 28 -3.36 7.05
#>
#> Confidence level used: 0.95
# Plot contrasts with confidence intervals
contrast_df <- confint(contrast_results) |> as.data.frame()

# Fallback: construct t-based CIs if confint() did not return CL columns
if (!all(c("lower.CL", "upper.CL") %in% names(contrast_df))) {
  contrast_df <- contrast_df |>
    mutate(
      lower.CL = estimate - qt(0.975, df) * SE,
      upper.CL = estimate + qt(0.975, df) * SE
    )
}

contrast_df |>
  ggplot(aes(x = contrast, y = estimate)) +
  geom_point(size = 4, color = "steelblue") +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
    width = 0.2, linewidth = 1.2, color = "steelblue"
  ) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", linewidth = 0.8) +
  coord_flip() +
  labs(
    title = "Planned Contrasts: NPK Application Methods",
    x = "Contrast",
    y = "Effect Size (bushels/acre)",
    caption = "Error bars show 95% confidence intervals. Red line at zero = no effect."
  ) +
  theme_minimal() +
  theme(text = element_text(size = 13))

Figure 6: Contrast estimates with 95% confidence intervals
Use post-hoc tests when: comparisons were not planned before the experiment, the overall F-test is significant, and you want to explore which treatment means differ.
Problem: Familywise Error Rate
\[ \alpha_{familywise} = 1 - (1 - \alpha)^k \]
where \(k\) = number of comparisons
Example: 5 treatments → \({5 \choose 2} = 10\) pairwise comparisons
Terminology: Familywise error rate and experimentwise error rate are the same thing—the probability of making at least one Type I error across all comparisons in the experiment.
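The inflation from the formula above is easy to compute. For 5 treatments with no adjustment:

```r
alpha <- 0.05
k <- choose(5, 2)  # 10 pairwise comparisons among 5 treatments
1 - (1 - alpha)^k  # familywise error rate ~ 0.40 with no adjustment
```

So with just 10 unadjusted tests, the chance of at least one false positive is about 40%, not 5%.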
Most common post-hoc test for all pairwise comparisons
\[ HSD = q_{\alpha, t, df_{error}} \times \sqrt{\frac{MS_{Error}}{n}} \]
Properties: controls the familywise error rate at \(\alpha\) across all pairwise comparisons; based on the studentized range distribution \(q\); exact for equal sample sizes (slightly conservative for unequal \(n\), via the Tukey-Kramer adjustment).
Pairwise comparisons with Tukey adjustment
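The adjusted comparisons below come from `emmeans`; the call would look like this (a sketch; assumes the `emm_npk` object created on the contrast slides):

```r
# All pairwise comparisons with Tukey adjustment
pairs(emm_npk, adjust = "tukey")
```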
#> contrast estimate SE df t.ratio p.value
#> Control - Jan_Plow -18.435 2.54 28 -7.256 <.0001
#> Control - Jan_Broadcast -16.589 2.54 28 -6.530 <.0001
#> Control - Apr_Broadcast -17.717 2.54 28 -6.974 <.0001
#> Jan_Plow - Jan_Broadcast 1.847 2.54 28 0.727 0.8855
#> Jan_Plow - Apr_Broadcast 0.718 2.54 28 0.283 0.9919
#> Jan_Broadcast - Apr_Broadcast -1.128 2.54 28 -0.444 0.9702
#>
#> P value adjustment: tukey method for comparing a family of 4 estimates
A more intuitive way to present multiple comparisons is the compact letter display (CLD):
| treatment | emmean | SE | df | lower.CL | upper.CL | .group |
|---|---|---|---|---|---|---|
| Control | 41.6 | 1.8 | 28 | 36.8 | 46.4 | a |
| Jan_Broadcast | 58.2 | 1.8 | 28 | 53.4 | 63.0 | b |
| Apr_Broadcast | 59.3 | 1.8 | 28 | 54.5 | 64.1 | b |
| Jan_Plow | 60.0 | 1.8 | 28 | 55.3 | 64.8 | b |
Reading CLDs: Treatments that share a letter are not significantly different. Treatments with completely different letters are significantly different.
For example: Control (group a) differs significantly from all three NPK treatments (group b), while the three NPK treatments do not differ significantly from one another.
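The `cld_tukey` object used in the plotting code below can be built with `multcomp::cld()` applied to the emmeans object (a sketch; assumes `emm_npk` and the `multcomp` package):

```r
library(multcomp)

# Compact letter display from Tukey-adjusted pairwise comparisons
cld_tukey <- cld(emm_npk, adjust = "tukey", Letters = letters, alpha = 0.05)
cld_tukey
```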
# Combine EMMs with CLD letters for plotting
cld_df <- cld_tukey |>
  as.data.frame() |>
  mutate(treatment = factor(treatment, levels = treatment))

ggplot(cld_df, aes(x = treatment, y = emmean)) +
  geom_point(size = 5, color = "darkgreen") +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
    width = 0.3, linewidth = 1.2, color = "darkgreen"
  ) +
  geom_text(aes(label = str_trim(.group)),
    vjust = -1.5, size = 5, fontface = "bold", color = "red"
  ) +
  labs(
    title = "NPK Treatment Effects with Tukey Groupings",
    x = "Treatment",
    y = "Estimated Mean Yield (bushels/acre)"
  ) +
  scale_y_continuous(limits = c(40, 72)) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    axis.text.x = element_text(angle = 30, hjust = 1)
  )

Use when: You have a control and want to compare all treatments to it
\[ d = t_{Dunnett} \times \sqrt{\frac{2 \times MS_{Error}}{n}} \]
Advantages over Tukey: only \(t - 1\) comparisons instead of \(\binom{t}{2}\), so confidence intervals are narrower and power is higher for the treatment-vs-control comparisons you actually care about.
Common in agriculture: Herbicide trials, variety comparisons to standard
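In `emmeans`, treatment-vs-control comparisons with Dunnett adjustment would look like this (a sketch; assumes `emm_npk`, with Control as the first factor level):

```r
# Compare every NPK treatment to the control (Dunnett adjustment)
dunnett_results <- contrast(emm_npk, method = "trt.vs.ctrl", ref = 1)
dunnett_results

# Dunnett-adjusted confidence intervals
confint(dunnett_results)
```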
| contrast | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|
| Jan_Plow - Control | 18.44 | 2.54 | 28 | 7.26 | 0 |
| Jan_Broadcast - Control | 16.59 | 2.54 | 28 | 6.53 | 0 |
| Apr_Broadcast - Control | 17.72 | 2.54 | 28 | 6.97 | 0 |
| contrast | estimate | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|
| Jan_Plow - Control | 18.44 | 2.54 | 28 | 12.09 | 24.78 |
| Jan_Broadcast - Control | 16.59 | 2.54 | 28 | 10.24 | 22.93 |
| Apr_Broadcast - Control | 17.72 | 2.54 | 28 | 11.37 | 24.06 |
Modern tidyverse-friendly approach using rstatix:
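The table below has the shape returned by `rstatix::emmeans_test()`; the call might look like this (a sketch — the `p.adjust.method` used for the slides' output is an assumption):

```r
library(rstatix)

# Pairwise comparisons via emmeans, returned as a tidy tibble
npk_data |>
  emmeans_test(yield ~ treatment, p.adjust.method = "bonferroni")
```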
| term | .y. | group1 | group2 | statistic | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|
| treatment | yield | Control | Jan_Plow | -7.256 | 0.000 | 0.000 | **** |
| treatment | yield | Control | Jan_Broadcast | -6.530 | 0.000 | 0.000 | **** |
| treatment | yield | Control | Apr_Broadcast | -6.974 | 0.000 | 0.000 | **** |
| treatment | yield | Jan_Plow | Jan_Broadcast | 0.727 | 0.473 | 0.979 | ns |
| treatment | yield | Jan_Plow | Apr_Broadcast | 0.283 | 0.779 | 1.000 | ns |
| treatment | yield | Jan_Broadcast | Apr_Broadcast | -0.444 | 0.660 | 0.998 | ns |
Tip
Why rstatix? Integrates seamlessly with dplyr pipes and returns tidy data frames (not emmGrid objects). Perfect for automated reporting and further data manipulation.
Even simpler for pairwise comparisons:
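This table matches the output shape of `rstatix::tukey_hsd()` (a sketch):

```r
library(rstatix)

# Tukey HSD with tidy output: estimates, CIs, adjusted p-values, stars
npk_data |>
  tukey_hsd(yield ~ treatment)
```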
| group1 | group2 | estimate | conf.low | conf.high | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|
| Control | Jan_Plow | 18.44 | 11.50 | 25.37 | 0.00 | **** |
| Control | Jan_Broadcast | 16.59 | 9.65 | 23.53 | 0.00 | **** |
| Control | Apr_Broadcast | 17.72 | 10.78 | 24.65 | 0.00 | **** |
| Jan_Plow | Jan_Broadcast | -1.85 | -8.78 | 5.09 | 0.89 | ns |
| Jan_Plow | Apr_Broadcast | -0.72 | -7.65 | 6.22 | 0.99 | ns |
| Jan_Broadcast | Apr_Broadcast | 1.13 | -5.81 | 8.06 | 0.97 | ns |
What rstatix provides: Pairwise comparisons with p-values and significance stars (*, **, ***).
Key advantages: tidy tibble output, pipe-friendly syntax, and direct compatibility with ggpubr plotting.
Important: For CLDs, use emmeans::cld() as shown in the previous slide.
Complete tidyverse + ggpubr workflow:
library(rstatix) # emmeans_test(), add_xy_position()
library(ggpubr)  # stat_pvalue_manual()

# Step 1: Run statistical test using rstatix
pwc <- npk_data |>
  emmeans_test(yield ~ treatment, p.adjust.method = "tukey")

# Step 2: Prepare positions for brackets
pwc <- pwc |> add_xy_position(x = "treatment")

# Step 3: Create plot with automated significance brackets
ggplot(npk_data, aes(x = treatment, y = yield, fill = treatment)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.15, alpha = 0.5, size = 2) +
  # Step 4: Add significance brackets automatically
  stat_pvalue_manual(pwc,
    hide.ns = TRUE, label = "p.adj.signif",
    tip.length = 0.01, step.increase = 0.05
  ) +
  labs(
    title = "NPK Treatment Effects with Automated Significance Testing",
    subtitle = "Brackets show significant pairwise differences (Tukey HSD, α = 0.05)",
    x = "Treatment",
    y = "Yield (bushels/acre)",
    fill = "Treatment"
  ) +
  scale_fill_viridis_d(option = "plasma") +
  theme_minimal() +
  theme(
    text = element_text(size = 13),
    axis.text.x = element_text(angle = 30, hjust = 1),
    legend.position = "none"
  )

Figure 8: NPK treatment comparison with ggpubr automated annotations
Note
ggpubr advantage: Automatically adds significance brackets to plots—perfect for presentations and publications. Uses rstatix output directly!
# Compare different adjustment methods
methods_comparison <- list(
  Tukey = as.data.frame(pairs(emm_npk, adjust = "tukey")),
  Bonferroni = as.data.frame(pairs(emm_npk, adjust = "bonferroni")),
  None = as.data.frame(pairs(emm_npk, adjust = "none")) # Fisher's LSD
) |>
  bind_rows(.id = "method")

# Plot p-values
methods_comparison |>
  ggplot(aes(x = contrast, y = p.value, color = method, group = method)) +
  geom_point(size = 3, position = position_dodge(width = 0.5)) +
  geom_hline(yintercept = 0.05, linetype = "dashed", color = "red") +
  coord_flip() +
  labs(
    title = "P-values Across Multiple Comparison Methods",
    subtitle = "Red line shows α = 0.05 threshold",
    x = "Pairwise Comparison",
    y = "P-value",
    color = "Adjustment"
  ) +
  scale_color_viridis_d(option = "plasma", end = 0.9) +
  theme_minimal() +
  theme(text = element_text(size = 12))

Figure 9: P-values across multiple comparison methods
Planned Contrasts: hypotheses specified before collecting data; the most powerful approach
Example: Control vs. treatments, dose-response trends
Tukey HSD: all pairwise comparisons, with familywise error control
Dunnett’s Test: all treatments vs. a single control; fewer comparisons, more power
Tip
Best practice: Plan contrasts during experimental design. Use post-hoc only for exploratory follow-up.
Modern integrated approach using the easystats ecosystem:
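The parameter table below corresponds to a call like this (a sketch; assumes the `parameters` package from easystats and the `model_npk` fit):

```r
library(parameters)

# Tidy coefficient table with 95% CIs, t-statistics, and p-values
model_parameters(model_npk)
```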
#> Parameter | Coefficient | SE | 95% CI | t(28) | p
#> --------------------------------------------------------------------------------
#> (Intercept) | 41.60 | 1.80 | [37.92, 45.28] | 23.16 | < .001
#> treatment [Jan_Plow] | 18.44 | 2.54 | [13.23, 23.64] | 7.26 | < .001
#> treatment [Jan_Broadcast] | 16.59 | 2.54 | [11.38, 21.79] | 6.53 | < .001
#> treatment [Apr_Broadcast] | 17.72 | 2.54 | [12.51, 22.92] | 6.97 | < .001

# Calculate effect sizes - proportion of variance explained by treatment
# η² (eta-squared): Proportion of total variance explained (biased, overestimates)
# ω² (omega-squared): Less biased estimate, adjusts for sample size
# ε² (epsilon-squared): Alternative unbiased estimate
# All range from 0 (no effect) to 1 (treatment explains all variance)
eta_squared(model_npk, partial = TRUE, ci = 0.95)

Effect Size Interpretation: conventional benchmarks for \(\eta^2\) are roughly 0.01 (small), 0.06 (medium), and 0.14 (large).
Recommendation: Report ω² (omega-squared) as it’s less biased than η² for making inferences about population effects.
Generate publication-ready text automatically:
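The narrative below is the kind produced by `report::report()` from the easystats ecosystem (a sketch; assumes the `model_npk` fit):

```r
library(report)

# Automatically generated, publication-style description of the model
report(model_npk)
```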
#> We fitted a linear model (estimated using OLS) to predict yield with treatment
#> (formula: yield ~ treatment). The model explains a statistically significant
#> and substantial proportion of variance (R2 = 0.72, F(3, 28) = 24.12, p < .001,
#> adj. R2 = 0.69). The model's intercept, corresponding to treatment = Control,
#> is at 41.60 (95% CI [37.92, 45.28], t(28) = 23.16, p < .001). Within this
#> model:
#>
#> - The effect of treatment [Jan_Plow] is statistically significant and positive
#> (beta = 18.44, 95% CI [13.23, 23.64], t(28) = 7.26, p < .001; Std. beta = 2.02,
#> 95% CI [1.45, 2.59])
#> - The effect of treatment [Jan_Broadcast] is statistically significant and
#> positive (beta = 16.59, 95% CI [11.38, 21.79], t(28) = 6.53, p < .001; Std.
#> beta = 1.81, 95% CI [1.25, 2.38])
#> - The effect of treatment [Apr_Broadcast] is statistically significant and
#> positive (beta = 17.72, 95% CI [12.51, 22.92], t(28) = 6.97, p < .001; Std.
#> beta = 1.94, 95% CI [1.37, 2.51])
#>
#> Standardized parameters were obtained by fitting the model on a standardized
#> version of the dataset. 95% Confidence Intervals (CIs) and p-values were
#> computed using a Wald t-distribution approximation.

easystats advantages:
Note
For your experiments: Design contrasts during the planning phase, not after seeing data!
Required Reading:
R Documentation:
- emmeans package: Lenth, R. V. (2024). emmeans: Estimated Marginal Means. https://CRAN.R-project.org/package=emmeans
- multcomp package: Hothorn et al. (2008). Simultaneous inference in general parametric models.
- rstatix package: Kassambara, A. (2023). rstatix: Pipe-Friendly Framework for Basic Statistical Tests. https://rpkgs.datanovia.com/rstatix/
- ggpubr package: Kassambara, A. (2023). ggpubr: ‘ggplot2’ Based Publication Ready Plots. https://rpkgs.datanovia.com/ggpubr/
- easystats ecosystem: Lüdecke et al. (2022). easystats: Framework for Easy Statistical Modeling, Visualization, and Reporting. https://easystats.github.io/easystats/

Additional Resources:

- agridat package vignettes with agricultural examples: https://kwstat.github.io/agridat/
- lecture05_week03_script.R
AGST 50104: Experimental Design | Spring 2026