Analysis of Variance Fundamentals
2026-01-01
By the end of this lecture, you should be able to:
Randomization is the foundation of experimental design, but ANOVA is the tool for analysis.
Completely Randomized Design (CRD):
When NOT to use CRD:
Randomization snapshot
All 12 plots receive treatments purely by chance
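A minimal base-R sketch of this kind of unrestricted randomization. It assumes, hypothetically, 4 treatments labelled A to D with 3 replicates each to fill the 12 plots; the labels, replicate count, and seed are illustrative and not taken from the lecture.

```r
# Hypothetical setup: 4 treatments x 3 replicates = 12 plots (assumed)
treatments <- c("A", "B", "C", "D")
reps <- 3

set.seed(1) # illustrative seed so the layout is reproducible

# Shuffle the full set of treatment labels across all plots at once,
# with no blocking or other restriction on where a treatment can land
layout <- data.frame(
  plot = seq_len(length(treatments) * reps),
  treatment = sample(rep(treatments, each = reps))
)

print(layout)
table(layout$treatment) # each treatment still appears exactly `reps` times
```

Changing the seed gives a different layout, but every treatment keeps the same number of plots.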
Research Question:
Do different nitrogen fertilizer rates affect wheat yield?
Experimental Setup:
Null Hypothesis:
\(H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0\) (No treatment effects; all treatments have the same mean yield)
Alternative Hypothesis:
\(H_a:\) At least one \(\tau_i \neq 0\) (At least one treatment differs)
Using FielDHub package:
library(FielDHub) # CRD() design generator
library(dplyr)    # mutate(), case_when(), summarize()
library(ggplot2)  # field map and bar chart

# Generate CRD layout
crd_design <- CRD(
  t = 4, # Number of treatments
  reps = 6, # Replicates per treatment
  plotNumber = 1, # Starting plot number
  seed = 2026, # Reproducibility
  locationName = "Fayetteville",
  data = data.frame(
    Treatment = c("N0", "N50", "N100", "N150"),
    Reps = 6
  )
)
# Extract field book
field_book <- crd_design$fieldBook
head(field_book, 10)
#>    ID     LOCATION PLOT REP TREATMENT
#> 1   1 Fayetteville    1   1        N0
#> 2   2 Fayetteville    2   6       N50
#> 3   3 Fayetteville    3   2      N100
#> 4   4 Fayetteville    4   6        N0
#> 5   5 Fayetteville    5   2       N50
#> 6   6 Fayetteville    6   2        N0
#> 7   7 Fayetteville    7   5      N150
#> 8   8 Fayetteville    8   1      N100
#> 9   9 Fayetteville    9   6      N100
#> 10 10 Fayetteville   10   5       N50
# Add grid coordinates for visualization
field_book <- field_book |>
  mutate(
    row = rep(1:4, each = 6),
    col = rep(1:6, times = 4)
  )
ggplot(field_book, aes(x = col, y = row, fill = TREATMENT)) +
  geom_tile(color = "black", linewidth = 1) +
  geom_text(aes(label = PLOT), color = "white", size = 5, fontface = "bold") +
  scale_fill_manual(values = c(
    "N0" = "#9D2235",
    "N50" = "#FF8C42",
    "N100" = "#2E8B57",
    "N150" = "#4A90E2"
  )) +
  labs(
    title = "CRD Field Layout: Treatments Randomly Assigned to Plots",
    x = "Column",
    y = "Row",
    fill = "Treatment"
  ) +
  theme_void(base_size = 14) +
  theme(
    legend.position = "bottom",
    plot.title = element_text(hjust = 0.5, face = "bold")
  ) +
  coord_equal()
Figure 1: Completely Randomized Design: 4 nitrogen treatments × 6 replicates
In plain English:
CRD Model: \[y_{ij} = \mu + \tau_i + \varepsilon_{ij}\]
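Where: \(y_{ij}\) is the observed response for replicate \(j\) of treatment \(i\) (\(i = 1, \dots, t\); \(j = 1, \dots, r\)), \(\mu\) is the overall mean, \(\tau_i\) is the effect of treatment \(i\), and \(\varepsilon_{ij}\) is the random error, assumed independent and normally distributed with mean 0 and constant variance \(\sigma^2\).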
Why these assumptions matter:
Important: We’ll test these assumptions next lecture with diagnostic plots. For now, understand what ANOVA requires.
ANOVA asks: Is the variation between treatments larger than the variation within treatments?
Total Variation = Treatment Variation + Error Variation
\[SS_{Total} = SS_{Treatment} + SS_{Error}\]
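The partition can be checked numerically. The sketch below uses a small made-up data set (3 treatments, 4 replicates; the values are illustrative only, not from the wheat example) and computes each sum of squares by hand before comparing with `aov()`.

```r
# Made-up balanced data: 3 treatments x 4 replicates (illustrative values)
dat <- data.frame(
  trt = factor(rep(c("T1", "T2", "T3"), each = 4)),
  y   = c(20, 22, 19, 23,  25, 27, 26, 24,  30, 29, 31, 32)
)

grand_mean <- mean(dat$y)
trt_means  <- ave(dat$y, dat$trt) # treatment mean repeated for each observation

ss_total <- sum((dat$y - grand_mean)^2)     # observations vs grand mean
ss_trt   <- sum((trt_means - grand_mean)^2) # treatment means vs grand mean
ss_error <- sum((dat$y - trt_means)^2)      # observations vs own treatment mean

c(total = ss_total, treatment = ss_trt, error = ss_error)
all.equal(ss_total, ss_trt + ss_error) # the partition holds

# The built-in fit reports the same treatment and error sums of squares
fit <- aov(y ~ trt, data = dat)
anova(fit)
```

The hand-computed treatment and error sums of squares should match the `Sum Sq` column of the `anova()` output.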
If treatments differ:
If treatments are the same:
| Source | df | Sum of Squares | Mean Square | F-statistic | p-value |
|---|---|---|---|---|---|
| Treatment | \(t-1\) | \(SS_{Trt}\) | \(MS_{Trt} = \frac{SS_{Trt}}{t-1}\) | \(F = \frac{MS_{Trt}}{MS_{Error}}\) | P(F > observed) |
| Error | \(t(r-1)\) | \(SS_{Error}\) | \(MS_{Error} = \frac{SS_{Error}}{t(r-1)}\) | — | — |
| Total | \(tr-1\) | \(SS_{Total}\) | — | — | — |
Key formulas:
F-statistic: \[F = \frac{MS_{Treatment}}{MS_{Error}} = \frac{\text{Variation between treatments}}{\text{Variation within treatments}}\]
Decision rule:
Remember: A significant F-test tells you “some treatments differ” but NOT which ones. Post-hoc tests (next week) identify specific differences.
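The decision can be framed in two equivalent ways: compare the observed F to a critical value, or compare the p-value to the significance level. The sketch below assumes \(\alpha = 0.05\) (a conventional choice, not stated in this lecture), uses the wheat design's 4 treatments and 6 replicates for the degrees of freedom, and plugs in a placeholder observed F.

```r
alpha <- 0.05 # assumed significance level
n_trt <- 4    # t: number of treatments
n_rep <- 6    # r: replicates per treatment

df_trt <- n_trt - 1
df_err <- n_trt * (n_rep - 1)

f_obs <- 4.2 # placeholder value for illustration only

# Critical value: the F that cuts off the upper alpha tail under H0
f_crit <- qf(1 - alpha, df_trt, df_err)

# p-value: probability of an F at least this large under H0
p_value <- pf(f_obs, df_trt, df_err, lower.tail = FALSE)

reject_by_f <- f_obs > f_crit
reject_by_p <- p_value < alpha
c(f_crit = f_crit, p_value = p_value)
c(reject_by_f = reject_by_f, reject_by_p = reject_by_p)
```

Both comparisons always lead to the same decision, because the p-value falls below \(\alpha\) exactly when the observed F exceeds the critical value.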
set.seed(2026)
# Simulate wheat yield data (kg/ha)
# True means: N0=2400, N50=2650, N100=2900, N150=2950
# Error SD = 150
crd_data <- field_book |>
  mutate(
    true_mean = case_when(
      TREATMENT == "N0" ~ 2400,
      TREATMENT == "N50" ~ 2650,
      TREATMENT == "N100" ~ 2900,
      TREATMENT == "N150" ~ 2950
    ),
    yield = true_mean + rnorm(n(), mean = 0, sd = 150)
  ) |>
  select(PLOT, TREATMENT, REP, yield)
head(crd_data, 8)
#>   PLOT TREATMENT REP    yield
#> 1    1        N0   1 2478.088
#> 2    2       N50   6 2488.046
#> 3    3      N100   2 2920.886
#> 4    4        N0   6 2387.288
#> 5    5       N50   2 2550.004
#> 6    6        N0   2 2022.587
#> 7    7      N150   5 2839.728
#> 8    8      N100   1 2746.982
# Calculate means and SEs
treatment_summary <- crd_data |>
  # Order treatments by nitrogen rate; the default ordering would be
  # alphabetical (N0, N100, N150, N50) on the x-axis
  mutate(TREATMENT = factor(TREATMENT, levels = c("N0", "N50", "N100", "N150"))) |>
  group_by(TREATMENT) |>
  summarize(
    mean_yield = mean(yield),
    se_yield = sd(yield) / sqrt(n()),
    .groups = "drop"
  )
ggplot(treatment_summary, aes(x = TREATMENT, y = mean_yield)) +
  geom_col(fill = "#2E8B57", alpha = 0.8, color = "black") +
  geom_errorbar(
    aes(
      ymin = mean_yield - se_yield,
      ymax = mean_yield + se_yield
    ),
    width = 0.2, linewidth = 1
  ) +
  geom_text(aes(label = round(mean_yield, 0)),
    vjust = -0.5, size = 5, fontface = "bold"
  ) +
  labs(
    title = "Wheat Yield Response to Nitrogen Fertilizer",
    x = "Nitrogen Treatment (kg N/ha)",
    y = "Mean Yield (kg/ha)"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold")
  ) +
  ylim(0, 3200)
Figure 2: Mean wheat yield by nitrogen treatment with standard error bars
#> Analysis of Variance Table
#>
#> Response: yield
#>           Df  Sum Sq Mean Sq F value    Pr(>F)
#> TREATMENT  3 1637754  545918  25.173 5.392e-07 ***
#> Residuals 20  433733   21687
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
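Each entry in this table follows from the sums of squares and degrees of freedom. As a check, the sketch below recomputes the mean squares, F, and p-value from the printed `Sum Sq` and `Df` columns; because those inputs are rounded, the results agree with the table only to the printed precision.

```r
# Values copied from the ANOVA table above
ss_trt <- 1637754; df_trt <- 3
ss_err <- 433733;  df_err <- 20

# Mean square = sum of squares / degrees of freedom
ms_trt <- ss_trt / df_trt
ms_err <- ss_err / df_err

# F = treatment mean square / error mean square
f_value <- ms_trt / ms_err

# Upper-tail probability of the F distribution with (3, 20) df
p_value <- pf(f_value, df_trt, df_err, lower.tail = FALSE)

round(c(ms_trt = ms_trt, ms_err = ms_err, F = f_value), 3)
p_value
```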
Interpretation:
Key values:
Scenario:
A colleague tests 3 fungicide treatments (Control, FungA, FungB) on tomato disease severity (% leaf area infected). They run a CRD with 5 replicates per treatment (15 plants total).
ANOVA Results:
Analysis of Variance Table

Response: disease_severity
          Df Sum Sq Mean Sq F value Pr(>F)
Treatment  2 248.13  124.07   15.51 0.0005 ***
Residuals 12  96.00    8.00
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Questions (3 minutes, think-pair-share):
aov() → ANOVA table with anova()
You now know how to design, randomize, and analyze a completely randomized experiment!
Friday (Lecture 4): CRD Assumptions & Diagnostics
Tip
Homework suggestion: Re-run today’s R code with different random seeds and compare the results. The field layout and the simulated yields will change from seed to seed, but the overall conclusion should usually stay the same.
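One way to approach this, sketched in base R rather than with the FielDHub workflow used in the lecture: simulate the same four treatment means and error SD under several seeds and collect the F-test from each run. The seed values and the helper name `simulate_crd` are arbitrary choices for illustration.

```r
# True means and error SD taken from today's simulation
true_means <- c(N0 = 2400, N50 = 2650, N100 = 2900, N150 = 2950)
n_rep <- 6
error_sd <- 150

# Hypothetical helper: simulate one CRD data set and return its F-test
simulate_crd <- function(seed) {
  set.seed(seed)
  trt <- factor(rep(names(true_means), each = n_rep), levels = names(true_means))
  yield <- true_means[as.character(trt)] + rnorm(length(trt), sd = error_sd)
  tab <- anova(aov(yield ~ trt))
  data.frame(seed = seed, F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1])
}

# Arbitrary seeds; one row of results per seed
results <- do.call(rbind, lapply(c(1, 42, 2026, 9999, 12345), simulate_crd))
results
```

Comparing the `F` and `p` columns across rows shows how much the test statistic moves between runs with the same true treatment means.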
Textbook:
R Packages:
FielDHub: Generate agricultural experimental designs
car: Type III ANOVA for unbalanced designs
emmeans: Estimated marginal means and post-hoc tests
ggplot2: Data visualization
Additional Reading:
AGST 50104: Experimental Design | Spring 2026