Completely Randomized Design (CRD): ANOVA Basics

Analysis of Variance Fundamentals

Dr. Samuel B Fernandes

2026-01-01

Learning Objectives

By the end of this lecture, you should be able to:

Explain when and why to use a completely randomized design for experiments
Design a CRD
Describe the statistical model for CRD: \(y_{ij} = \mu + \tau_i + \varepsilon_{ij}\)
Construct and interpret an ANOVA table for a one-way CRD

Review: Why Randomization?

Monday’s lecture: Randomization eliminates bias and ensures valid statistical inference
Wednesday’s lecture: True (biological) replication vs. pseudo (technical) replication, i.e., experimental units vs. observational units
Today’s question: Once we randomize and collect data, how do we analyze it?

Randomization is the foundation of experimental design, but ANOVA is the tool for analysis.

What is a CRD?

CRD: When and Why

Completely Randomized Design (CRD):

Simplest experimental design
Treatments assigned completely at random to all experimental units
Assumption: All units are homogeneous (no known gradients or blocks)
Best for: Controlled environments (growth chambers, labs, homogeneous fields)
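
To make "assigned completely at random" concrete, here is a minimal base-R sketch. It assumes 4 treatments with 3 replicates each on 12 plots; the treatment labels, object names, and seed are illustrative choices, not part of any particular trial.

```r
# Illustrative sketch: complete randomization using base R only
set.seed(42)  # arbitrary seed, used only so the layout is reproducible

treatments <- c("A", "B", "C", "D")  # hypothetical treatment labels
reps <- 3                            # assumed replicates per treatment

# Build the full list of treatment labels, then shuffle it across all plots
assignment <- sample(rep(treatments, each = reps))

layout_crd <- data.frame(
    plot = seq_along(assignment),
    treatment = assignment
)

# Every treatment still appears exactly `reps` times; only the order is random
table(layout_crd$treatment)
```

Because the shuffle ignores plot position entirely, any plot is equally likely to receive any treatment, which is why the design only makes sense when the plots are homogeneous.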

When NOT to use CRD:

Field trials with soil gradients
Greenhouse trials with bench effects
Any situation with known nuisance variation → Use blocking instead (next week!)

Randomization snapshot

All 12 plots receive treatments purely by chance

Agricultural Example: Nitrogen on Wheat

Research Question:
Do different nitrogen fertilizer rates affect wheat yield?

Experimental Setup:

Treatments: 4 nitrogen rates (0, 50, 100, 150 kg N/ha)
Experimental Units: 24 field plots (assumed homogeneous)
Replicates: [____] plots per treatment
Response Variable: Grain yield (kg/ha)
Design: Completely Randomized Design (CRD)

Null Hypothesis:
\(H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0\) (No treatment effects; all treatments have the same mean yield)

Alternative Hypothesis:
\(H_a:\) At least one \(\tau_i \neq 0\) (At least one treatment differs)

CRD Layout and Randomization

Generating the Randomized Layout

library(dplyr)
library(ggplot2)
library(FielDHub)

Using FielDHub package:

# Generate CRD layout
crd_design <- CRD(
    t = 4, # Number of treatments
    reps = 6, # Replicates per treatment
    plotNumber = 1, # Starting plot number
    seed = 2026, # Reproducibility
    locationName = "Fayetteville",
    data = data.frame(
        Treatment = c("N0", "N50", "N100", "N150"),
        Reps = 6
    )
)

# Extract field book
field_book <- crd_design$fieldBook
head(field_book, 10)

#>    ID     LOCATION PLOT REP TREATMENT
#> 1   1 Fayetteville    1   1        N0
#> 2   2 Fayetteville    2   6       N50
#> 3   3 Fayetteville    3   2      N100
#> 4   4 Fayetteville    4   6        N0
#> 5   5 Fayetteville    5   2       N50
#> 6   6 Fayetteville    6   2        N0
#> 7   7 Fayetteville    7   5      N150
#> 8   8 Fayetteville    8   1      N100
#> 9   9 Fayetteville    9   6      N100
#> 10 10 Fayetteville   10   5       N50

Using the FielDHub Shiny app (no code needed)

Launch the app from RStudio or VS Code; it opens in your web browser.
Choose Completely Randomized Design (CRD), set Treatments = 4, Replicates = 6, Start plot = 1, and Seed = 2026.
Click Run randomization, then check the Field Book tab for assignments and the Layout tab for the grid view.
Export your layout or field book with the Download buttons to share with your team.

# Launch the interactive FielDHub Shiny app (run locally in RStudio)
FielDHub::run_app()

# Or, equivalently, attach the package first and then launch
library(FielDHub)
run_app()

Visualizing the CRD Layout

# Add grid coordinates for visualization
field_book <- field_book |>
    mutate(
        row = rep(1:4, each = 6),
        col = rep(1:6, times = 4)
    )

ggplot(field_book, aes(x = col, y = row, fill = TREATMENT)) +
    geom_tile(color = "black", linewidth = 1) +
    geom_text(aes(label = PLOT), color = "white", size = 5, fontface = "bold") +
    scale_fill_manual(values = c(
        "N0" = "#9D2235",
        "N50" = "#FF8C42",
        "N100" = "#2E8B57",
        "N150" = "#4A90E2"
    )) +
    labs(
        title = "CRD Field Layout: Treatments Randomly Assigned to Plots",
        x = "Column",
        y = "Row",
        fill = "Treatment"
    ) +
    theme_void(base_size = 14) +
    theme(
        legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, face = "bold")
    ) +
    coord_equal()

Figure 1: Completely Randomized Design: 4 nitrogen treatments × 6 replicates

Statistical Model for CRD

The CRD Model

In plain English:

Observed yield = grand mean + treatment effect + random noise
Treatment effect \(\tau_i\) could be positive (increases yield) or negative (decreases yield)
Error \(\varepsilon_{ij}\) captures plot-to-plot variation (weather, soil micro-variation, measurement error)

CRD Model: \[y_{ij} = \mu + \tau_i + \varepsilon_{ij}\]

Where:

\(y_{ij}\) = Observed response for treatment \(i\), replicate \(j\)
\(\mu\) = Overall mean (grand mean)
\(\tau_i\) = Effect of treatment \(i\) (what we want to estimate)
\(\varepsilon_{ij}\) = Random error term \(\sim N(0, \sigma^2)\)
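
One way to read the model is as a recipe for generating data. The sketch below builds observations term by term and then recovers the effects from the simulated values. All numeric values (grand mean, effects, error SD) and object names are assumptions chosen for illustration.

```r
# Illustrative sketch: generate data from y_ij = mu + tau_i + eps_ij
set.seed(1)  # arbitrary seed

t_levels <- 4                  # number of treatments (assumed)
r <- 6                         # replicates per treatment (assumed)
mu <- 2725                     # grand mean in kg/ha (assumed)
tau <- c(-325, -75, 175, 225)  # treatment effects (assumed); they sum to zero
sigma <- 150                   # error standard deviation (assumed)

sim <- data.frame(
    treatment = factor(rep(seq_len(t_levels), each = r)),
    replicate = rep(seq_len(r), times = t_levels)
)
sim$eps <- rnorm(nrow(sim), mean = 0, sd = sigma)       # eps_ij ~ N(0, sigma^2)
sim$y <- mu + tau[as.integer(sim$treatment)] + sim$eps  # y_ij

# Estimates: grand mean, and each treatment mean minus the grand mean
mu_hat <- mean(sim$y)
tau_hat <- tapply(sim$y, sim$treatment, mean) - mu_hat
round(tau_hat, 1)
```

The estimated effects will not match `tau` exactly because of the random error, but in a balanced design they always sum to zero, just like the true effects in this parameterization.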

Model Assumptions

Independence: Errors are independent across experimental units (random assignment of treatments supports this, but does not guarantee it)
Normality: Errors \(\varepsilon_{ij}\) are normally distributed
Homogeneity of Variance: Errors have constant variance \(\sigma^2\) across all treatments

Why these assumptions matter:

Normality: Ensures F-test is valid (though ANOVA is robust to mild violations with balanced designs)
Equal variance: Prevents treatment comparisons from being biased by different noise levels
Independence: Broken if plots influence each other (e.g., pest movement, water runoff)

Important: We’ll test these assumptions next lecture with diagnostic plots. For now, understand what ANOVA requires.

ANOVA: Analysis of Variance

The Core Idea of ANOVA

ANOVA asks: Is the variation between treatments larger than the variation within treatments?

Total Variation = Treatment Variation + Error Variation

\[SS_{Total} = SS_{Treatment} + SS_{Error}\]

If treatments differ:

Between-group variation is large
Within-group variation is small by comparison
F-statistic is large → reject \(H_0\)

If treatments are the same:

Between-group variation is small
Comparable to within-group variation
F-statistic is small → fail to reject \(H_0\)
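
A small simulation can illustrate the two scenarios above. This is a sketch under assumed settings (4 treatments, 6 replicates, error SD of 150, and made-up treatment means); the helper `sim_f()` is hypothetical and written only for this demonstration.

```r
# Illustrative sketch: how F behaves with and without treatment differences
set.seed(1)  # arbitrary seed

trt <- factor(rep(c("A", "B", "C", "D"), each = 6))  # 4 treatments x 6 reps (assumed)

# Simulate one experiment with the given treatment means; return its F-statistic
sim_f <- function(means, sd = 150) {
    y <- rnorm(length(trt), mean = means[as.integer(trt)], sd = sd)
    anova(lm(y ~ trt))[["F value"]][1]
}

# 200 experiments where all treatment means are equal (H0 true)
f_same <- replicate(200, sim_f(c(2700, 2700, 2700, 2700)))
# 200 experiments where the treatment means differ (H0 false)
f_diff <- replicate(200, sim_f(c(2400, 2650, 2900, 2950)))

# F stays near 1 when treatments are the same and grows when they differ
c(mean_F_same = mean(f_same), mean_F_differ = mean(f_diff))
```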

ANOVA Table Structure

Source	df	Sum of Squares	Mean Square	F-statistic	p-value
Treatment	\(t-1\)	\(SS_{Trt}\)	\(MS_{Trt} = \frac{SS_{Trt}}{t-1}\)	\(F = \frac{MS_{Trt}}{MS_{Error}}\)	P(F > observed)
Error	\(t(r-1)\)	\(SS_{Error}\)	\(MS_{Error} = \frac{SS_{Error}}{t(r-1)}\)	—	—
Total	\(tr-1\)	\(SS_{Total}\)	—	—	—
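
As a quick check on the df column, here is the arithmetic for a design with \(t = 4\) treatments and \(r = 6\) replicates (the wheat example's dimensions, used here as an illustration):

\[
\begin{aligned}
df_{Trt} &= t - 1 = 4 - 1 = 3 \\
df_{Error} &= t(r - 1) = 4 \times 5 = 20 \\
df_{Total} &= tr - 1 = 24 - 1 = 23 = df_{Trt} + df_{Error}
\end{aligned}
\]

The degrees of freedom partition in the same way as the sums of squares: treatment and error df always add up to the total df.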

Key formulas:

\(SS_{Total} = \sum_{i,j} (y_{ij} - \bar{y}_{..})^2\) (total deviation from grand mean)
\(SS_{Treatment} = r \sum_i (\bar{y}_{i.} - \bar{y}_{..})^2\) (deviation of treatment means from grand mean)
\(SS_{Error} = SS_{Total} - SS_{Treatment}\) (leftover variation)
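
The formulas above can be checked by hand on a tiny data set. The values below are made up purely for illustration (3 treatments, 4 replicates each); the last line asks R for its own ANOVA table so the two sets of sums of squares can be compared.

```r
# Illustrative sketch: sums of squares by hand on a small made-up data set
toy <- data.frame(
    treatment = factor(rep(c("A", "B", "C"), each = 4)),
    y = c(10, 12, 11, 13,   # treatment A
          14, 15, 13, 16,   # treatment B
          18, 17, 19, 20)   # treatment C
)
r <- 4  # replicates per treatment (balanced design)

grand_mean <- mean(toy$y)                        # y-bar..
trt_means <- tapply(toy$y, toy$treatment, mean)  # y-bar i.

ss_total <- sum((toy$y - grand_mean)^2)          # total deviation from grand mean
ss_trt <- r * sum((trt_means - grand_mean)^2)    # treatment means vs grand mean
ss_error <- ss_total - ss_trt                    # leftover variation

c(SS_Treatment = ss_trt, SS_Error = ss_error, SS_Total = ss_total)

# R's ANOVA table should report the same treatment and residual sums of squares
anova(lm(y ~ treatment, data = toy))
```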

Interpreting the F-test

F-statistic: \[F = \frac{MS_{Treatment}}{MS_{Error}} = \frac{\text{Variation between treatments}}{\text{Variation within treatments}}\]

Large F → treatments explain more variation than random error → likely significant
Small F → treatment variation is comparable to error → not significant
p-value: Probability of observing an F-statistic at least as large as the one we got, if \(H_0\) is true

Decision rule:

If p-value < \(\alpha\) (usually 0.05): Reject \(H_0\) → At least one treatment differs
If p-value ≥ \(\alpha\): Fail to reject \(H_0\) → Insufficient evidence of treatment differences (this does not prove the treatments are equal)
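
The decision rule can be sketched with base R's F-distribution functions `pf()` and `qf()`. The observed F-statistic below is a hypothetical value; the degrees of freedom assume 4 treatments and 6 replicates.

```r
# Illustrative sketch: p-value and critical value for an F-test
alpha <- 0.05
f_obs <- 5.2      # hypothetical observed F-statistic
df_trt <- 3       # t - 1, assuming t = 4 treatments
df_error <- 20    # t(r - 1), assuming r = 6 replicates

# P(F > observed) under H0: upper tail of the F distribution
p_value <- pf(f_obs, df_trt, df_error, lower.tail = FALSE)

# Critical value: the F that cuts off the upper alpha of the distribution
f_crit <- qf(1 - alpha, df_trt, df_error)

# The two forms of the rule always agree: p < alpha exactly when F > F_crit
reject_h0 <- p_value < alpha
c(p_value = p_value, f_crit = f_crit, reject_h0 = reject_h0)
```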

Remember: A significant F-test tells you “some treatments differ” but NOT which ones. Post-hoc tests (next week) identify specific differences.

R Code Demo

Simulating CRD Data

set.seed(2026)

# Simulate wheat yield data (kg/ha)
# True means: N0=2400, N50=2650, N100=2900, N150=2950
# Error SD = 150
crd_data <- field_book |>
    mutate(
        true_mean = case_when(
            TREATMENT == "N0" ~ 2400,
            TREATMENT == "N50" ~ 2650,
            TREATMENT == "N100" ~ 2900,
            TREATMENT == "N150" ~ 2950
        ),
        yield = true_mean + rnorm(n(), mean = 0, sd = 150)
    ) |>
    select(PLOT, TREATMENT, REP, yield)

head(crd_data, 8)

#>   PLOT TREATMENT REP    yield
#> 1    1        N0   1 2478.088
#> 2    2       N50   6 2488.046
#> 3    3      N100   2 2920.886
#> 4    4        N0   6 2387.288
#> 5    5       N50   2 2550.004
#> 6    6        N0   2 2022.587
#> 7    7      N150   5 2839.728
#> 8    8      N100   1 2746.982

Visualizing Treatment Means

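# Calculate means and SEs
# (set the factor order explicitly so rates plot from low to high, not alphabetically)
treatment_summary <- crd_data |>
    mutate(TREATMENT = factor(TREATMENT, levels = c("N0", "N50", "N100", "N150"))) |>
    group_by(TREATMENT) |>
    summarize(
        mean_yield = mean(yield),
        se_yield = sd(yield) / sqrt(n()),
        .groups = "drop"
    )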

ggplot(treatment_summary, aes(x = TREATMENT, y = mean_yield)) +
    geom_col(fill = "#2E8B57", alpha = 0.8, color = "black") +
    geom_errorbar(
        aes(
            ymin = mean_yield - se_yield,
            ymax = mean_yield + se_yield
        ),
        width = 0.2, linewidth = 1
    ) +
    geom_text(aes(label = round(mean_yield, 0)),
        vjust = -0.5, size = 5, fontface = "bold"
    ) +
    labs(
        title = "Wheat Yield Response to Nitrogen Fertilizer",
        x = "Nitrogen Treatment (kg N/ha)",
        y = "Mean Yield (kg/ha)"
    ) +
    theme_minimal(base_size = 16) +
    theme(
        plot.title = element_text(hjust = 0.5, face = "bold")
    ) +
    ylim(0, 3200)

Figure 2: Mean wheat yield by nitrogen treatment with standard error bars

Fitting the ANOVA Model in R

# Fit linear model
model_crd <- aov(yield ~ TREATMENT, data = crd_data)

# ANOVA table
anova(model_crd)

#> Analysis of Variance Table
#> 
#> Response: yield
#>           Df  Sum Sq Mean Sq F value    Pr(>F)    
#> TREATMENT  3 1637754  545918  25.173 5.392e-07 ***
#> Residuals 20  433733   21687                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:

TREATMENT: F(3, 20) = 25.17, p < 0.001 (highly significant)
At least one nitrogen treatment differs significantly from others
Treatment explains substantial yield variation

ANOVA Table Breakdown

Next step: Determine which treatments differ (post-hoc tests in Lecture 5)

Key values:

R² = 0.79: Treatment explains 79% of yield variation (\(SS_{Trt}/SS_{Total} = 1637754/2071487\); I’ll show how to get this)
Residual standard error = 147.3 kg/ha: Typical deviation of a plot’s yield from its treatment mean (\(\sqrt{MS_{Error}}\))
F-statistic: Ratio of treatment variance to error variance
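
These key values can be recomputed directly from the sums of squares and degrees of freedom printed in the ANOVA table for the wheat example. The sketch below only does that arithmetic; the object names are illustrative.

```r
# Illustrative sketch: key values recomputed from the printed ANOVA table
ss_trt <- 1637754    # Sum Sq for TREATMENT (from the table)
ss_error <- 433733   # Sum Sq for Residuals (from the table)
df_trt <- 3          # Df for TREATMENT
df_error <- 20       # Df for Residuals

ms_trt <- ss_trt / df_trt        # mean square for treatment
ms_error <- ss_error / df_error  # mean square for error

r_squared <- ss_trt / (ss_trt + ss_error)  # share of total variation due to treatment
rse <- sqrt(ms_error)                      # residual standard error (kg/ha)
f_stat <- ms_trt / ms_error                # F-statistic

round(c(R2 = r_squared, RSE = rse, F = f_stat), 3)
```

With the fitted object available, `summary.lm(model_crd)` should report the same R² and residual standard error without any hand calculation.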

Your Turn Activity

Interpret ANOVA Output

Scenario:
A colleague tests 3 fungicide treatments (Control, FungA, FungB) on tomato disease severity (% leaf area infected). They run a CRD with 5 replicates per treatment (15 plants total).

ANOVA Results:

Analysis of Variance Table

Response: disease_severity
           Df  Sum Sq Mean Sq F value    Pr(>F)    
Treatment   2  248.13  124.07  15.51 0.0005 ***
Residuals  12   96.00    8.00                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Questions (3 minutes, think-pair-share):

Are the fungicide treatments significantly different? How do you know?
What percentage of variation is explained by treatments? (Hint: \(R^2 = \frac{SS_{Trt}}{SS_{Total}}\))
What should the researcher do next?

Summary & Key Takeaways

What We Learned Today

CRD is the simplest experimental design: treatments assigned completely at random
Best for homogeneous experimental units (no gradients)
CRD Model: \(y_{ij} = \mu + \tau_i + \varepsilon_{ij}\)
ANOVA partitions variation: Total = Treatment + Error
F-test asks: Is treatment variation larger than error variation?
Significant F-test → at least one treatment differs (but doesn’t say which!)
R workflow: Generate design with FielDHub → Fit model with aov() → ANOVA table with anova()

You now know how to design, randomize, and analyze a completely randomized experiment!

Next Lecture Preview

Friday (Lecture 4): CRD Assumptions & Diagnostics

How to check ANOVA assumptions (normality, equal variance, independence)
Diagnostic plots: Q-Q plots, residual plots
What to do when assumptions are violated (transformations)
Bring questions about today’s material!

Tip

Homework suggestion: Re-run today’s R code with different random seeds. See how randomization affects results (but conclusions remain similar).
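
One possible way to set up this exercise is sketched below. To stay self-contained it shuffles the treatment labels with base R's `sample()` rather than FielDHub, and it reuses the true means and error SD from the simulation demo; the helper `run_once()` and the seed values are hypothetical.

```r
# Illustrative sketch: repeat the simulated experiment under several seeds
trt_labels <- rep(c("N0", "N50", "N100", "N150"), each = 6)
true_means <- c(N0 = 2400, N50 = 2650, N100 = 2900, N150 = 2950)

# Randomize, simulate yields, fit the one-way ANOVA, return F and p
run_once <- function(seed) {
    set.seed(seed)
    trt <- sample(trt_labels)  # new random layout for this seed
    yield <- unname(true_means[trt]) + rnorm(length(trt), mean = 0, sd = 150)
    fit <- anova(lm(yield ~ factor(trt)))
    c(seed = seed, F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

seeds <- c(1, 2, 3, 4, 5)  # arbitrary seeds
results <- as.data.frame(t(sapply(seeds, run_once)))
results
```

Comparing the rows shows how much F and the p-value move from one randomization to the next even though the underlying treatment means never change.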

Resources

Textbook:

Oehlert (2010), A First Course in Design and Analysis of Experiments, Chapter 2 (Part 1)
- Section 2.1: One-way classification
- Section 2.2: CRD model and assumptions
- Section 2.3: ANOVA F-test

R Packages:

FielDHub: Generate agricultural experimental designs
car: Type III ANOVA for unbalanced designs
emmeans: Estimated marginal means and post-hoc tests
ggplot2: Data visualization

Additional Reading:

Montgomery (2017), Design and Analysis of Experiments, Chapter 3
Doncaster & Davey (2007), Analysis of Variance and Covariance, Chapter 2