Completely Randomized Design (CRD): ANOVA Basics

Analysis of Variance Fundamentals

Dr. Samuel B Fernandes

2026-01-01

Learning Objectives

By the end of this lecture, you should be able to:

  • Explain when and why to use a completely randomized design for experiments
  • Design a CRD
  • Describe the statistical model for CRD: \(y_{ij} = \mu + \tau_i + \varepsilon_{ij}\)
  • Construct and interpret an ANOVA table for a one-way CRD

Review: Why Randomization?

  • Monday’s lecture: Randomization guards against systematic bias and provides the basis for valid statistical inference
  • Wednesday’s lecture: True (biological) replication vs Pseudo (technical) replication (i.e., experimental vs observational unit)
  • Today’s question: Once we randomize and collect data, how do we analyze it?

Randomization is the foundation of experimental design, but ANOVA is the tool for analysis.

What is a CRD?

CRD: When and Why

Completely Randomized Design (CRD):

  • Simplest experimental design
  • Treatments assigned completely at random to all experimental units
  • Assumption: All units are homogeneous (no known gradients or blocks)
  • Best for: Controlled environments (growth chambers, labs, homogeneous fields)

When NOT to use CRD:

  • Field trials with soil gradients
  • Greenhouse trials with bench effects
  • Any situation with known nuisance variation → Use blocking instead (next week!)

Randomization snapshot

CRD Randomization

All 12 plots receive treatments purely by chance
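
The same idea can be sketched in a few lines of base R. This is only an illustration, not the code behind the snapshot above: the three treatment labels (A, B, C), the four replicates each, and the seed are assumptions chosen simply to give 12 plots.

```r
# Minimal CRD randomization sketch in base R.
# Assumption: 3 hypothetical treatments x 4 replicates = 12 plots.
set.seed(1)  # arbitrary seed so the shuffle is reproducible

treatments <- rep(c("A", "B", "C"), each = 4)

# sample() with no size argument returns a random permutation, so every
# plot gets exactly one label and each treatment keeps its 4 replicates
crd_layout <- data.frame(
    plot = 1:12,
    treatment = sample(treatments)
)

crd_layout
table(crd_layout$treatment)  # each treatment should appear 4 times
```

Because the whole vector of labels is shuffled at once, no plot is more likely than any other to receive a given treatment, which is what "completely randomized" means.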

Agricultural Example: Nitrogen on Wheat

Research Question:
Do different nitrogen fertilizer rates affect wheat yield?

Experimental Setup:

  • Treatments: 4 nitrogen rates (0, 50, 100, 150 kg N/ha)
  • Experimental Units: 24 field plots (assumed homogeneous)
  • Replicates: [____] plots per treatment
  • Response Variable: Grain yield (kg/ha)
  • Design: Completely Randomized Design (CRD)

Null Hypothesis:
\(H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0\) (No treatment effects; all treatments have the same mean yield)

Alternative Hypothesis:
\(H_a:\) At least one \(\tau_i \neq 0\) (At least one treatment differs)

CRD Layout and Randomization

Generating the Randomized Layout

library(dplyr)
library(ggplot2)
library(FielDHub)

Using FielDHub package:

# Generate CRD layout
crd_design <- CRD(
    t = 4, # Number of treatments
    reps = 6, # Replicates per treatment
    plotNumber = 1, # Starting plot number
    seed = 2026, # Reproducibility
    locationName = "Fayetteville",
    data = data.frame(
        Treatment = c("N0", "N50", "N100", "N150"),
        Reps = 6
    )
)

# Extract field book
field_book <- crd_design$fieldBook
head(field_book, 10)
#>    ID     LOCATION PLOT REP TREATMENT
#> 1   1 Fayetteville    1   1        N0
#> 2   2 Fayetteville    2   6       N50
#> 3   3 Fayetteville    3   2      N100
#> 4   4 Fayetteville    4   6        N0
#> 5   5 Fayetteville    5   2       N50
#> 6   6 Fayetteville    6   2        N0
#> 7   7 Fayetteville    7   5      N150
#> 8   8 Fayetteville    8   1      N100
#> 9   9 Fayetteville    9   6      N100
#> 10 10 Fayetteville   10   5       N50

Using the FielDHub Shiny app (no code needed)

  • Launch the app from RStudio or VS Code; it opens in a browser
  • Choose Completely Randomized Design (CRD), set Treatments = 4, Replicates = 6, Start plot = 1, and Seed = 2026.
  • Click Run randomization, then check the Field Book tab for assignments and the Layout tab for the grid view.
  • Export your layout or field book with the Download buttons to share with your team.

# Launch the interactive FielDHub Shiny app (run locally, e.g., in RStudio)
FielDHub::run_app()

# Or, equivalently, attach the package first
library(FielDHub)
run_app()

Visualizing the CRD Layout

# Add grid coordinates for visualization
field_book <- field_book |>
    mutate(
        row = rep(1:4, each = 6),
        col = rep(1:6, times = 4)
    )

ggplot(field_book, aes(x = col, y = row, fill = TREATMENT)) +
    geom_tile(color = "black", linewidth = 1) +
    geom_text(aes(label = PLOT), color = "white", size = 5, fontface = "bold") +
    scale_fill_manual(values = c(
        "N0" = "#9D2235",
        "N50" = "#FF8C42",
        "N100" = "#2E8B57",
        "N150" = "#4A90E2"
    )) +
    labs(
        title = "CRD Field Layout: Treatments Randomly Assigned to Plots",
        x = "Column",
        y = "Row",
        fill = "Treatment"
    ) +
    theme_void(base_size = 14) +
    theme(
        legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, face = "bold")
    ) +
    coord_equal()

Figure 1: Completely Randomized Design: 4 nitrogen treatments × 6 replicates

Statistical Model for CRD

The CRD Model

In plain English:

  • Observed yield = grand mean + treatment effect + random noise
  • Treatment effect \(\tau_i\) could be positive (increases yield) or negative (decreases yield)
  • Error \(\varepsilon_{ij}\) captures plot-to-plot variation (weather, soil micro-variation, measurement error)

CRD Model: \[y_{ij} = \mu + \tau_i + \varepsilon_{ij}\]

Where:

  • \(y_{ij}\) = Observed response for treatment \(i\), replicate \(j\)
  • \(\mu\) = Overall mean (grand mean)
  • \(\tau_i\) = Effect of treatment \(i\) (what we want to estimate)
  • \(\varepsilon_{ij}\) = Random error term \(\sim N(0, \sigma^2)\)
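
To make the notation concrete, here is a small base R sketch that builds yields from the three pieces of the model and then recovers the pieces from the data. The numbers are assumptions for illustration: \(\mu = 2725\) and \(\tau_i\) values that sum to zero (a common convention, assumed here) reproduce the treatment means used in the simulation later in this lecture (2400, 2650, 2900, 2950). The error SD of 150 and the seed are likewise illustrative.

```r
set.seed(42)  # arbitrary seed

mu <- 2725                                               # grand mean
tau <- c(N0 = -325, N50 = -75, N100 = 175, N150 = 225)  # effects, sum to zero
sigma <- 150                                             # error SD
r <- 6                                                   # replicates per treatment

treatment <- rep(names(tau), each = r)
epsilon <- rnorm(length(treatment), mean = 0, sd = sigma)

# y_ij = mu + tau_i + epsilon_ij
y <- mu + tau[treatment] + epsilon

# Estimates: grand mean, and treatment mean minus grand mean
mu_hat <- mean(y)
tau_hat <- tapply(y, treatment, mean) - mu_hat

data.frame(
    treatment = names(tau),
    true_tau = as.numeric(tau),
    estimated_tau = round(as.numeric(tau_hat[names(tau)]), 1)
)
```

The estimated effects will not match the true ones exactly, because each treatment mean is averaged over only six noisy plots.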

Model Assumptions

  1. Independence: Observations are independent (randomization helps justify this, but it cannot fix plots that physically influence each other)
  2. Normality: Errors \(\varepsilon_{ij}\) are normally distributed
  3. Homogeneity of Variance: Errors have constant variance \(\sigma^2\) across all treatments

Why these assumptions matter:

  • Normality: Ensures F-test is valid (though ANOVA is robust to mild violations with balanced designs)
  • Equal variance: Prevents treatment comparisons from being biased by different noise levels
  • Independence: Broken if plots influence each other (e.g., pest movement, water runoff)

Important: We’ll test these assumptions next lecture with diagnostic plots. For now, understand what ANOVA requires.

ANOVA: Analysis of Variance

The Core Idea of ANOVA

ANOVA asks: Is the variation between treatments larger than the variation within treatments?

Total Variation = Treatment Variation + Error Variation

\[SS_{Total} = SS_{Treatment} + SS_{Error}\]

If treatments differ:

  • Between-group variation is large relative to within-group variation
  • F-statistic is large (well above 1) → reject \(H_0\)

If treatments are the same:

  • Between-group variation is comparable to within-group variation
  • F-statistic is small (close to 1) → fail to reject \(H_0\)
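
A quick simulation can show the two cases side by side. This is a sketch under stated assumptions, not part of the lecture's own demo: four hypothetical treatments (T1 to T4) with 6 replicates, an error SD of 150, and 500 simulated experiments per scenario.

```r
set.seed(7)  # arbitrary seed

r <- 6
trt <- factor(rep(c("T1", "T2", "T3", "T4"), each = r))

# Simulate one experiment with the given treatment means
# and return its F-statistic
simulate_f <- function(means, sd = 150) {
    y <- rep(means, each = r) + rnorm(length(trt), mean = 0, sd = sd)
    anova(lm(y ~ trt))[["F value"]][1]
}

# Scenario 1: all treatments share the same mean (H0 true)
f_same <- replicate(500, simulate_f(c(2700, 2700, 2700, 2700)))

# Scenario 2: treatment means differ (H0 false)
f_differ <- replicate(500, simulate_f(c(2400, 2650, 2900, 2950)))

median(f_same)    # typical F when treatments are the same
median(f_differ)  # typical F when treatments differ
```

Under equal means, the F values cluster around 1; with real differences of this size, they are typically many times larger.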

ANOVA Table Structure

| Source | df | Sum of Squares | Mean Square | F-statistic | p-value |
|---|---|---|---|---|---|
| Treatment | \(t-1\) | \(SS_{Trt}\) | \(MS_{Trt} = \frac{SS_{Trt}}{t-1}\) | \(F = \frac{MS_{Trt}}{MS_{Error}}\) | P(F > observed) |
| Error | \(t(r-1)\) | \(SS_{Error}\) | \(MS_{Error} = \frac{SS_{Error}}{t(r-1)}\) | | |
| Total | \(tr-1\) | \(SS_{Total}\) | | | |

Key formulas (\(t\) = number of treatments, \(r\) = replicates per treatment):

  • \(SS_{Total} = \sum_{i,j} (y_{ij} - \bar{y}_{..})^2\) (total deviation from grand mean)
  • \(SS_{Treatment} = r \sum_i (\bar{y}_{i.} - \bar{y}_{..})^2\) (deviation of treatment means from grand mean)
  • \(SS_{Error} = SS_{Total} - SS_{Treatment}\) (leftover variation)
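
The formulas above can be checked by hand on a tiny data set. The sketch below uses made-up numbers (3 hypothetical treatments, 3 replicates each) chosen so that the arithmetic is easy to follow, then cross-checks the result against `aov()`.

```r
# Toy data (hypothetical): 3 treatments, 3 replicates each
toy <- data.frame(
    treatment = factor(rep(c("A", "B", "C"), each = 3)),
    y = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

t_n <- nlevels(toy$treatment)  # number of treatments (t)
r_n <- 3                       # replicates per treatment (r)

grand_mean <- mean(toy$y)                        # 5
trt_means <- tapply(toy$y, toy$treatment, mean)  # 2, 5, 8

# Sums of squares, following the key formulas
ss_total <- sum((toy$y - grand_mean)^2)
ss_trt <- r_n * sum((trt_means - grand_mean)^2)
ss_error <- ss_total - ss_trt

# Mean squares and F
ms_trt <- ss_trt / (t_n - 1)
ms_error <- ss_error / (t_n * (r_n - 1))
f_stat <- ms_trt / ms_error

c(SS_Total = ss_total, SS_Trt = ss_trt, SS_Error = ss_error, F = f_stat)
# SS_Total = 60, SS_Trt = 54, SS_Error = 6, F = 27

# Cross-check against R's built-in ANOVA
anova(aov(y ~ treatment, data = toy))
```

Here the degrees of freedom are \(t-1 = 2\) for Treatment and \(t(r-1) = 6\) for Error, which add up to \(tr-1 = 8\) for Total.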

Interpreting the F-test

F-statistic: \[F = \frac{MS_{Treatment}}{MS_{Error}} = \frac{\text{Variation between treatments}}{\text{Variation within treatments}}\]

  • Large F → treatments explain more variation than random error → likely significant
  • Small F → treatment variation is comparable to error → not significant
  • p-value: Probability of observing this F (or larger) if \(H_0\) is true

Decision rule:

  • If p-value < \(\alpha\) (usually 0.05): Reject \(H_0\) → At least one treatment differs
  • If p-value ≥ \(\alpha\): Fail to reject \(H_0\) → Insufficient evidence of treatment differences (this does not prove the treatments are equal)

Remember: A significant F-test tells you “some treatments differ” but NOT which ones. Post-hoc tests (next week) identify specific differences.
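
As a sketch of how the decision rule works numerically, the p-value and the critical F value can be computed directly from the F distribution with `pf()` and `qf()`. The inputs below are taken from the wheat ANOVA output shown later in this lecture (F = 25.173 on 3 and 20 degrees of freedom); \(\alpha = 0.05\) is the usual default, assumed here.

```r
f_obs <- 25.173  # observed F from the wheat ANOVA table
df_trt <- 3      # t - 1
df_error <- 20   # t(r - 1)
alpha <- 0.05

# p-value: probability of an F at least this large if H0 is true
p_value <- pf(f_obs, df_trt, df_error, lower.tail = FALSE)

# Critical value: smallest F that would lead to rejecting H0 at level alpha
f_crit <- qf(1 - alpha, df_trt, df_error)

p_value
f_crit

# The two forms of the decision rule always agree
p_value < alpha
f_obs > f_crit
```

Comparing the p-value with \(\alpha\) and comparing the observed F with the critical value are two views of the same test.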

R Code Demo

Simulating CRD Data

set.seed(2026)

# Simulate wheat yield data (kg/ha)
# True means: N0=2400, N50=2650, N100=2900, N150=2950
# Error SD = 150
crd_data <- field_book |>
    mutate(
        true_mean = case_when(
            TREATMENT == "N0" ~ 2400,
            TREATMENT == "N50" ~ 2650,
            TREATMENT == "N100" ~ 2900,
            TREATMENT == "N150" ~ 2950
        ),
        yield = true_mean + rnorm(n(), mean = 0, sd = 150)
    ) |>
    select(PLOT, TREATMENT, REP, yield)

head(crd_data, 8)
#>   PLOT TREATMENT REP    yield
#> 1    1        N0   1 2478.088
#> 2    2       N50   6 2488.046
#> 3    3      N100   2 2920.886
#> 4    4        N0   6 2387.288
#> 5    5       N50   2 2550.004
#> 6    6        N0   2 2022.587
#> 7    7      N150   5 2839.728
#> 8    8      N100   1 2746.982

Visualizing Treatment Means

# Calculate means and SEs
treatment_summary <- crd_data |>
    group_by(TREATMENT) |>
    summarize(
        mean_yield = mean(yield),
        se_yield = sd(yield) / sqrt(n()),
        .groups = "drop"
    )

ggplot(treatment_summary, aes(x = TREATMENT, y = mean_yield)) +
    geom_col(fill = "#2E8B57", alpha = 0.8, color = "black") +
    geom_errorbar(
        aes(
            ymin = mean_yield - se_yield,
            ymax = mean_yield + se_yield
        ),
        width = 0.2, linewidth = 1
    ) +
    geom_text(aes(label = round(mean_yield, 0)),
        vjust = -0.5, size = 5, fontface = "bold"
    ) +
    labs(
        title = "Wheat Yield Response to Nitrogen Fertilizer",
        x = "Nitrogen Treatment (kg N/ha)",
        y = "Mean Yield (kg/ha)"
    ) +
    theme_minimal(base_size = 16) +
    theme(
        plot.title = element_text(hjust = 0.5, face = "bold")
    ) +
    ylim(0, 3200)

Figure 2: Mean wheat yield by nitrogen treatment with standard error bars

Fitting the ANOVA Model in R

# Fit linear model
model_crd <- aov(yield ~ TREATMENT, data = crd_data)

# ANOVA table
anova(model_crd)
#> Analysis of Variance Table
#> 
#> Response: yield
#>           Df  Sum Sq Mean Sq F value    Pr(>F)    
#> TREATMENT  3 1637754  545918  25.173 5.392e-07 ***
#> Residuals 20  433733   21687                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:

  • TREATMENT F(3, 20) = 25.17, p < 0.001: Highly significant
  • At least one nitrogen treatment differs significantly from the others
  • Treatment explains substantial yield variation
  • Next step: Determine which treatments differ (post-hoc tests in Lecture 5)

ANOVA Table Breakdown

Key values:

  • R² = 0.79: Treatment explains 79% of yield variation (\(SS_{Trt}/SS_{Total} = 1637754/2071487\))
  • Residual standard error = 147.3 kg/ha (\(\sqrt{MS_{Error}} = \sqrt{21687}\)): Typical deviation of a plot from its treatment mean
  • F-statistic: Ratio of treatment mean square to error mean square
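
One way to obtain R² and the residual standard error is directly from the sums of squares printed in the ANOVA table. The sketch below simply copies those numbers from the output above, so it runs without the fitted model.

```r
# Values copied from the ANOVA table above
ss_trt <- 1637754   # TREATMENT Sum Sq
ss_error <- 433733  # Residuals Sum Sq
df_error <- 20      # Residuals Df

# R-squared: share of total variation explained by treatment
r_squared <- ss_trt / (ss_trt + ss_error)

# Residual standard error: square root of the error mean square
rse <- sqrt(ss_error / df_error)

round(r_squared, 2)
round(rse, 1)
```

With the fitted object available, `summary.lm(model_crd)` should report the same two quantities as "Multiple R-squared" and "Residual standard error".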

Your Turn Activity

Interpret ANOVA Output

Scenario:
A colleague tests 3 fungicide treatments (Control, FungA, FungB) on tomato disease severity (% leaf area infected). They run a CRD with 5 replicates per treatment (15 plants total).

ANOVA Results:

Analysis of Variance Table

Response: disease_severity
           Df  Sum Sq Mean Sq F value    Pr(>F)    
Treatment   2  248.13  124.07  15.51 0.0005 ***
Residuals  12   96.00    8.00                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Questions (3 minutes, think-pair-share):

  1. Are the fungicide treatments significantly different? How do you know?
  2. What percentage of variation is explained by treatments? (Hint: \(R^2 = \frac{SS_{Trt}}{SS_{Total}}\))
  3. What should the researcher do next?

Summary & Key Takeaways

What We Learned Today

  • CRD is the simplest experimental design: treatments assigned completely at random
  • Best for homogeneous experimental units (no gradients)
  • CRD Model: \(y_{ij} = \mu + \tau_i + \varepsilon_{ij}\)
  • ANOVA partitions variation: Total = Treatment + Error
  • F-test asks: Is treatment variation larger than error variation?
  • Significant F-test → at least one treatment differs (but doesn’t say which!)
  • R workflow: Generate design with FielDHub → Fit model with aov() → ANOVA table with anova()

You now know how to design, randomize, and analyze a completely randomized experiment!

Next Lecture Preview

Friday (Lecture 4): CRD Assumptions & Diagnostics

  • How to check ANOVA assumptions (normality, equal variance, independence)
  • Diagnostic plots: Q-Q plots, residual plots
  • What to do when assumptions are violated (transformations)
  • Bring questions about today’s material!

Tip

Homework suggestion: Re-run today’s R code with different random seeds. Note how the plot assignments and simulated yields change from run to run, while the overall conclusion usually stays the same.
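
A possible starting point for this exercise is sketched below. It is a simplified stand-in for today's demo, using base R only: it assumes the same true means (2400, 2650, 2900, 2950) and error SD (150), and it skips the FielDHub layout step because the simulated plots are homogeneous by construction. The helper name `p_for_seed` and the choice of seeds 1 to 20 are hypothetical.

```r
true_means <- c(N0 = 2400, N50 = 2650, N100 = 2900, N150 = 2950)
trt <- factor(rep(names(true_means), each = 6), levels = names(true_means))

# Simulate one experiment for a given seed and return the ANOVA p-value
p_for_seed <- function(seed) {
    set.seed(seed)
    yield <- true_means[as.character(trt)] + rnorm(length(trt), sd = 150)
    anova(lm(yield ~ trt))[["Pr(>F)"]][1]
}

p_values <- sapply(1:20, p_for_seed)

summary(p_values)      # how much the p-value moves between seeds
mean(p_values < 0.05)  # share of seeds with a significant F-test
```

The individual p-values differ from seed to seed, which is the run-to-run variation the exercise is meant to reveal.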

Resources

Textbook:

  • Oehlert (2010), A First Course in Design and Analysis of Experiments, Chapter 2 (Part 1)
    • Section 2.1: One-way classification
    • Section 2.2: CRD model and assumptions
    • Section 2.3: ANOVA F-test

R Packages:

  • FielDHub: Generate agricultural experimental designs
  • car: Type III ANOVA for unbalanced designs
  • emmeans: Estimated marginal means and post-hoc tests
  • ggplot2: Data visualization

Additional Reading:

  • Montgomery (2017), Design and Analysis of Experiments, Chapter 3
  • Doncaster & Davey (2007), Analysis of Variance and Covariance, Chapter 2