AGST 50104: Experimental Design
2026-01-01
By the end of this lecture, you should be able to:
Reproducible Research = Transparent Science
When you publish, reviewers and readers can verify your work. When you graduate, your advisor can extend your research.
Key Benefits:
For This Course:
Tip
Alternative: RStudio also works well (see supplemental guide)
Install These Three (Required):
.qmd files
How to Install: Extensions icon (Ctrl/Cmd+Shift+X) → Search → Install
Five Main Areas:
Workspaces in VS Code are like R Projects in RStudio—they keep all your files organized and settings consistent.
Create your course workspace:
doe_course
Recommended folder structure:
doe_course/
├── data/ # Your datasets
├── scripts/ # R scripts
├── analyses/ # Quarto documents
├── figures/ # Output plots
└── README.md # Project notes
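If you prefer to build this structure from the R console, here is a minimal base R sketch. The helper name `create_course_workspace()` is hypothetical (not part of any package), and the example call writes to a temporary directory so nothing in your real folders is touched; swap in your own path when you are ready.

```r
# Hypothetical helper: create the course folder skeleton under `root`.
create_course_workspace <- function(root = "doe_course") {
  subfolders <- c("data", "scripts", "analyses", "figures")
  for (folder in file.path(root, subfolders)) {
    # recursive = TRUE also creates `root` itself if it does not exist yet
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  }
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    writeLines(c("# doe_course", "", "Project notes."), readme)
  }
  invisible(root)
}

# Example: build the skeleton in a temporary directory.
workspace <- create_course_workspace(file.path(tempdir(), "doe_course"))
list.files(workspace)
```

Existing folders and an existing README.md are left alone, so the helper is safe to re-run.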
GitHub provides:
What is GitHub Copilot?
Code Generation:
Code Explanation:
Statistical Tasks:
Learning Aid:
Mode 1: Inline Suggestions (as you type)
Press Tab to accept | Esc to reject | Alt/Option+] for next suggestion
Mode 2: Copilot Chat (conversations)
When writing code:
When learning:
Good prompts = Clear context + Specific request + Expected format
Formula:
Context: What data/situation you have
Task: What you want to accomplish
Constraints: Any requirements or preferences
Format: How you want the output
Example:
Bad prompt: “plot the data”
❌ What data?
❌ What kind of plot?
❌ What variables?
Good prompt: “Create a ggplot boxplot of wheat yield (kg/ha) by nitrogen treatment. The data frame is called trial_data with columns treatment and yield. Use a minimal theme and label axes clearly.”
✅ Specific tool (ggplot)
✅ Plot type (boxplot)
✅ Variables identified
✅ Data structure given
✅ Style preferences
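For reference, a prompt like the good one above might lead to code along these lines. This is a sketch, not actual Copilot output: the `trial_data` values are simulated purely for illustration, and the treatment labels are an assumption.

```r
library(ggplot2)

# Simulated stand-in data; the yield values are invented for illustration only.
set.seed(2026)
treatment_labels <- c("Control", "Low", "Medium", "High")  # assumed labels
trial_data <- data.frame(
  treatment = factor(rep(treatment_labels, each = 6), levels = treatment_labels),
  yield = rnorm(24, mean = rep(c(3000, 3400, 3800, 4000), each = 6), sd = 250)
)

# Boxplot of yield by treatment with a minimal theme and labeled axes.
yield_plot <- ggplot(trial_data, aes(x = treatment, y = yield)) +
  geom_boxplot() +
  labs(x = "Nitrogen treatment", y = "Wheat yield (kg/ha)") +
  theme_minimal()

yield_plot
```

Each element of the prompt maps to one line of the plot code: the plot type, the two variables, the theme, and the axis labels.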
Strategy 1: Start Simple, Then Refine
Why this works: Each step builds on the last. Easier to debug.
Strategy 2: Provide Example Data Structure
Why this works: Copilot sees your actual variable names and structure.
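One way to capture your data structure for a prompt is to paste the output of `str()` or `dput(head(...))`. A small sketch, using an invented `trial_data` as a stand-in for your own data frame:

```r
# Stand-in for your own data frame (values invented for illustration).
trial_data <- data.frame(
  treatment = rep(c("Control", "Low", "Medium", "High"), times = 2),
  yield = c(3010, 3390, 3820, 4050, 2950, 3460, 3770, 3980)
)

# Compact summary of column names and types.
str(trial_data)

# Exact, re-creatable copy of the first rows, ready to paste into a prompt.
dput(head(trial_data, 3))
```

`str()` is usually enough for column names and types; `dput()` helps when the AI needs a few real rows to work with.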
❌ Bad Prompts:
✅ Good Prompts:
✅ Good Uses:
❌ Don’t Use For:
Caution
Rule: Always review & test AI code!
Tip 1: Give context with comments
Copilot uses ALL nearby code as context.
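As a sketch of Tip 1, the comments below state the context and the task before any code is written. The data values are invented, and the code under the comments is one plausible completion, not guaranteed Copilot output.

```r
# Context: field trial with 4 nitrogen treatments, 2 replicate plots each.
# Data frame `trial_data` has columns `treatment` and `yield` (kg/ha).
trial_data <- data.frame(
  treatment = rep(c("Control", "Low", "Medium", "High"), times = 2),
  yield = c(3010, 3390, 3820, 4050, 2950, 3460, 3770, 3980)  # invented values
)

# Task: compute the mean and standard deviation of yield per treatment.
treatment_means <- tapply(trial_data$yield, trial_data$treatment, mean)
treatment_sds <- tapply(trial_data$yield, trial_data$treatment, sd)

treatment_summary <- data.frame(
  treatment = names(treatment_means),
  mean_yield = as.vector(treatment_means),
  sd_yield = as.vector(treatment_sds)
)
treatment_summary
```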
Tip 2: Use Copilot Chat for explanations
When you see unfamiliar code, highlight it → Right-click → “Copilot: Explain This”
Tip 3: Ask for alternatives
“Show me 3 different ways to create a randomized block design in R”
Tip 4: Request code with comments
“Generate a CRD randomization and add comments explaining each step”
Tip 5: Verify statistical appropriateness
“Is a one-way ANOVA appropriate for this experimental design? Explain why or why not.”
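You can also check part of the answer yourself. The sketch below fits a one-way ANOVA to simulated data (values invented for illustration) and runs two common base R assumption checks. It only covers normality and equal variances; whether the design itself calls for blocking is a question the data cannot answer.

```r
# Simulated stand-in data; the yield values are invented for illustration only.
set.seed(2026)
trial_data <- data.frame(
  treatment = factor(rep(c("Control", "Low", "Medium", "High"), each = 6)),
  yield = rnorm(24, mean = rep(c(3000, 3400, 3800, 4000), each = 6), sd = 250)
)

# One-way ANOVA: yield explained by treatment only.
anova_fit <- aov(yield ~ treatment, data = trial_data)
summary(anova_fit)

# Normality of residuals (Shapiro-Wilk test).
normality_check <- shapiro.test(residuals(anova_fit))

# Homogeneity of variance across treatments (Bartlett test).
variance_check <- bartlett.test(yield ~ treatment, data = trial_data)

normality_check$p.value
variance_check$p.value
```

Small p-values in either check suggest the corresponding assumption may not hold and that a transformation or a different model is worth considering.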
| Task | Best Tool | Why | Cost |
|---|---|---|---|
| Inline code suggestions | GitHub Copilot | Native VS Code integration; learns your style | Free (students) |
| Statistical explanations | ChatGPT (GPT-4) | Best at explaining concepts in plain English | Free tier or $20/mo |
| R-specific help | Claude 3 | Strong with data science; long context window | Free tier or $20/mo |
| Complex reasoning | Claude 3 Opus | Best for experimental design decisions | $20/mo (optional) |
| Quick debugging | Copilot Chat | Instant, in-editor; sees your full code | Free (students) |
| Learning statistics | ChatGPT/Claude | Conversational; good at analogies | Free tiers available |
| Code review | Claude | Detailed feedback; catches subtle issues | Free tier available |
| Task | Best Tool | Why | Cost |
|---|---|---|---|
| Data visualization | Gamma AI | Convert raw data/tables into beautiful interactive dashboards | Free tier or $12/mo |
| Visual design | Napkin | Transform text/data into diagrams, charts, infographics | Free tier available |
| Data analysis reports | NotebookLM | AI notebook that synthesizes & organizes information | Free (Google account) |
| Statistical plots | Plotly Copilot | Smart suggestions for best chart types | Free tier or paid |
| LaTeX equations | ChatGPT | Excellent at math formatting & derivations | Free tier available |
| Presentation design | Tome | AI-powered storytelling with slides & visuals | Free tier or $10/mo |
| Data exploration | ChatGPT with Code Interpreter | Execute & visualize code output inline | $20/mo (GPT-4) |
GitHub Copilot
✅ Fast inline suggestions
✅ Integrated in VS Code
✅ Learns from your code
✅ Free for students
❌ Can’t explain concepts deeply
❌ Occasionally wrong with stats
ChatGPT (OpenAI)
✅ Best for learning concepts
✅ Great explanations
✅ Conversational
✅ Free tier (GPT-3.5)
❌ Sometimes verbose
❌ GPT-5 requires subscription
Claude (Anthropic)
✅ Strong with R/Python
✅ Very accurate with statistics
✅ Long conversations (memory)
✅ Free tier available
❌ Slower than Copilot
❌ Not integrated in editor
Gemini (Google)
✅ Fast responses
✅ Free tier available
✅ Integrated with Google products
❌ Less accurate for statistics
❌ Weaker at R coding
Task: Create a completely randomized design
What you type:
Copilot suggests:
Your job: Review, test, understand. Press Tab to accept!
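For reference, a minimal base R sketch of a CRD randomization (not actual Copilot output). It assumes 4 treatments with 6 replicates each, giving 24 plots; the seed value is arbitrary.

```r
set.seed(50104)  # arbitrary seed so the layout can be reproduced

treatments <- c("Control", "Low", "Medium", "High")
n_reps <- 6

# Repeat each treatment n_reps times, then shuffle the labels across all plots.
crd_design <- data.frame(
  plot = seq_len(length(treatments) * n_reps),
  treatment = sample(rep(treatments, times = n_reps))
)

# Review step: every treatment should appear exactly n_reps times.
table(crd_design$treatment)
```

The `table()` call is a quick way to test the result: a valid CRD here has 6 plots per treatment.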
Task: Visualize the experimental design
What you type:
Copilot + Your refinement:
library(dplyr)
library(ggplot2)

# Assumes crd_design already exists with columns plot and treatment (24 plots)
# Add coordinates for a 4-row by 6-column field grid
crd_design <- crd_design |>
  mutate(
    row = rep(1:4, each = 6),
    col = rep(1:6, times = 4)
  )

# Create visualization: one tile per plot, colored by treatment
ggplot(crd_design, aes(x = col, y = row, fill = treatment)) +
  geom_tile(color = "white", linewidth = 2) +
  geom_text(aes(label = plot), color = "white", size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Control" = "#9D2235",
                               "Low" = "#FF8C42",
                               "Medium" = "#2E8B57",
                               "High" = "#4A90E2")) +
  labs(title = "Completely Randomized Design Layout") +
  theme_void(base_size = 14) +
  coord_equal()

Figure 1: CRD field layout
Question to ChatGPT:
“I’m designing a fertilizer experiment with 4 nitrogen rates. I have 24 plots in a field that has a known fertility gradient from north to south. Should I use a completely randomized design or a randomized complete block design? Explain the trade-offs.”
ChatGPT response (summary):
“Use a Randomized Complete Block Design (RCBD) for this scenario:
- Why: The fertility gradient is a known source of variation. Blocking along the gradient (e.g., 6 blocks, each an east-west strip) removes this variation from experimental error.
- Trade-off: Slightly more complex analysis, but much better precision.
- CRD would: Ignore the gradient, inflating error and reducing power to detect treatment differences.
Recommendation: Create 6 blocks arranged from north to south, each an east-west strip containing all 4 treatments in random order.”
Your takeaway: AI helps you think through experimental design decisions!
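The recommended RCBD can be randomized in a few lines of base R. This is a minimal sketch assuming 6 blocks and the 4 treatment labels used earlier; the seed is arbitrary, and `position` is a hypothetical column marking each plot's place within its block.

```r
set.seed(50104)  # arbitrary seed so the layout can be reproduced

treatments <- c("Control", "Low", "Medium", "High")
n_blocks <- 6

# Randomize the treatment order separately within each block.
rcbd_design <- do.call(rbind, lapply(seq_len(n_blocks), function(block_id) {
  data.frame(
    block = block_id,
    position = seq_along(treatments),
    treatment = sample(treatments)
  )
}))

# Review step: each treatment should appear exactly once in every block.
table(rcbd_design$block, rcbd_design$treatment)
```

Unlike the CRD, the shuffle happens inside each block, so every block contains all 4 treatments.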
sample() and set.seed()
Software & Setup:
Learning Materials:
AI Tools:
Getting Help:
AGST 50104: Experimental Design | Spring 2026