Lesson 9
R-squared
Big question
How much of the outcome's variation does the regression explain?
Learning objectives
- Explain R-squared in plain language.
- Use R-squared correctly in an interpretation.
- Connect the lesson idea to a formula, graph, Python result, or real example.
Simple explanation
R-squared measures the share of total variation in y explained by the fitted regression line. In an OLS model with an intercept, it ranges from 0 to 1. Higher values mean the fitted values track the sample outcomes more closely.
Key terms
- R-squared: The fraction of total variation in y explained by the regression model.
- SST: Total sum of squares; total variation in y around its sample mean.
- SSR: Residual sum of squares; variation left unexplained by the fitted line.
- Explained variation: The part of y variation captured by the fitted values.
R-squared
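With an intercept, total variation equals explained variation plus residual variation: SST = SSE + SSR, where SSE is the explained sum of squares (the explained variation) and SSR is the residual sum of squares.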
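The decomposition can be checked numerically. The sketch below is a minimal illustration, assuming a small made-up sample (the `x` and `y` values are hypothetical, not the lesson's data) and using NumPy's `np.polyfit` to fit a line with an intercept.

```python
import numpy as np

# Hypothetical sample (not from the lesson): x = years of education, y = hourly wage.
x = np.array([10.0, 12.0, 12.0, 14.0, 16.0, 18.0])
y = np.array([9.5, 12.0, 11.0, 15.5, 17.0, 21.0])

# Fit y = b0 + b1 * x by least squares; for degree 1, polyfit returns the slope first.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation in y around its mean
sse = np.sum((y_hat - y.mean()) ** 2)  # explained variation (fitted values around the mean)
ssr = np.sum((y - y_hat) ** 2)         # residual variation (left unexplained)

print(f"SST       = {sst:.4f}")
print(f"SSE + SSR = {sse + ssr:.4f}")
print(f"R-squared = {sse / sst:.4f}")
```

Because the fitted line includes an intercept, the first two printed values should agree up to floating-point rounding.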
Example
If R-squared equals 0.70, the model explains 70 percent of the sample variation in y and leaves 30 percent in the residuals.
Interactive visual
Explained versus residual variation
R-squared compares residual variation with total variation in the outcome.
Variation split: explained share 94.9%, residual share 5.1%.
Formula in words
R-squared = 1 - residual variation / total variation
In this small sample, the fitted line leaves about 5.1 percent of wage variation in the residuals, so R-squared is about 0.949.
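The formula in words translates directly into a few lines of Python. This is a sketch under stated assumptions: the helper name `r_squared` and the sample values are hypothetical, chosen only for illustration, and will not reproduce the 0.949 figure above.

```python
import numpy as np

def r_squared(y, y_hat):
    """Return 1 - residual variation / total variation."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ssr = np.sum((y - y_hat) ** 2)     # residual variation
    sst = np.sum((y - y.mean()) ** 2)  # total variation
    return 1.0 - ssr / sst

# Hypothetical sample (not the lesson's wage data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.7])

# Fit a line with an intercept, then score the fitted values.
slope, intercept = np.polyfit(x, y, 1)
r2 = r_squared(y, intercept + slope * x)
residual_share = 1.0 - r2

print(f"R-squared      = {r2:.3f}")
print(f"Residual share = {residual_share:.3f}")
```

The residual share and R-squared always sum to 1, which is why a residual share of about 5.1 percent corresponds to an R-squared of about 0.949.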
Interactive activity
R-squared visualizer
Compare how much variation is explained by models with different R-squared values.
Higher R-squared means more sample variation in y is explained by x. It measures fit, not whether the relationship is causal.
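The same comparison can be sketched in Python with simulated data. Everything here is an assumption made for illustration: the true line `2.0 + 1.5 * x`, the noise levels, the seed, and the helper name `fit_r_squared` are hypothetical. The slope of the data-generating line is identical in every case; only the noise changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, size=n)

def fit_r_squared(x, y):
    """Fit a line with an intercept and return 1 - SSR / SST."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

results = {}
for noise_sd in (1.0, 5.0, 20.0):
    # Same underlying relationship each time; only the noise spread differs.
    y = 2.0 + 1.5 * x + rng.normal(0, noise_sd, size=n)
    results[noise_sd] = fit_r_squared(x, y)
    print(f"noise sd = {noise_sd:>4}: R-squared = {results[noise_sd]:.3f}")
```

R-squared falls as the noise grows even though the underlying slope never changes, which illustrates that R-squared describes how tightly the points cluster around the line rather than the size or causal meaning of the slope.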
Try it yourself
Write one plain-English sentence explaining the main idea from this lesson.
Common mistakes
Check these before you move on.
R-squared measures fit. It does not say whether x causes y.
Quick quiz
What does R-squared measure?
If residual variation falls while total variation stays fixed, what happens to R-squared?
Key takeaway
R-squared tells you how closely the fitted line tracks the sample outcomes, not whether the slope is causal.