Ceteris Lab: Interactive Econometrics

Lesson 9

R-squared

Big question

How much of the outcome's variation does the regression explain?

Learning objectives

  • Explain R-squared in plain language.
  • Use R-squared correctly in an interpretation.
  • Connect the lesson idea to a formula, graph, Python result, or real example.

Simple explanation

R-squared measures the share of the total sample variation in y that the fitted regression line explains. In an OLS model with an intercept, it ranges from 0 to 1. Higher values mean the fitted values track the sample outcomes more closely.

Key terms

R-squared
The fraction of total variation in y explained by the regression model.
SST
Total sum of squares; total variation in y around its sample mean.
SSR
Residual sum of squares; variation left unexplained by the fitted line.
Explained variation
The part of y variation captured by the fitted values.

R-squared

R^2 = 1 - \frac{SSR}{SST}

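With an intercept, total variation equals explained variation plus residual variation: SST = SSE + SSR, where SSE is the explained sum of squares (the variation of the fitted values around the sample mean of y).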
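
To make the decomposition concrete, here is a minimal Python sketch that computes SST, SSE, SSR, and R-squared by hand with NumPy. The data are invented for illustration (they are not the lesson's wage_sample.csv), and the helper name `fit_and_decompose` is hypothetical. The sketch fits the same data with and without an intercept to show why the intercept matters for the identity.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0])  # e.g. years of schooling
y = np.array([9.5, 11.0, 14.2, 13.1, 17.8, 19.5, 21.0, 24.3])  # e.g. hourly wage


def fit_and_decompose(X, y):
    """Fit OLS by least squares and return (SST, SSE, SSR, 1 - SSR/SST)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    sst = np.sum((y - y.mean()) ** 2)      # total variation around the mean of y
    sse = np.sum((y_hat - y.mean()) ** 2)  # explained variation
    ssr = np.sum((y - y_hat) ** 2)         # residual variation
    return sst, sse, ssr, 1 - ssr / sst


# Design matrix with an intercept column, and one without.
X_with = np.column_stack([np.ones_like(x), x])
X_without = x.reshape(-1, 1)

sst, sse, ssr, r2 = fit_and_decompose(X_with, y)
sst0, sse0, ssr0, r2_0 = fit_and_decompose(X_without, y)

print(f"with intercept:    SST={sst:.2f}  SSE+SSR={sse + ssr:.2f}  R2={r2:.3f}")
print(f"without intercept: SST={sst0:.2f}  SSE+SSR={sse0 + ssr0:.2f}")
```

With the intercept, SSE + SSR reproduces SST up to rounding. Without it, the residuals no longer sum to zero, the two sides differ, and 1 - SSR/SST is no longer guaranteed to lie between 0 and 1.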

Example

If R-squared equals 0.70, the model explains 70 percent of the sample variation in y and leaves 30 percent in the residuals.

Interactive visual

Explained versus residual variation

R-squared compares residual variation with total variation in the outcome.

wage_sample.csv

Variation split

Explained share: 94.9%
Residual share: 5.1%

Formula in words

R-squared = 1 - residual variation / total variation

In this small sample, the fitted line leaves about 5.1 percent of wage variation in the residuals, so R-squared is about 0.949.

Interactive activity

R-squared visualizer

Compare how much variation is explained by models with different R-squared values.

Explained 55%
Unexplained 45%

Higher R-squared means more sample variation in y is explained by x. It measures fit, not whether the relationship is causal.
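
The comparison can also be sketched in Python. The example below is a rough simulation under assumed values: the slope, intercept, sample size, and noise levels are all hypothetical choices, not taken from the lesson's data. It keeps the same underlying line and only changes how noisy the outcome is, so the explained share falls as the residual variation grows.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def r_squared(x, y):
    """R-squared from a simple OLS regression of y on x with an intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    ssr = np.sum(residuals ** 2)         # residual variation
    sst = np.sum((y - y.mean()) ** 2)    # total variation
    return 1 - ssr / sst


n = 500
x = rng.uniform(0, 10, size=n)

results = {}
for noise_sd in (1.0, 5.0, 15.0):  # hypothetical noise levels
    # Same true line each time; only the spread around it changes.
    y = 3.0 + 2.0 * x + rng.normal(0, noise_sd, size=n)
    results[noise_sd] = r_squared(x, y)

for noise_sd, r2 in results.items():
    print(f"noise sd = {noise_sd:>4}: explained {r2:.1%}, unexplained {1 - r2:.1%}")
```

The true slope is identical in all three runs, which illustrates that a lower R-squared reflects noisier outcomes rather than a weaker or less real relationship.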

Try it yourself

Write one plain-English sentence explaining the main idea from this lesson.

Common mistakes

Check these before you move on.

R-squared measures fit. It does not say whether x causes y.
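
One way to see this point is that, in a simple regression with an intercept, R-squared equals the squared sample correlation between x and y, so regressing y on x and regressing x on y give the same value. A statistic that cannot tell the two directions apart cannot establish which variable causes the other. The sketch below checks this on invented data; the numbers and the `r_squared` helper are hypothetical.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.8, 4.1, 6.5, 6.2, 8.4, 8.0])


def r_squared(regressor, outcome):
    """R-squared from a simple OLS regression of outcome on regressor."""
    slope, intercept = np.polyfit(regressor, outcome, deg=1)
    residuals = outcome - (intercept + slope * regressor)
    ssr = np.sum(residuals ** 2)
    sst = np.sum((outcome - outcome.mean()) ** 2)
    return 1 - ssr / sst


r2_y_on_x = r_squared(x, y)              # regress y on x
r2_x_on_y = r_squared(y, x)              # regress x on y (roles swapped)
corr_sq = np.corrcoef(x, y)[0, 1] ** 2   # squared sample correlation

print(f"y on x: {r2_y_on_x:.4f}   x on y: {r2_x_on_y:.4f}   corr^2: {corr_sq:.4f}")
```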

Quick quiz

What does R-squared measure?

If residual variation falls while total variation stays fixed, what happens to R-squared?

Key takeaway

R-squared tells you how closely the fitted line summarizes the sample outcomes, not whether the slope is causal.