Ceteris Lab: Interactive Econometrics

Lesson 10

Regression in Python

Big question

How do we estimate a simple regression with pandas and statsmodels?

Learning objectives

  • Explain, in plain language, how to run a regression in Python.
  • Use pandas and statsmodels correctly to estimate and interpret a regression.
  • Connect the lesson idea to a formula, graph, Python result, or real example.

Simple explanation

Python lets students load data, inspect variables, estimate a regression, and keep the workflow reproducible. The core pattern is: load the CSV, define y and X, add an intercept, fit the model, and print the summary.

Key terms

pandas
A Python library for loading and working with tabular data.
statsmodels
A Python library for estimating statistical models and printing regression output.
add_constant
A statsmodels helper that adds an intercept column to the explanatory variables.
summary output
A formatted table reporting regression estimates and diagnostics.

Estimated sample regression function

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Example

The local CSV wage_sample.csv is small enough for students to read but still shows the full regression workflow.

Python workflow

Load, inspect, estimate, explain, and save the result.

Load

Read wage_sample.csv into a pandas DataFrame.

Inspect

Check columns, ranges, and a scatter plot before modeling.

Estimate

Use statsmodels OLS with a constant column.

Explain

Translate coefficients, R-squared, and limitations.

Estimate wage on education

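```python
import pandas as pd
import statsmodels.api as sm

# Load the data into a DataFrame
df = pd.read_csv("wage_sample.csv")

# Dependent variable (y) and explanatory variable plus an intercept column (X)
y = df["wage"]
X = sm.add_constant(df["education"])

# Estimate the regression by ordinary least squares and print the full table
model = sm.OLS(y, X).fit()
print(model.summary())

# Rounded values for interpretation
print("Intercept:", round(model.params["const"], 2))
print("Education slope:", round(model.params["education"], 2))
print("R-squared:", round(model.rsquared, 3))
```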

Python walkthrough

  1. pandas reads the CSV into a DataFrame, where columns behave like named variables.
  2. y stores the dependent variable, and X stores the explanatory variable plus a constant column for the intercept.
  3. sm.OLS(y, X).fit() estimates the line that minimizes the sum of squared residuals.
  4. The printed values are rounded so students can immediately practice interpretation.

Interactive activity

Code prediction

X = sm.add_constant(X)

What does this line add?

Try it yourself

Change the independent variable from education to experience and rerun the regression.

Common mistakes

Check these before you move on.

A regression coefficient describes an association in the data. Do not read it as a causal effect unless the assumptions or research design support a causal interpretation.

Quick quiz

Why does the Python code use sm.add_constant?

Quick quiz

What does model.params['education'] return?

Key takeaway

A good Python regression workflow is short, reproducible, and easy to translate into plain language.