Lesson 17
Descriptive statistics in Python
Big question
How can Python quickly summarize a dataset?
Learning objectives
- Explain descriptive statistics in Python in plain language.
- Use the output of describe() correctly in an interpretation.
- Connect the lesson idea to a formula, graph, Python result, or real example.
Simple explanation
Descriptive statistics help you check typical values, spread, and possible surprises. In Python, pandas can calculate these summaries for many variables at once.
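As a minimal sketch of "many variables at once", the snippet below computes typical values (mean, median) and spread (standard deviation) for every column in one call. The numbers are invented for illustration and are not from the lesson's dataset.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "wage": [12.0, 15.5, 18.0, 22.5, 95.0],
    "education": [10, 12, 12, 16, 18],
})

# One call computes typical values (mean, median) and spread (std)
# for every column at once.
summary = df.agg(["mean", "median", "std"])
print(summary)
```

In this toy data the mean wage sits well above the median wage, which is one way a possible surprise (here, a single very high wage) can show up in a summary.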
Key terms
- Describe
- A pandas method that reports common summary statistics.
- Minimum
- The smallest value.
- Maximum
- The largest value.
- Count
- The number of non-missing observations.
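The terms above can be seen in a short sketch. The wage values here are made up, and one is deliberately missing to show that the count reported by describe() can be smaller than the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical wages with one missing value (NaN), invented for illustration.
wages = pd.Series([12.0, 15.5, np.nan, 22.5], name="wage")

stats = wages.describe()
print(stats)

n_rows = len(wages)            # total rows, including the missing one
n_count = int(stats["count"])  # non-missing observations only
print("rows:", n_rows, "count:", n_count)
print("minimum:", stats["min"], "maximum:", stats["max"])
```

A gap between the number of rows and the count is a quick signal that a variable has missing values worth investigating.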
Example
Before studying wages, check whether wages and education have reasonable minimums, maximums, and averages.
Summarize wage data
    import pandas as pd

    df = pd.read_csv("wage_sample.csv")
    print(df[["wage", "education", "experience"]].describe())
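The "reasonable minimums and maximums" check from the example could be sketched as below. The DataFrame is a hypothetical stand-in for wage_sample.csv, and the plausibility bounds are assumptions chosen for illustration; sensible limits depend on how the data were collected.

```python
import pandas as pd

# Hypothetical stand-in for wage_sample.csv, with two implausible entries.
df = pd.DataFrame({
    "wage": [12.0, 15.5, -3.0, 22.5],
    "education": [10, 12, 16, 45],
    "experience": [2, 5, 8, 20],
})

summary = df.describe()

# Assumed plausibility bounds (low, high) for each variable.
bounds = {"wage": (0, 500), "education": (0, 30), "experience": (0, 60)}

flagged = []
for column, (low, high) in bounds.items():
    col_min = summary.loc["min", column]
    col_max = summary.loc["max", column]
    # Flag the column if its smallest or largest value falls outside the bounds.
    if col_min < low or col_max > high:
        flagged.append(column)
        print(f"{column}: min={col_min}, max={col_max} is outside [{low}, {high}]")
```

Here a negative wage and 45 years of education are flagged. A flag is a prompt to inspect the raw rows, not proof that the values are wrong.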
Checkpoint activity
Pause and explain this lesson's main idea in your own words before moving forward.
Try it yourself
Write one plain-English sentence explaining the main idea from this lesson.
Common mistakes
Check these before you move on.
A regression coefficient describes a pattern unless the assumptions or research design support a causal interpretation.
Quick quiz
Why should we summarize data before modeling?
Key takeaway
Descriptive statistics are a first quality check and a first story about the data.