Lesson 15
Working with pandas DataFrames
Big question
How does Python store a dataset like a spreadsheet?
Learning objectives
- Explain in plain language what a pandas DataFrame is and how it stores a dataset.
- Use the term DataFrame correctly when interpreting a dataset.
- Connect the lesson idea to a formula, graph, Python result, or real example.
Simple explanation
A pandas DataFrame is a table with rows and columns. Each row is an observation and each column is a variable, which matches the way econometric datasets are usually organized.
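To make the row and column idea concrete, here is a minimal sketch that counts the observations and variables in a small table. The wage numbers are illustrative, and the variable names `n_rows`, `n_cols`, and `variables` are chosen here only for the example.

```python
import pandas as pd

# Small illustrative wage table: one row per worker, one column per variable.
df = pd.DataFrame({
    "wage": [18.5, 24.2, 31.8],
    "education": [12, 16, 18],
    "experience": [3, 6, 10],
})

n_rows, n_cols = df.shape      # (number of observations, number of variables)
variables = list(df.columns)   # column names are the variable names

print(n_rows, "observations")
print(n_cols, "variables:", variables)
print(df.dtypes)               # the data type pandas inferred for each column
```

Reading `df.shape` first is a quick way to check that a dataset has the number of observations and variables you expect.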
Key terms
- DataFrame: A table-like data object from pandas.
- Column: A variable in the dataset.
- Row: One observation in the dataset.
- pandas: A Python package for data tables and analysis.
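The terms above map onto Python objects roughly as follows. This is a small sketch with illustrative values. The names `wage_column` and `first_row` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "wage": [18.5, 24.2, 31.8],
    "education": [12, 16, 18],
    "experience": [3, 6, 10],
})

wage_column = df["wage"]   # one variable: a pandas Series with one value per row
first_row = df.iloc[0]     # one observation: the first row, selected by position

print(type(df).__name__)       # name of the table object's type
print(wage_column.tolist())    # every worker's wage
print(first_row.to_dict())     # every variable for the first worker
```

Square brackets with a column name pick out a variable, while `.iloc` with a position picks out an observation.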
Example
A wage dataset might have one row per worker and columns for wage, education, experience, gender (female), and marital status (married).
| wage | education | experience | female | married |
|---|---|---|---|---|
| 18.5 | 12 | 3 | No | No |
| 24.2 | 16 | 6 | Yes | Yes |
| 31.8 | 18 | 10 | No | Yes |
| 21.1 | 14 | 4 | Yes | No |
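The table above could be entered in pandas along these lines. This is a sketch, not the only way to do it. Recoding Yes/No as True/False is an assumption made here so the indicator columns can be used in calculations, and the name `workers` is chosen for the example.

```python
import pandas as pd

# The four example workers from the table, one list per column.
workers = pd.DataFrame({
    "wage": [18.5, 24.2, 31.8, 21.1],
    "education": [12, 16, 18, 14],
    "experience": [3, 6, 10, 4],
    "female": ["No", "Yes", "No", "Yes"],
    "married": ["No", "Yes", "Yes", "No"],
})

# Recode the Yes/No text as True/False indicator columns.
workers["female"] = workers["female"] == "Yes"
workers["married"] = workers["married"] == "Yes"

# Average wage for rows where female is True, and for rows where it is False.
mean_wage_female = workers.loc[workers["female"], "wage"].mean()
mean_wage_not_female = workers.loc[~workers["female"], "wage"].mean()

print(workers)
print(mean_wage_female, mean_wage_not_female)
```

With only four rows, these averages describe the example table and nothing more.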
Create a DataFrame
```python
import pandas as pd

df = pd.DataFrame({
    "wage": [18.5, 24.2, 31.8],
    "education": [12, 16, 18],
    "experience": [3, 6, 10]
})

print(df)
```
Checkpoint activity
Pause and explain this lesson's main idea in your own words before moving forward.
Try it yourself
Write one plain-English sentence explaining the main idea from this lesson.
Common mistakes
Check these before you move on.
Reading a regression coefficient as a causal effect. A coefficient describes a pattern in the data; it supports a causal interpretation only when the assumptions or research design justify one.
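As a rough illustration of a coefficient describing a pattern, the sketch below computes the slope of a simple regression of wage on education from the four example workers, using the covariance-over-variance formula. The choice of formula and the names `slope` and `intercept` are assumptions made for this example.

```python
import pandas as pd

workers = pd.DataFrame({
    "wage": [18.5, 24.2, 31.8, 21.1],
    "education": [12, 16, 18, 14],
})

# Simple regression slope:
# sample covariance(education, wage) / sample variance(education).
slope = workers["education"].cov(workers["wage"]) / workers["education"].var()

# Intercept chosen so the fitted line passes through the sample means.
intercept = workers["wage"].mean() - slope * workers["education"].mean()

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

In these four rows the slope is about 2.15. That number summarizes how wage and education move together in this tiny example; on its own it does not show that an extra year of education raises wage by that amount.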
Quick quiz
In a DataFrame, what does a column usually represent?
Key takeaway
A DataFrame lets you work with a dataset much like a spreadsheet, but every step is written as code that can be rerun and checked.
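One way to see what repeatable means: put the steps in a function and run it twice. The helper `prepare_wage_data` and the 16-year education cutoff are hypothetical, chosen only to give the function something to do.

```python
import pandas as pd


def prepare_wage_data(raw):
    """Hypothetical helper: build the DataFrame and add one derived column."""
    df = pd.DataFrame(raw)
    # Assumed cutoff for illustration only: 16 or more years of education.
    df["high_education"] = df["education"] >= 16
    return df


raw = {
    "wage": [18.5, 24.2, 31.8],
    "education": [12, 16, 18],
    "experience": [3, 6, 10],
}

first_run = prepare_wage_data(raw)
second_run = prepare_wage_data(raw)

print(first_run)
print(first_run.equals(second_run))  # True when both runs give identical tables
```

In a spreadsheet the same steps would be manual edits. Here they are recorded in the function, so anyone can rerun them on the same data.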