Lesson 6
Covariance and correlation
Big question
How do we describe whether two variables move together?
Learning objectives
- Explain covariance and correlation in plain language.
- Use covariance correctly in an interpretation.
- Connect the lesson idea to a formula, graph, Python result, or real example.
Simple explanation
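Covariance and correlation both summarize how two variables move together. Covariance is hard to read on its own because its size depends on the units of the variables. Correlation is easier to read because it is standardized to lie between -1 and 1. A positive correlation means the two variables tend to move in the same direction; a negative correlation means they tend to move in opposite directions.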
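One way to see why correlation is easier to read than covariance is to change the units of one variable. The sketch below reuses the wage and education numbers from this lesson's pandas example. It assumes wage is measured in dollars, which the lesson does not state, and the `wage_cents` column is a hypothetical rescaling added only for illustration.

```python
import pandas as pd

# Same illustrative numbers as the lesson's pandas example.
df = pd.DataFrame({
    "wage": [18, 22, 30, 35],
    "education": [12, 14, 16, 18],
})

# Hypothetical unit change (assumption): wage in cents instead of dollars.
df["wage_cents"] = df["wage"] * 100

# Covariance depends on the units of the variables.
cov_dollars = df["wage"].cov(df["education"])
cov_cents = df["wage_cents"].cov(df["education"])

# Correlation is standardized, so rescaling should not change it.
corr_dollars = df["wage"].corr(df["education"])
corr_cents = df["wage_cents"].corr(df["education"])

print("covariance (dollars):", cov_dollars)
print("covariance (cents):  ", cov_cents)
print("correlation (dollars):", corr_dollars)
print("correlation (cents):  ", corr_cents)
```

The covariance in cents comes out 100 times larger than the covariance in dollars, while the two correlations agree. The relationship between the variables has not changed, only the units.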
Key terms
- Covariance: A measure of whether two variables move above or below their averages together.
- Correlation: A standardized measure of linear association between -1 and 1.
- Positive association: Higher values of one variable tend to come with higher values of another.
- Negative association: Higher values of one variable tend to come with lower values of another.
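The definitions above can be turned into a small calculation. The following is a minimal sketch in plain Python, not a library implementation. It assumes the sample versions of the formulas (dividing by n - 1), and the helper names `mean`, `sample_covariance`, and `correlation` are made up for this illustration. The numbers are the wage and education values used in this lesson's pandas example.

```python
from math import sqrt

wage = [18, 22, 30, 35]
education = [12, 14, 16, 18]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sum of products of deviations from each mean, divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) - 1)


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(sample_covariance(x, x))  # variance is covariance with itself
    sd_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)


cov = sample_covariance(wage, education)
corr = correlation(wage, education)
print(f"covariance:  {cov:.3f}")
print(f"correlation: {corr:.3f}")

# Reversing one list pairs high wages with low education,
# which turns the positive association into a negative one.
reversed_corr = correlation(wage, education[::-1])
print(f"correlation with education reversed: {reversed_corr:.3f}")
```

With these numbers the covariance is about 19.67 and the correlation is about 0.99, a strong positive association. Reversing the education values gives a correlation of about -0.99, a strong negative association.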
Correlation
Example
If people with more education often have higher wages, education and wage may have a positive correlation.
Correlation in pandas
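import pandas as pd

# Small illustrative data set: four people's wage and years of education.
df = pd.DataFrame({
    "wage": [18, 22, 30, 35],
    "education": [12, 14, 16, 18]
})

# Series.corr computes the Pearson correlation by default.
print(df["wage"].corr(df["education"]))
Checkpoint activity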
Pause and explain this lesson's main idea in your own words before moving forward.
Try it yourself
Write one plain-English sentence explaining the main idea from this lesson.
Common mistakes
Check these before you move on.
Treating a correlation as proof of causation. A correlation, like a regression coefficient, only describes a pattern unless the assumptions or research design support a causal interpretation.
Quick quiz
What is the largest possible correlation?
Key takeaway
Correlation is useful for describing patterns, but it does not by itself prove cause and effect.