**Correlation** is a statistical technique that systematically examines and measures the relationship between two variables. It helps determine whether changes in one variable are associated with changes in another, and indicates the direction and intensity of that relationship.
The term "correlation" comes from the concept of **covariation** — the tendency of two variables to vary together. Key questions that correlation analysis addresses:
**Important Distinction**: Correlation measures **covariation, NOT causation**. This is the most critical concept in correlation analysis. Just because two variables are correlated does not mean one causes the other. For example, ice-cream sales and drowning deaths both increase with temperature, but ice-cream consumption does not cause drowning. Temperature is the underlying cause affecting both variables.
Relationships between variables can be classified into three categories:
These relationships have logical economic or physical explanations. Example: Agricultural productivity and rainfall — low rainfall causes low productivity. The relationship between quantity demanded and price of a commodity (demand curve) has clear theoretical justification.
Some relationships exist but cannot be meaningfully explained causally. Example: The relationship between arrival of migratory birds in a sanctuary and local birth rates shows no causal connection; it is pure coincidence. Similarly, shoe size and money in your pocket may be correlated, but the relationship has no real meaning.
A third variable's impact on two variables creates a false apparent relationship between them. **Example**: Brisk ice-cream sales correlate positively with deaths due to drowning. However, rising temperature causes BOTH — more ice-cream consumption AND more people swimming (leading to more drowning deaths). Temperature is the true causal variable.
This distinction is crucial for **policy analysis** — identifying spurious correlations prevents governments from making incorrect policy decisions based on misleading relationships.
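The ice-cream and drowning case can be illustrated with a minimal sketch. The figures below are hypothetical and invented purely for illustration, and `pearson_r` is an assumed helper name implementing the deviation-form coefficient given later in these notes. The sketch only shows that two series which both track temperature will also correlate strongly with each other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r in deviation form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical weekly figures, invented for illustration only.
temperature = [18, 21, 24, 27, 30, 33, 36]             # degrees Celsius
ice_cream_sales = [195, 190, 245, 280, 285, 350, 345]  # units sold
drowning_deaths = [3, 4, 4, 6, 6, 8, 8]

r_sales_deaths = pearson_r(ice_cream_sales, drowning_deaths)
r_temp_sales = pearson_r(temperature, ice_cream_sales)
r_temp_deaths = pearson_r(temperature, drowning_deaths)

print(f"ice-cream sales vs drowning deaths: {r_sales_deaths:.2f}")
print(f"temperature vs ice-cream sales:     {r_temp_sales:.2f}")
print(f"temperature vs drowning deaths:     {r_temp_deaths:.2f}")
```

All three coefficients come out high, yet nothing in the calculation says which variable drives which. That judgement has to come from reasoning about the data, not from r.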
**Definition**: Positive correlation occurs when two variables move together in the same direction. When one variable increases, the other also increases; when one decreases, the other also decreases.
**Examples**:
**Economic Interpretation**: In positive correlation, both variables respond to similar economic conditions or one variable facilitates growth in the other.
**Definition**: Negative correlation occurs when two variables move in opposite directions. When one variable increases, the other decreases, and vice versa.
**Examples**:
**Economic Interpretation**: Negative correlation reflects inverse economic relationships fundamental to demand-supply theory and cost-benefit analysis.
When variables show no consistent relationship pattern — changes in one variable are not associated with predictable changes in the other.
Three main tools measure correlation: **scatter diagrams** (visual), **Karl Pearson's coefficient** (for numerical data), and **Spearman's rank correlation** (for ranked data).
A **scatter diagram** is a graphical technique that plots values of two variables as points on graph paper to visually examine the form of relationship without calculating numerical values.
Plot the values of variable X on the horizontal axis and variable Y on the vertical axis. Each pair of observations (X, Y) becomes a point on the graph.
**Interpretation based on point distribution**:
**Measuring intensity from scatter diagrams**:
**Advantage**: Quick visual understanding without complex calculations.
**Limitation**: Does not provide precise numerical measure; subjective interpretation possible.
**Also known as**: Product-moment correlation coefficient or simple correlation coefficient.
**Definition**: Karl Pearson's coefficient provides a precise numerical measure of the degree of linear relationship between two variables X and Y.
When two variables have a **linear relationship**, their association can be represented by a straight line on a graph. Karl Pearson's coefficient measures both direction (positive/negative) and strength (magnitude) of this linear association.
**Arithmetic Mean**: X̄ = ΣX / N
**Variance** (measure of spread): σₓ² = Σ(X – X̄)² / N
**Standard Deviation** (square root of variance): σₓ = √[Σ(X – X̄)² / N]
**Covariance** (measure of joint movement): Cov(X, Y) = Σ(X – X̄)(Y – Ȳ) / N = Σxy / N
where x = X – X̄ and y = Y – Ȳ are deviations from the respective means.
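As a rough sketch, these building blocks can be computed for the Table 6.1 data that appears later in these notes. The function names are illustrative choices, and the divisor is N (the population form used throughout this material), not N – 1.

```python
from math import sqrt

# Data from Table 6.1 of these notes.
education = [0, 2, 4, 6, 8, 10, 12]      # X: years of education
annual_yield = [4, 4, 6, 10, 10, 8, 7]   # Y: annual yield, Rs '000

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance: sum of squared deviations divided by N."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def std_dev(values):
    return sqrt(variance(values))

def covariance(xs, ys):
    """Average product of paired deviations from the two means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

print(mean(education), mean(annual_yield))   # 6.0 7.0
print(variance(education))                    # 16.0
print(covariance(education, annual_yield))    # 6.0
```

The positive covariance already signals that the two variables tend to move together. Dividing it by the two standard deviations removes the units and gives r.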
**Formula 1** (using covariance and standard deviations):
r = [Σ(xy) / N] / (σₓ × σᵧ)
**Formula 2** (deviation form):
r = Σ(X - X̄)(Y - Ȳ) / √[Σ(X - X̄)² × Σ(Y - Ȳ)²]
**Formula 3** (raw score form):
r = [ΣXY - (ΣX × ΣY) / N] / √[{ΣX² - (ΣX)² / N} × {ΣY² - (ΣY)² / N}]
**Formula 4** (alternative raw score form):
r = [N × ΣXY - (ΣX × ΣY)] / √[{N × ΣX² - (ΣX)²} × {N × ΣY² - (ΣY)²}]
All four formulas yield identical results; choice depends on data form and computational convenience.
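The equivalence can be checked numerically. The sketch below applies all four forms to the Table 6.1 data from these notes; the variable names are illustrative only.

```python
from math import sqrt, isclose

X = [0, 2, 4, 6, 8, 10, 12]   # Table 6.1: years of education
Y = [4, 4, 6, 10, 10, 8, 7]   # Table 6.1: annual yield, Rs '000
N = len(X)

# Deviations from the means
mean_x, mean_y = sum(X) / N, sum(Y) / N
dx = [x - mean_x for x in X]
dy = [y - mean_y for y in Y]

# Raw sums
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Formula 1: covariance over the product of standard deviations
cov = sum(a * b for a, b in zip(dx, dy)) / N
sd_x = sqrt(sum(a * a for a in dx) / N)
sd_y = sqrt(sum(b * b for b in dy) / N)
r1 = cov / (sd_x * sd_y)

# Formula 2: deviation form
r2 = sum(a * b for a, b in zip(dx, dy)) / sqrt(
    sum(a * a for a in dx) * sum(b * b for b in dy)
)

# Formula 3: raw score form
r3 = (sum_xy - sum_x * sum_y / N) / sqrt(
    (sum_x2 - sum_x ** 2 / N) * (sum_y2 - sum_y ** 2 / N)
)

# Formula 4: alternative raw score form
r4 = (N * sum_xy - sum_x * sum_y) / sqrt(
    (N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2)
)

print(round(r1, 3), round(r2, 3), round(r3, 3), round(r4, 3))
```

Any differences between the four results are floating-point rounding only. By hand, Formula 4 avoids fractional deviations, which is why it is usually the most convenient.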
1. **Unitless Measure**: r has no units of measurement. The correlation between height in feet and weight in kilograms is a pure number (e.g., 0.7), independent of measurement units.
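2. **Range**: The value of r always lies between –1 and +1, inclusive: –1 ≤ r ≤ +1.
3. **Sign Indicates Direction**: A positive r means the variables move in the same direction; a negative r means they move in opposite directions.
4. **Magnitude Indicates Strength**: The closer |r| is to 1, the stronger the linear relationship; the closer it is to 0, the weaker.
5. **Perfect Correlation**: r = +1 (perfect positive) or r = –1 (perfect negative) occurs when all points lie exactly on a straight line.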
6. **Independence from Origin and Scale Change** (Most Important Property):
If U = (X – A) / B and V = (Y – C) / D, where A and C are assumed means, B and D are common factors of same sign, then:
**rᵤᵥ = rₓᵧ**
This property is fundamental to the **step deviation method**, allowing simplified calculations when data values are large.
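A short derivation suggests why the property holds, assuming B and D are non-zero constants of the same sign. Subtracting A (or C) shifts the mean by the same amount, so deviations are unaffected by the origin, and dividing by B (or D) rescales the numerator and denominator by the same factor.

```latex
\begin{aligned}
U - \bar{U} &= \frac{X - \bar{X}}{B}, \qquad V - \bar{V} = \frac{Y - \bar{Y}}{D} \\[4pt]
r_{UV} &= \frac{\sum (U - \bar{U})(V - \bar{V})}
               {\sqrt{\sum (U - \bar{U})^2 \, \sum (V - \bar{V})^2}}
        = \frac{\dfrac{1}{BD} \sum (X - \bar{X})(Y - \bar{Y})}
               {\dfrac{1}{|B|\,|D|} \sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
        = \frac{|BD|}{BD}\, r_{XY} = r_{XY}
\end{aligned}
```

The last step uses |BD| / BD = 1, which is exactly where the "same sign" condition enters. If B and D had opposite signs, the coefficient would change sign.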
**Data Table 6.1**:
| Years of Education (X) | Annual Yield (Rs '000) (Y) |
|---|---|
| 0 | 4 |
| 2 | 4 |
| 4 | 6 |
| 6 | 10 |
| 8 | 10 |
| 10 | 8 |
| 12 | 7 |
**Calculations**: X̄ = 42 / 7 = 6 and Ȳ = 49 / 7 = 7, so Σ(X – X̄)(Y – Ȳ) = 42, Σ(X – X̄)² = 112 and Σ(Y – Ȳ)² = 38.
**Using Formula 2**:
r = 42 / √(112 × 38) = 42 / √4256 = 42 / 65.24 = **0.644**
**Interpretation**: The positive coefficient (0.644) indicates that more years of farmer education are associated with higher annual yield per acre. The association is moderate to strong, but with only seven observations, and because correlation alone does not establish causation, it is suggestive rather than conclusive evidence that education raises productivity. It is nonetheless consistent with the policy case for farmer education programs.
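The sums used in this example can be checked term by term from Table 6.1, taking X̄ = 6 and Ȳ = 7 (both computed from the table):

```latex
\begin{aligned}
\sum (X - \bar{X})(Y - \bar{Y}) &= (-6)(-3) + (-4)(-3) + (-2)(-1) + (0)(3) + (2)(3) + (4)(1) + (6)(0) = 42 \\
\sum (X - \bar{X})^2 &= 36 + 16 + 4 + 0 + 4 + 16 + 36 = 112 \\
\sum (Y - \bar{Y})^2 &= 9 + 9 + 1 + 9 + 9 + 1 + 0 = 38 \\
r &= \frac{42}{\sqrt{112 \times 38}} = \frac{42}{\sqrt{4256}} \approx 0.644
\end{aligned}
```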
When an epidemic spreads to villages, a positive correlation between the number of deaths and the number of doctors sent may appear counterintuitive. However, this does NOT mean doctors cause deaths. Reasons:
**Lesson**: Understanding data context is essential before interpreting correlation. Statistical methods are no substitute for logical reasoning.
When values of X and Y are large, computational burden increases significantly. The **step deviation method** uses the property that correlation is unaffected by change of origin and scale.
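Transform variables using U = (X – A) / h and V = (Y – C) / k,
where A and C are assumed means (changes of origin) and h and k are common factors (changes of scale) of the same sign.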
Then: **rᵤᵥ = rₓᵧ** (correlation of transformed variables equals original correlation)
**Original Data**:
| Price Index (X) | Money Supply in Rs Crores (Y) |
|---|---|
| 120 | 1800 |
| 150 | 2000 |
| 190 | 2500 |
| 220 | 2700 |
| 230 | 3000 |
**Step 1**: Choose A = 100, h = 10, C = 1700, k = 100
**Step 2**: Calculate transformed values U and V:
| X | U = (X–100)/10 | Y | V = (Y–1700)/100 |
|---|---|---|---|
| 120 | 2 | 1800 | 1 |
| 150 | 5 | 2000 | 3 |
| 190 | 9 | 2500 | 8 |
| 220 | 12 | 2700 | 10 |
| 230 | 13 | 3000 | 13 |
**Step 3**: Calculate required values: N = 5, ΣU = 41, ΣV = 35, ΣUV = 378, ΣU² = 423, ΣV² = 343.
**Step 4**: Apply formula:
r = [N·ΣUV – (ΣU·ΣV)] / √[{N·ΣU² – (ΣU)²} × {N·ΣV² – (ΣV)²}]
r = [5(378) – (41×35)] / √[{5(423) – 41²} × {5(343) – 35²}]
r = [1890 – 1435] / √[{2115 – 1681} × {1715 – 1225}]
r = 455 / √[434 × 490] = 455 / √212,660 = 455 / 461.15 ≈ **0.987**
**Interpretation**: Very strong positive correlation (r ≈ 0.99) between price index and money supply. This is consistent with a foundational premise of **monetary policy**, namely that increases in money supply are accompanied by increases in the price level, although the coefficient by itself does not establish the direction of causation. This relationship is central to inflation management and central bank operations.
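As a quick check on the step deviation method, the sketch below computes r once from the original price index and money supply figures and once from the transformed values, using the same A, h, C and k as in Step 1. The helper name `pearson_r` is an illustrative choice.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Pearson's r from raw sums (Formula 4 in these notes)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

price_index = [120, 150, 190, 220, 230]         # X
money_supply = [1800, 2000, 2500, 2700, 3000]   # Y, Rs crores

# Step deviations with A = 100, h = 10, C = 1700, k = 100.
# Integer division is exact here because every deviation is a multiple of h or k.
U = [(x - 100) // 10 for x in price_index]
V = [(y - 1700) // 100 for y in money_supply]

r_xy = pearson_r(price_index, money_supply)
r_uv = pearson_r(U, V)

print(U, V)                            # [2, 5, 9, 12, 13] [1, 3, 8, 10, 13]
print(round(r_xy, 3), round(r_uv, 3))  # 0.987 0.987
```

Both routes give the same coefficient, but the transformed sums (for example ΣUV = 378) are far smaller than their raw counterparts, which is the practical benefit when working by hand.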
**Developed by**: British psychologist C.E. Spearman.
**Definition**: Spearman's rank correlation measures the linear association between **ranks** assigned to items according to their attributes, rather than their actual numerical values.
1. **Measurable attributes where measurement is difficult**: When measurement devices are unavailable. Example: Ranking students by height and weight in a remote village without measuring rods or scales.
2. **Non-measurable qualitative attributes**: Variables that cannot be numerically measured directly:
3. **Non-linear relationships**: When the scatter diagram shows a curved (non-linear) relationship that still rises or falls consistently (Figures 6.6-6.7), Spearman's coefficient is more appropriate than Karl Pearson's.
4. **Extreme values present**: Spearman's coefficient is **robust against extreme values**. If data contains outliers, Spearman's is superior to Karl Pearson's because it uses ranks rather than actual values.
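The last two situations can be illustrated with a small sketch. The data are hypothetical, chosen only to show the effect, and the helper names are illustrative. `spearman_r` uses the rank-difference formula from these notes and assumes no tied values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r in deviation form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """Spearman's coefficient, 1 - 6*sum(D^2) / (n(n^2 - 1)); no ties assumed."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_curved = [v ** 3 for v in x]            # rises steadily, but not along a line
y_outlier = [1, 2, 3, 4, 5, 6, 7, 100]    # one extreme value at the end

for label, y in [("curved", y_curved), ("outlier", y_outlier)]:
    print(label, round(pearson_r(x, y), 3), spearman_r(x, y))
```

In both cases the ordering of the two variables agrees perfectly, so Spearman's coefficient is 1, while Pearson's coefficient falls below 1 because the points do not lie on a straight line.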
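rₛ = 1 – [6ΣD² / (n(n² – 1))]
where D is the difference between the two ranks assigned to the same item and n is the number of items ranked. The formula in this form assumes no tied ranks.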
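A small worked example may help. Suppose, hypothetically, that two judges rank five contestants, with Judge A giving ranks 1, 2, 3, 4, 5 and Judge B giving ranks 2, 1, 4, 3, 5 to the same contestants. These ranks are invented for illustration, and D denotes the difference between the two ranks for each contestant.

```latex
\begin{aligned}
D &= -1,\; 1,\; -1,\; 1,\; 0 \\
\sum D^2 &= 1 + 1 + 1 + 1 + 0 = 4 \\
r_s &= 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 4}{5 \times 24} = 1 - 0.2 = 0.8
\end{aligned}
```

The two judges broadly agree, so the coefficient is strongly positive. Identical rankings would give ΣD² = 0 and a coefficient of +1.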
1. **Same interpretation as Karl Pearson's coefficient**:
2. **Direction and Strength**: Magnitude and sign have identical interpretation as in Pearson's coefficient.
3. **Robustness**: Not affected by extreme values because it uses ranks (order information) rather than actual values.
4. **Applicability**: All interpretation guidelines for correlation strength apply to Spearman's coefficient.
1. **Agricultural Economics**:
2. **Monetary Policy**:
3. **Economic Development Indicators**:
4. **Sectoral Relationships**:
Understanding correlation helps policymakers:
Example: If farmer education shows strong positive correlation with yield, investment in agricultural extension services is justified.
**Mistake**: Finding r = 0.8 between variables X and Y and concluding "X causes Y."
**Correction**: Correlation measures association, not causation. Third variables may cause both. Always examine logical relationship and data context.
**Mistake**: Calculating Karl Pearson's r for non-linear data.
**Correction**: First examine the scatter diagram. For curved relationships, use Spearman's rank correlation or acknowledge that the relationship is non-linear.
**Mistake**: Using Karl Pearson's coefficient when data contains outliers.
**Correction**: Use Spearman's rank correlation or examine if extreme values are data entry errors.
**Mistake**: Calculating r = 1.5 or r = –1.2.
**Correction**: Recheck the calculation. Since r must lie between –1 and +1, a value outside this range signals an arithmetic error. Recompute using the correct formula.
**Mistake**: Assuming r = 0.15 means no relationship exists.
**Correction**: The linear relationship is weak, but a non-linear relationship may still be strong.
**Mean**: X̄ = ΣX / N
**Variance**: σ² = Σ(X – X̄)² / N = ΣX² / N – (X̄)²
**Covariance**: Cov(X,Y) = Σ(X – X̄)(Y – Ȳ) / N
**Karl Pearson's Coefficient (Formula 1)**:
r = [Σ(xy) / N] / (σₓ × σᵧ)
**Karl Pearson's Coefficient (Formula 2)**:
r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² × Σ(Y – Ȳ)²]
**Karl Pearson's Coefficient (Formula 4 – Most Useful)**:
r = [N·ΣXY – (ΣX·ΣY)] / √[{N·ΣX² – (ΣX)²} × {N·ΣY² – (ΣY)²}]
**Spearman's Rank Correlation**:
rₛ = 1 – [6ΣD² / (n(n² – 1))]
1. **Always draw scatter diagram first** before calculating coefficient.
2. **For large data values**, use step deviation method to reduce computational burden.
3. **When reporting r**, mention both direction (positive/negative) and strength (weak/strong/very strong).
4. **For qualitative variables** (beauty, honesty, intelligence), use only Spearman's rank correlation.
5. **Check answer reasonableness**: Is –1 ≤ r ≤ +1? Does sign match scatter diagram direction?
6. **Distinguish correlation from causation** in answer explanations.
7. **Indian economic examples strengthen answers**: Reference farmer education-yield relationship, agricultural markets, monetary policy applications.
8. **Show all calculation steps** in detail to earn partial credit if final answer has minor errors.
Q1. Which of the following is the PRIMARY advantage of using a scatter diagram to study correlation?
Answer: B — A scatter diagram's key advantage is visual inspection of the pattern and closeness of points, showing direction and strength without numerical computation.
Q2. Which statement correctly distinguishes between positive and negative correlation?
Answer: B — Positive correlation means when X increases Y increases (or both decrease); negative correlation means when X increases Y decreases and vice versa.
Q3. A scatter diagram shows points scattered randomly with no clear pattern. This indicates:
Answer: C — Random scatter with no upward or downward trend indicates the variables have no consistent linear relationship or extremely weak correlation.
Q4. In a mandi, as the supply of tomatoes increases dramatically during harvest season, the price drops from Rs 40/kg to Rs 4/kg. What type of correlation exists between supply and price?
Answer: B — As supply increases (↑), price decreases (↓), showing variables move in opposite directions—this is negative correlation.
Q5. Which of the following is NOT a valid reason to reject the claim that correlation implies causation?
Answer: D — The magnitude of variable values has no bearing on whether correlation implies causation; lurking variables, coincidence, and reversed causation are valid reasons to reject causal claims.
Q6. Karl Pearson's coefficient of correlation should be applied ONLY when:
Answer: B — Karl Pearson's correlation coefficient measures linear relationships; applying it to non-linear data can give misleading results.
Q7. Which correlation measurement technique is best suited for analyzing the relationship between students' physical appearance and their academic performance?
Answer: C — Physical appearance is a non-numerical attribute that must be ranked; Spearman's rank correlation is designed for ranking such qualitative variables.
Q8. Which of the following statements about perfect correlation is CORRECT? (A) Perfect positive correlation means r = +1 and all points lie on an upward-sloping line. (B) Perfect negative correlation means r = −1 and all points lie on a downward-sloping line. Choose the correct option:
Answer: C — Both statements are accurate: r = +1 indicates perfect positive correlation (upward line) and r = −1 indicates perfect negative correlation (downward line).
Q9. Study the following scenario: A researcher finds a strong positive correlation (r = 0.92) between the number of firefighters at a fire scene and the total fire damage caused. Which is the MOST appropriate conclusion?
Answer: C — This is a classic lurking variable example: larger fires cause both more damage and attract more firefighters; correlation does not imply the firefighters cause damage.
Q10. In India's agricultural sector, a government economist observes that as rainfall increases, agricultural productivity increases (positive correlation r = +0.78). However, an economist argues that correlation here is misleading because rainfall might not be the true cause. Which factor could the second economist be hinting at?
Answer: B — The economist is identifying lurking variables (soil fertility, irrigation, seeds) that could influence productivity independent of rainfall, showing why high correlation doesn't prove causation.
What does correlation measure?
Correlation measures the direction and intensity (strength) of the linear relationship between two variables, not causation.
Define positive correlation with an example.
When both variables move in the same direction (both increase or both decrease), like income and consumption rising together.
Define negative correlation with an example.
When variables move in opposite directions—as one increases, the other decreases, like price of apples and quantity demanded.
Why is the ice-cream and drowning death relationship NOT causal?
Both increase due to a third variable (rising temperature), not because ice-cream causes drowning; such a third variable is called a lurking variable.
What does a scatter diagram show about correlation?
It visually displays the direction and strength of a relationship: points on a line indicate strong/perfect correlation, scattered points indicate weak or no correlation.
What is perfect positive correlation?
All data points lie exactly on an upward-sloping straight line, showing a complete one-to-one positive relationship.
What is perfect negative correlation?
All data points lie exactly on a downward-sloping straight line, showing a complete one-to-one inverse relationship.
When should Karl Pearson's correlation coefficient be used?
Only when the relationship between two variables is linear (can be represented by a straight line).
What type of variables does Spearman's rank correlation measure?
It measures correlation between ranks of non-numerical attributes like intelligence, honesty, or physical appearance.
Why is correlation different from causation?
Correlation shows that two variables move together, but it does not prove one causes the other; there may be a third variable or pure coincidence.
Define correlation and explain why correlation cannot be interpreted as causation with one example from the study material. [2 marks]
State the definition of correlation (covariation, not causation). Use the ice-cream/drowning example or migratory birds/birth rates to show how a third variable or coincidence breaks the causal chain.
Explain the difference between positive and negative correlation. Draw or describe what a scatter diagram would look like for each type. Provide one economic example of each type of correlation. [5 marks]
Positive: both variables move in same direction (describe upward-sloping pattern on scatter). Negative: opposite directions (downward slope). Example for positive—income and consumption. Example for negative—price and demand. Show visual pattern clearly.
The following scatter plot shows the relationship between study hours (X-axis) and examination scores (Y-axis) for Class 11 students. The points form a clear upward-sloping line with one outlier. (a) Identify the type of correlation shown. (b) Explain why this correlation might be misleading or incomplete as evidence that more study hours CAUSE higher scores. (c) What third variable might actually be driving both study hours and examination scores? [6 marks]
Part (a): Positive correlation. Part (b): Correlation ≠ causation; lurking variables may exist. Part (c): Intelligence, aptitude, or prior subject knowledge could drive both effort and performance independently. Discuss the concept of covariation versus causation and why scatter diagrams alone cannot prove cause-effect.