**STATISTICS: MEASURES OF CENTRAL TENDENCY FOR GROUPED DATA**
**1. MEAN OF GROUPED DATA**
• Mean (average) = Sum of all observations ÷ Number of observations
• Formula: x̄ = Σ(fixi) / Σfi, where xi = observations/class marks, fi = frequencies
• For ungrouped data: Directly use given values of x₁, x₂, ..., xₙ with frequencies f₁, f₂, ..., fₙ
• For grouped data: Use class marks (mid-points) as xi values
**Class Mark Formula:**
• Class mark = (Upper class limit + Lower class limit) / 2
• Example: For class 10-25, class mark = (10+25)/2 = 17.5
• Assumption: All observations in a class are taken to lie at its mid-point
**Three Methods to Calculate Mean of Grouped Data:**
**Method 1: Direct Method**
• Use when: Class marks and frequencies have small, manageable values
• Advantage: Straightforward, no intermediate steps
• Note: Gives approximate mean (due to class mark assumption), not exact
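A minimal Python sketch of the direct method may help make the formula concrete. The function names `class_marks` and `direct_mean` are illustrative choices, not from any library, and the table (classes 0-10, 10-20, 20-30 with frequencies 5, 8, 7) is borrowed from the practice question later in these notes.

```python
def class_marks(intervals):
    """Mid-point of each (lower, upper) class interval."""
    return [(lower + upper) / 2 for lower, upper in intervals]


def direct_mean(intervals, freqs):
    """Grouped mean by the direct method: sum(f_i * x_i) / sum(f_i)."""
    marks = class_marks(intervals)
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)


intervals = [(0, 10), (10, 20), (20, 30)]
freqs = [5, 8, 7]

print(class_marks(intervals))         # [5.0, 15.0, 25.0]
print(direct_mean(intervals, freqs))  # 16.0
```

Here Σ(fixi) = 25 + 120 + 175 = 320 and Σfi = 20, so the mean is 320 / 20 = 16.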
**Method 2: Assumed Mean Method (Deviation Method)**
• Use when: xi and fi values are large (reduces calculation complexity)
• Important: Mean is independent of choice of 'a' — any class mark can be assumed mean
• Deviations: di = xi - a, where a is the assumed mean
• Formula derivation: d̄ = Σ(fidi) / Σfi = Σ(fi(xi - a)) / Σfi = Σ(fixi) / Σfi - a = x̄ - a
• Therefore: x̄ = a + d̄ = a + Σ(fidi) / Σfi
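The claim that the mean does not depend on the choice of 'a' can be checked with a short Python sketch. The function name `assumed_mean` is an illustrative choice, and the class marks and frequencies come from the five-class practice question later in these notes (classes 0-5 to 20-25, frequencies 4, 6, 8, 7, 5).

```python
def assumed_mean(marks, freqs, a):
    """Grouped mean via x_bar = a + sum(f_i * d_i) / sum(f_i), with d_i = x_i - a."""
    deviations = [x - a for x in marks]
    return a + sum(f * d for f, d in zip(freqs, deviations)) / sum(freqs)


marks = [2.5, 7.5, 12.5, 17.5, 22.5]
freqs = [4, 6, 8, 7, 5]

# Try every class mark as the assumed mean; each line prints 13.0.
for a in marks:
    print(a, assumed_mean(marks, freqs, a))
```

With a = 12.5 the deviations are -10, -5, 0, 5, 10, so Σ(fidi) = -40 - 30 + 0 + 35 + 50 = 15 and x̄ = 12.5 + 15/30 = 13. A central 'a' only keeps the deviations small; it does not change the answer.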
**Method 3: Step Deviation Method**
• Use when: All deviations (di) share a common factor, usually the class size h
• Advantage: Works with very small numbers, minimal calculation errors
• Class size h: Width of a class interval (upper limit - lower limit of the same class)
• Step deviations: ui = (xi - a) / h
• Formula: x̄ = a + h·(Σ(fiui) / Σfi)
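The step deviation formula can be sketched in Python as follows. The function name `step_deviation_mean` is an illustrative choice, and the data (class marks 2.5 to 22.5, frequencies 4, 6, 8, 7, 5, class size 5) are taken from the five-class practice question later in these notes.

```python
def step_deviation_mean(marks, freqs, a, h):
    """Grouped mean via x_bar = a + h * sum(f_i * u_i) / sum(f_i), with u_i = (x_i - a) / h."""
    steps = [(x - a) / h for x in marks]
    return a + h * sum(f * u for f, u in zip(freqs, steps)) / sum(freqs)


marks = [2.5, 7.5, 12.5, 17.5, 22.5]
freqs = [4, 6, 8, 7, 5]

print(step_deviation_mean(marks, freqs, a=12.5, h=5))  # 13.0
```

The step deviations are -2, -1, 0, 1, 2, so Σ(fiui) = -8 - 6 + 0 + 7 + 10 = 3 and x̄ = 12.5 + 5 × 3/30 = 13, matching the other two methods.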
**Key Points:**
• All three methods give same result for grouped data
• Grouped data mean is approximate because individual values are lost
• Ungrouped data mean (using actual values) is exact and more accurate
• Choose method based on magnitude of numbers in the data
**2. MEDIAN OF GROUPED DATA**
**Definition:**
• Median: The value that divides the distribution into two equal parts
• For grouped data: Median lies in the median class, i.e. the class containing the (n/2)th observation, where n = total frequency
**Median Formula:**
Median = l + [(n/2 - cf) / f] × h
Where:
• l = lower limit of median class
• n = total frequency (Σfi)
• cf = cumulative frequency of class before median class
• f = frequency of median class
• h = class size (width of median class)
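**Steps to Find Median:**
• Step 1: Build the cumulative frequency (cf) column and find n = Σfi
• Step 2: Compute n/2
• Step 3: Median class = first class whose cf is ≥ n/2
• Step 4: Read off l, f, h for the median class and cf of the class before it
• Step 5: Substitute into Median = l + [(n/2 - cf) / f] × h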
**Cumulative Frequency (cf):**
• cf of a class = sum of frequencies up to and including that class
• cf of first class = f₁
• cf of second class = f₁ + f₂
• cf of last class = n (total frequency)
• Cumulative frequency increases as we move down classes
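Putting the cumulative frequency column and the median formula together, a hedged Python sketch might look like this. The function name `grouped_median` is illustrative, the classes are assumed to be continuous with no gaps, and the table is the five-class one from the practice question later in these notes.

```python
from itertools import accumulate


def grouped_median(intervals, freqs):
    """Median = l + ((n/2 - cf) / f) * h for continuous (lower, upper) classes."""
    cum = list(accumulate(freqs))  # cumulative frequency of each class
    n = cum[-1]                    # total frequency
    # Median class: first class whose cumulative frequency reaches n/2.
    k = next(i for i, c in enumerate(cum) if c >= n / 2)
    lower, upper = intervals[k]
    cf_before = cum[k - 1] if k > 0 else 0
    return lower + (n / 2 - cf_before) / freqs[k] * (upper - lower)


intervals = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
freqs = [4, 6, 8, 7, 5]

print(list(accumulate(freqs)))           # [4, 10, 18, 25, 30]
print(grouped_median(intervals, freqs))  # 13.125
```

Here n/2 = 15, the first cf ≥ 15 is 18, so the median class is 10-15 with l = 10, cf = 10, f = 8, h = 5, giving 10 + (5/8) × 5 = 13.125.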
**3. MODE OF GROUPED DATA**
**Definition:**
• Mode: Value that occurs most often; for grouped data it is estimated as a point inside the modal class
• Modal class: Interval with maximum fi
**Mode Formula:**
Mode = l + [(f₁ - f₀) / (2f₁ - f₀ - f₂)] × h
Where:
• l = lower limit of modal class
• f₁ = frequency of modal class
• f₀ = frequency of class before modal class
• f₂ = frequency of class after modal class
• h = class size
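**Steps to Find Mode:**
• Step 1: Modal class = class with the highest frequency
• Step 2: Read off l, f₁, h for the modal class, plus f₀ (class before) and f₂ (class after)
• Step 3: Substitute into Mode = l + [(f₁ - f₀) / (2f₁ - f₀ - f₂)] × h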
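The mode formula can be sketched in Python as below. The function name `grouped_mode` is illustrative. Two choices here are assumptions rather than anything stated in these notes: a missing neighbour of the modal class (when it is the first or last class) is treated as frequency 0, and ties are resolved by taking the first class with the highest frequency. The table is the five-class one from the practice question later in these notes.

```python
def grouped_mode(intervals, freqs):
    """Mode = l + ((f1 - f0) / (2*f1 - f0 - f2)) * h for (lower, upper) classes."""
    # Modal class: first class with the highest frequency (tie rule is an assumption).
    k = max(range(len(freqs)), key=lambda i: freqs[i])
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # assumed 0 if no class before
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # assumed 0 if no class after
    lower, upper = intervals[k]
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


intervals = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
freqs = [4, 6, 8, 7, 5]

print(round(grouped_mode(intervals, freqs), 2))  # 13.33
```

The modal class is 10-15 with f₁ = 8, f₀ = 6, f₂ = 7, so Mode = 10 + (2/3) × 5 ≈ 13.33. The sketch does not guard against f₀ = f₁ = f₂, where the denominator is zero.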
**4. CUMULATIVE FREQUENCY AND OGIVES**
**Cumulative Frequency Distribution:**
• Table showing class intervals with their cumulative frequencies
• Used to find median and draw ogive curves
• Two types: Less than ogive, Greater than or equal to ogive
**Less Than Ogive (Cumulative Frequency Curve):**
• Points plotted: (Upper class limit, Cumulative frequency)
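• Curve rises from left to right (cumulative frequency never decreases); it is usually joined to the point (lower limit of first class, 0), which is the origin only when that limit is 0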
• Used to find median: Locate n/2 on y-axis, draw horizontal line to curve, then vertical to x-axis
**Greater Than or Equal to Ogive:**
• Points plotted: (Lower class limit, Number of observations ≥ that limit), i.e. the "more than" cumulative frequency
• Curve decreases from top-left to bottom-right
• Two ogives intersect at point whose x-coordinate is the median
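The points for both ogives can be generated with a small Python sketch. The function name `ogive_points` is illustrative, plotting itself is left out, and the table is the five-class one from the practice question later in these notes.

```python
from itertools import accumulate


def ogive_points(intervals, freqs):
    """Return (less_than_points, more_than_points) for (lower, upper) classes."""
    n = sum(freqs)
    less_cf = list(accumulate(freqs))
    # Less-than ogive: (upper limit, observations below that limit).
    less_than = [(upper, cf) for (_, upper), cf in zip(intervals, less_cf)]
    # More-than ogive: (lower limit, observations at or above that limit).
    more_cf = [n - cf + f for cf, f in zip(less_cf, freqs)]
    more_than = [(lower, cf) for (lower, _), cf in zip(intervals, more_cf)]
    return less_than, more_than


intervals = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
freqs = [4, 6, 8, 7, 5]

less_than, more_than = ogive_points(intervals, freqs)
print(less_than)  # [(5, 4), (10, 10), (15, 18), (20, 25), (25, 30)]
print(more_than)  # [(0, 30), (5, 26), (10, 20), (15, 12), (20, 5)]
```

For this table, straight-line segments between (10, 10) and (15, 18) on the less-than ogive and between (10, 20) and (15, 12) on the more-than ogive both pass through height 15 = n/2 at x = 13.125, which agrees with the median formula.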
**5. RELATIONSHIP BETWEEN MEAN, MEDIAN, MODE**
• For symmetrical distribution: Mean = Median = Mode
• Empirical relationship (approximate, for moderately skewed distributions): Mode ≈ 3(Median) - 2(Mean)
• Or, equivalently: Mean - Mode ≈ 3(Mean - Median)
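A quick Python check shows how the empirical relationship behaves as an estimate. The function name `empirical_mode` is illustrative. The inputs are the grouped mean (13) and grouped median (13.125) that the formulas in these notes give for the five-class practice table (classes 0-5 to 20-25, frequencies 4, 6, 8, 7, 5).

```python
def empirical_mode(mean, median):
    """Estimate the mode from the empirical relation Mode = 3*Median - 2*Mean."""
    return 3 * median - 2 * mean


mean, median = 13.0, 13.125
print(empirical_mode(mean, median))  # 13.375
```

The grouped mode formula gives about 13.33 for the same table, so the relationship lands close to the formula value but does not reproduce it exactly.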
**6. COMMON MISTAKES TO AVOID**
• **Mistake 1:** Using class limits instead of class marks as xi values → Always calculate class mark = (upper + lower limit)/2
• **Mistake 2:** Confusing cumulative frequency with frequency → cf includes all previous classes; frequency is only for current class
• **Mistake 3:** Using wrong class in median formula → Check that cumulative frequency of selected class is first value ≥ n/2
• **Mistake 4:** Incorrect class size → h = upper limit - lower limit of a class (e.g., if classes are 0-10, 10-20, then h = 10)
• **Mistake 5:** Sign errors in deviation method → If xi > a, then di is positive; if xi < a, then di is negative
• **Mistake 6:** Forgetting to multiply by h in step deviation method → Final formula is x̄ = a + h·ū, not just a + ū
• **Mistake 7:** Wrong identification of median class → Count carefully to find class where cf ≥ n/2 for first time
• **Mistake 8:** Assuming mean, median, mode are always equal → They're equal only for symmetric distributions; usually different
**7. IMPORTANT FORMULAS SUMMARY**
Mean (Direct): x̄ = Σ(fixi) / Σfi
Mean (Assumed): x̄ = a + Σ(fidi) / Σfi
Mean (Step Dev): x̄ = a + h·Σ(fiui) / Σfi
Median: l + [(n/2 - cf) / f] × h
Mode: l + [(f₁ - f₀) / (2f₁ - f₀ - f₂)] × h
Class mark: (Upper limit + Lower limit) / 2
Deviation: di = xi - a
Step deviation: ui = (xi - a) / h
Q1. A teacher groups the marks of 50 students into class intervals and calculates the mean using class marks (midpoints). A student claims this mean will always equal the mean calculated using individual ungrouped marks. Is this claim correct, and why?
Answer: A — Grouping data loses information about actual individual values; the midpoint assumption introduces approximation error, making the grouped mean different from the ungrouped mean, as shown in the chapter's Example 1 vs Table 13.3 (59.3 vs 62).
Q2. In the assumed mean method, a student chooses the class mark a = 92.5 (near the upper end) instead of a central value like 47.5. How does this choice affect the validity of the formula x̄ = a + d̄?
Answer: B — The assumed mean method works for any choice of 'a' because the mathematical relationship x̄ = a + d̄ holds universally; however, choosing 'a' near the center minimizes absolute deviations and reduces computational effort, not because the formula requires it.
Q3. Assertion (A): When calculating the mean of grouped data using the direct method, multiplying each frequency by its corresponding class mark is mathematically equivalent to assuming all data in that class is concentrated at the class mark. Reason (R): The class mark represents the average of the upper and lower class limits. Choose the correct option:
Answer: B — A is true (the method assumes concentration at class mark), and R is true (class mark is defined as the average of limits), but R does not explain WHY we make this assumption—we do so for computational convenience and to represent the class centrally, not because of how class mark is calculated.
Q4. A student observes that in Table 13.4, deviations (di) are positive for classes above 47.5 and negative for classes below it. She concludes that the assumed mean must always be chosen as the middle class mark. Is this conclusion justified?
Answer: B — The sign pattern of di naturally mirrors which class marks are above or below the chosen 'a'; choosing the middle value is a convenience to reduce calculation work, not a requirement for the method to function correctly.
Q5. Assertion (A): If two datasets with the same frequencies but different class intervals are grouped and their means calculated, the grouped means will always be different. Reason (R): Different class intervals produce different class marks, which affect the calculation of the mean. Choose the correct option:
Answer: D — A is false because different groupings of the same raw data may yield the same grouped mean if class marks happen to align with the distribution; R is true—different class marks do affect the mean—but the conclusion in A overgeneralizes.
Q6. When using the assumed mean method with di = xi – a, a student obtains Σ(fidi) = 0. What does this result tell us about the relationship between 'a' and the true mean x̄?
Answer: A — From the formula x̄ = a + d̄, if Σ(fidi) = 0, then d̄ = 0, which implies x̄ = a + 0 = a, making the assumed mean equal to the true mean.
Q7. Assertion (A): The assumed mean method and the direct method always produce identical means for the same grouped dataset. Reason (R): Both methods use the same class marks and the mathematical formula for mean. Choose the correct option:
Answer: A — Both A and R are true: the assumed mean method is an algebraic rearrangement of the direct method (x̄ = a + Σ(fidi)/Σfi is derived from x̄ = Σ(fixi)/Σfi), so they must yield identical results, and this is explained by the fact that both use the same underlying formula applied to the same class marks.
Q8. A dataset is grouped into class intervals of width 10. A student argues that using class marks (midpoints) introduces less error than using the lower class limit as the representative value. Evaluate this claim.
Answer: B — Using the class mark minimizes error because it balances deviations above and below the representative point, whereas the lower limit would systematically underestimate for most classes.
Q9. Assertion (A): In the assumed mean method, if Σ(fidi) is negative and large in magnitude, the true mean x̄ will be significantly less than the assumed mean 'a'. Reason (R): The formula x̄ = a + d̄ shows that x̄ is obtained by adding the mean of the deviations (d̄) to the assumed mean. Choose the correct option:
Answer: A — A is true: if Σ(fidi) < 0, then d̄ < 0, so x̄ = a + d̄ < a; R is true and correctly explains A because the formula directly shows how a negative mean deviation pulls the true mean below the assumed mean.
Q10. A teacher presents two grouped frequency tables for the same raw data with different class widths. A student claims that the table with smaller class width will always produce a more accurate mean. Is this claim universally valid?
Answer: B — While smaller class width generally improves approximation by bringing class marks closer to data, the actual error reduction depends on the distribution of data within classes; a small width alone does not ensure accuracy if data is clustered away from midpoints.
What is the formula for mean of grouped data using direct method?
Mean = Σ(fi × xi) ÷ Σfi, where fi is frequency and xi is class mark.
How do you find the class mark of a class interval?
Class mark = (Upper class limit + Lower class limit) ÷ 2.
What is the assumed mean method and when is it used?
Choose a central class mark as 'a', find deviations di = xi − a, then mean = a + Σ(fi × di) ÷ Σfi; used when xi values are large.
What is the step deviation method formula?
ui = (xi − a) ÷ h, then mean = a + h × Σ(fi × ui) ÷ Σfi, where h is class size.
Why is the mean of grouped data approximate, not exact?
Because grouped data assumes all frequencies in a class are concentrated at the class mark, not at actual individual values.
If a student scores exactly at the upper class limit, which class does the score belong to?
The score belongs to the next higher class interval, not the current one.
Does the choice of assumed mean 'a' affect the final mean value?
No, the final mean is the same regardless of which xi is chosen as 'a'; Activity 1 proves this.
Why do we divide di by class size h in the step deviation method?
To convert all deviations to smaller numbers that are multiples of the class size, reducing calculation effort.
What is Σ notation and what does it represent?
Σ (Greek letter sigma) represents summation; Σfi × xi means add up all products of fi and xi.
How does grouped data differ from ungrouped data in finding mean?
Ungrouped data uses actual individual values; grouped data uses class marks to represent all values in each class interval.
Define class mark. How is it calculated for a class interval 50−70? [2 marks]
State that class mark is the midpoint of a class interval; use the formula (upper limit + lower limit) / 2 and substitute 50 and 70 to get 60.
A frequency distribution has class intervals 0−10, 10−20, 20−30 with frequencies 5, 8, 7 respectively. Using the direct method, find the mean. [3 marks]
Find class marks (5, 15, 25), calculate fi × xi for each class (25, 120, 175), sum to get Σ(fi × xi) = 320 and Σfi = 20, then apply mean = 320/20 = 16.
A dataset of 30 observations has class intervals 0−5, 5−10, 10−15, 15−20, 20−25 with frequencies 4, 6, 8, 7, 5. Using the assumed mean method with a = 12.5, find the mean and explain why this method reduces calculation effort compared to the direct method. [5 marks]
Calculate class marks (2.5, 7.5, 12.5, 17.5, 22.5), find deviations di from a = 12.5 (−10, −5, 0, 5, 10), multiply by frequencies to get Σ(fi × di) = −40 − 30 + 0 + 35 + 50 = 15, apply mean = a + Σ(fi × di)/Σfi = 12.5 + 15/30 = 13; explain that deviations are smaller numbers, reducing arithmetic complexity compared to multiplying large xi values directly.