📚 StudyOS CBSE Class 5–12 AI Tutor

Statistics

NCERT Class 10 · Mathematics · Based on the NCERT Class 10 Mathematics textbook · Free CBSE study kit

Chapter Notes

**STATISTICS: MEASURES OF CENTRAL TENDENCY FOR GROUPED DATA**

**1. MEAN OF GROUPED DATA**

• Mean (average) = Sum of all observations ÷ Number of observations

• Formula: x̄ = Σ(fixi) / Σfi, where xi = observations/class marks, fi = frequencies

• For ungrouped data: Directly use given values of x₁, x₂, ..., xₙ with frequencies f₁, f₂, ..., fₙ

• For grouped data: Use class marks (mid-points) as xi values

**Class Mark Formula:**

• Class mark = (Upper class limit + Lower class limit) / 2

• Example: For class 10-25, class mark = (10+25)/2 = 17.5

• Assumption: All frequencies in a class are concentrated at its mid-point

**Three Methods to Calculate Mean of Grouped Data:**

**Method 1: Direct Method**

  • Step 1: Find class mark (xi) for each class interval
  • Step 2: Calculate fixi for each class
  • Step 3: Find Σfi and Σfixi
  • Step 4: Apply formula x̄ = Σ(fixi) / Σfi
    • Use when: Class marks and frequencies have small, manageable values

    • Advantage: Straightforward, no intermediate steps

    • Note: Gives approximate mean (due to class mark assumption), not exact
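
The four steps of the direct method can be sketched in Python as follows. This is only an illustrative sketch: the function name `direct_mean` and the small table (classes 10-25, 25-40, 40-55 with frequencies 2, 3, 5) are assumptions made for this example and do not come from the NCERT text.

```python
def direct_mean(classes, freqs):
    """Mean of grouped data by the direct method.

    classes: list of (lower_limit, upper_limit) pairs
    freqs:   list of frequencies, one per class
    """
    # Step 1: class mark (mid-point) of each class
    marks = [(lower + upper) / 2 for lower, upper in classes]
    # Steps 2 and 3: sum of fi * xi and sum of fi
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_f = sum(freqs)
    # Step 4: mean = sum(fi * xi) / sum(fi)
    return sum_fx / sum_f


# Hypothetical table: classes 10-25, 25-40, 40-55 with frequencies 2, 3, 5
classes = [(10, 25), (25, 40), (40, 55)]
freqs = [2, 3, 5]
print(direct_mean(classes, freqs))  # 37.0
```

Here the class marks are 17.5, 32.5 and 47.5, so Σfixi = 35 + 97.5 + 237.5 = 370 and Σfi = 10.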

    **Method 2: Assumed Mean Method (Deviation Method)**

  • Step 1: Choose assumed mean 'a' (preferably a middle class mark)
  • Step 2: Calculate deviations: di = xi - a
  • Step 3: Find fidi for each class
  • Step 4: Calculate mean of deviations: d̄ = Σ(fidi) / Σfi
  • Step 5: Apply formula x̄ = a + d̄
    • Use when: xi and fi values are large (reduces calculation complexity)

    • Important: Mean is independent of choice of 'a' — any class mark can be assumed mean

    • Formula derivation: d̄ = Σ(fi(xi - a)) / Σfi = Σ(fixi) / Σfi - a = x̄ - a

    • Therefore: x̄ = a + Σ(fidi) / Σfi
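
A short Python sketch of the assumed mean method, which also checks that the result does not depend on the choice of 'a'. The function name `assumed_mean` and the class marks and frequencies used are hypothetical values chosen for illustration.

```python
def assumed_mean(marks, freqs, a):
    """Mean of grouped data by the assumed mean method.

    marks: class marks xi, freqs: frequencies fi, a: assumed mean
    """
    # Step 2: deviations di = xi - a
    devs = [x - a for x in marks]
    # Steps 3 and 4: mean of deviations = sum(fi * di) / sum(fi)
    d_bar = sum(f * d for f, d in zip(freqs, devs)) / sum(freqs)
    # Step 5: mean = a + mean of deviations
    return a + d_bar


marks = [17.5, 32.5, 47.5]  # hypothetical class marks
freqs = [2, 3, 5]           # hypothetical frequencies
# Trying each class mark in turn as 'a' gives the same mean every time
print([assumed_mean(marks, freqs, a) for a in marks])  # [37.0, 37.0, 37.0]
```

With a = 32.5, for instance, the deviations are −15, 0, 15, so Σfidi = −30 + 0 + 75 = 45 and x̄ = 32.5 + 45/10 = 37.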

    **Method 3: Step Deviation Method**

  • Step 1: Choose assumed mean 'a' and note the class size 'h'
  • Step 2: Calculate ui = (xi - a) / h
  • Step 3: Find fiui for each class
  • Step 4: Calculate ū = Σ(fiui) / Σfi
  • Step 5: Apply formula x̄ = a + h·ū
    • Use when: All deviations (di) have a common factor, typically the class size h when every class has the same width

    • Advantage: Works with very small numbers, minimal calculation errors

    • Class size h: Width of each class interval (upper class limit − lower class limit)

    • Formula: x̄ = a + h·(Σ(fiui) / Σfi)
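
The step deviation method can be sketched in Python in the same way. The function name `step_deviation_mean` and the data are hypothetical; the sketch assumes all classes have the same width h.

```python
def step_deviation_mean(marks, freqs, a, h):
    """Mean of grouped data by the step deviation method.

    marks: class marks xi, freqs: frequencies fi,
    a: assumed mean, h: class size (assumed equal for every class)
    """
    # Step 2: ui = (xi - a) / h
    us = [(x - a) / h for x in marks]
    # Steps 3 and 4: mean of the ui values
    u_bar = sum(f * u for f, u in zip(freqs, us)) / sum(freqs)
    # Step 5: scale back by h and shift by a
    return a + h * u_bar


marks = [17.5, 32.5, 47.5]  # hypothetical class marks, class size 15
freqs = [2, 3, 5]           # hypothetical frequencies
print(step_deviation_mean(marks, freqs, a=32.5, h=15))  # 37.0
```

Here the ui values are −1, 0, 1, so Σfiui = −2 + 0 + 5 = 3, ū = 0.3 and x̄ = 32.5 + 15 × 0.3 = 37.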

    **Key Points:**

    • All three methods give same result for grouped data

    • Grouped data mean is approximate because individual values are lost

    • Ungrouped data mean (using actual values) is exact and more accurate

    • Choose method based on magnitude of numbers in the data

    **2. MEDIAN OF GROUPED DATA**

    **Definition:**

    • Median: The value that divides the distribution into two equal parts

    • For grouped data: The median lies in the median class, the class containing the (n/2)th observation, where n = total frequency; its value is estimated with the formula below

    **Median Formula:**

    Median = l + [(n/2 - cf) / f] × h

    Where:

    • l = lower limit of median class

    • n = total frequency (Σfi)

    • cf = cumulative frequency of class before median class

    • f = frequency of median class

    • h = class size (width of median class)

    **Steps to Find Median:**

  • Step 1: Calculate cumulative frequencies (cf) for all classes
  • Step 2: Find n/2, where n = Σfi
  • Step 3: Identify median class: The first class whose cumulative frequency is ≥ n/2
  • Step 4: Note l, cf (before median class), f, and h for median class
  • Step 5: Substitute in formula
    **Cumulative Frequency (cf):**

    • cf of a class = sum of frequencies up to and including that class

    • cf of first class = f₁

    • cf of second class = f₁ + f₂

    • cf of last class = n (total frequency)

    • Cumulative frequency increases as we move down classes
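
The cumulative frequencies and the median steps above can be sketched together in Python. The function name `grouped_median` and the table (classes 0-10, 10-20, 20-30, 30-40 with frequencies 4, 5, 8, 3) are hypothetical, chosen only to illustrate the formula.

```python
from itertools import accumulate


def grouped_median(classes, freqs):
    """Median of grouped data: l + ((n/2 - cf) / f) * h."""
    n = sum(freqs)
    # Step 1: cumulative frequency of each class
    cfs = list(accumulate(freqs))
    # Steps 2 and 3: median class = first class with cf >= n/2
    idx = next(i for i, cf in enumerate(cfs) if cf >= n / 2)
    # Step 4: read off l, h, cf (of the class before) and f
    lower, upper = classes[idx]
    h = upper - lower
    cf_before = cfs[idx - 1] if idx > 0 else 0
    f = freqs[idx]
    # Step 5: substitute in the formula
    return lower + (n / 2 - cf_before) / f * h


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]  # hypothetical classes
freqs = [4, 5, 8, 3]                               # hypothetical frequencies
print(list(accumulate(freqs)))         # [4, 9, 17, 20]
print(grouped_median(classes, freqs))  # 21.25
```

With n = 20, n/2 = 10, so the median class is 20-30 (cf = 17 is the first value ≥ 10), giving l = 20, cf = 9, f = 8, h = 10 and median = 20 + (1/8) × 10 = 21.25.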

    **3. MODE OF GROUPED DATA**

    **Definition:**

    • Mode: The most frequently occurring value; for grouped data it is a value inside the modal class, estimated with the formula below

    • Modal class: Interval with maximum fi

    **Mode Formula:**

    Mode = l + [(f₁ - f₀) / (2f₁ - f₀ - f₂)] × h

    Where:

    • l = lower limit of modal class

    • f₁ = frequency of modal class

    • f₀ = frequency of class before modal class

    • f₂ = frequency of class after modal class

    • h = class size

    **Steps to Find Mode:**

  • Step 1: Identify modal class (highest frequency)
  • Step 2: Note the frequency of class before modal class (f₀)
  • Step 3: Note the frequency of class after modal class (f₂)
  • Step 4: Apply formula
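
A minimal Python sketch of the mode steps. The function name `grouped_mode` and the table are hypothetical. Two choices in the sketch are assumptions rather than rules from the notes: if several classes tie for the highest frequency it takes the first one, and if the modal class is the first or last class it treats the missing neighbouring frequency as 0. It does not handle the case 2f₁ − f₀ − f₂ = 0.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: l + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    # Step 1: modal class = class with the highest frequency
    # (assumption: the first such class if there is a tie)
    i = freqs.index(max(freqs))
    lower, upper = classes[i]
    h = upper - lower
    f1 = freqs[i]
    # Steps 2 and 3: frequencies of the neighbouring classes
    # (assumption: taken as 0 when the modal class is at an end of the table)
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    # Step 4: apply the formula
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * h


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]  # hypothetical classes
freqs = [4, 5, 8, 3]                               # hypothetical frequencies
print(grouped_mode(classes, freqs))  # 23.75
```

The modal class is 20-30 with f₁ = 8, f₀ = 5, f₂ = 3, so mode = 20 + (3/8) × 10 = 23.75.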
    **4. CUMULATIVE FREQUENCY AND OGIVES**

    **Cumulative Frequency Distribution:**

    • Table showing class intervals with their cumulative frequencies

    • Used to find median and draw ogive curves

    • Two types: Less than ogive, Greater than or equal to ogive

    **Less Than Ogive (Cumulative Frequency Curve):**

    • Points plotted: (Upper class limit, Cumulative frequency)

    • Curve starts on the x-axis at the lower limit of the first class (cumulative frequency 0) and rises to n

    • Used to find median: Locate n/2 on y-axis, draw horizontal line to curve, then vertical to x-axis

    **Greater Than or Equal to Ogive:**

    • Points plotted: (Lower class limit, "more than" cumulative frequency, i.e. the number of observations ≥ that limit)

    • Curve decreases from top-left to bottom-right

    • Two ogives intersect at point whose x-coordinate is the median
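
The points to plot for both ogives can be computed from the frequency table, as in this Python sketch. The function name `ogive_points` and the table are hypothetical; the sketch only produces the points and leaves the plotting to graph paper or any plotting tool.

```python
from itertools import accumulate


def ogive_points(classes, freqs):
    """Return (less_than_points, more_than_points) for the two ogives."""
    n = sum(freqs)
    less_cf = list(accumulate(freqs))
    # "Less than" ogive: (upper limit, number of observations below it)
    less_than = [(upper, cf) for (_, upper), cf in zip(classes, less_cf)]
    # "Greater than or equal to" ogive:
    # (lower limit, number of observations at or above it)
    more_cf = [n] + [n - cf for cf in less_cf[:-1]]
    more_than = [(lower, cf) for (lower, _), cf in zip(classes, more_cf)]
    return less_than, more_than


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]  # hypothetical classes
freqs = [4, 5, 8, 3]                               # hypothetical frequencies
less_than, more_than = ogive_points(classes, freqs)
print(less_than)  # [(10, 4), (20, 9), (30, 17), (40, 20)]
print(more_than)  # [(0, 20), (10, 16), (20, 11), (30, 3)]
```

For this table, if consecutive points are joined by straight segments, the two curves cross between x = 20 and x = 30 at the point (21.25, 10), where 10 = n/2 and 21.25 is the value the median formula gives for the same data.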

    **5. RELATIONSHIP BETWEEN MEAN, MEDIAN, MODE**

    • For symmetrical distribution: Mean = Median = Mode

    • Empirical relationship: Mode = 3(Median) - 2(Mean)

    • Or: Mean - Mode = 3(Mean - Median)
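
The two forms of the empirical relationship are the same statement. Subtracting both sides of the first form from Mean gives the second:

```latex
\begin{align*}
\text{Mode} &= 3\,\text{Median} - 2\,\text{Mean} \\
\text{Mean} - \text{Mode} &= \text{Mean} - 3\,\text{Median} + 2\,\text{Mean} \\
&= 3\,(\text{Mean} - \text{Median})
\end{align*}
```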

    **6. COMMON MISTAKES TO AVOID**

    • **Mistake 1:** Using class limits instead of class marks as xi values → Always calculate class mark = (upper + lower limit)/2

    • **Mistake 2:** Confusing cumulative frequency with frequency → cf includes all previous classes; frequency is only for current class

    • **Mistake 3:** Using wrong class in median formula → Check that cumulative frequency of selected class is first value ≥ n/2

    • **Mistake 4:** Incorrect class size → h = upper limit − lower limit of a class (e.g., if classes are 0-10, 10-20, then h = 10)

    • **Mistake 5:** Sign errors in deviation method → If xi > a, then di is positive; if xi < a, then di is negative

    • **Mistake 6:** Forgetting to multiply by h in step deviation method → Final formula is x̄ = a + h·ū, not just a + ū

    • **Mistake 7:** Wrong identification of median class → Count carefully to find class where cf ≥ n/2 for first time

    • **Mistake 8:** Assuming mean, median, mode are always equal → They're equal only for symmetric distributions; usually different

    **7. IMPORTANT FORMULAS SUMMARY**

    Mean (Direct): x̄ = Σ(fixi) / Σfi

    Mean (Assumed): x̄ = a + Σ(fidi) / Σfi

    Mean (Step Dev): x̄ = a + h·Σ(fiui) / Σfi

    Median: l + [(n/2 - cf) / f] × h

    Mode: l + [(f₁ - f₀) / (2f₁ - f₀ - f₂)] × h

    Class mark: (Upper limit + Lower limit) / 2

    Deviation: di = xi - a

    Step deviation: ui = (xi - a) / h

    MCQs — 10 Questions with Answers

    Q1. A teacher groups the marks of 50 students into class intervals and calculates the mean using class marks (midpoints). A student claims this mean will always equal the mean calculated using individual ungrouped marks. Is this claim correct, and why?

    • A. No, grouping introduces an assumption that all values in a class cluster at the midpoint, which may not reflect actual distribution ✓
    • B. Yes, because the sum of all frequencies remains the same
    • C. Yes, because class marks are averages of class limits
    • D. No, because class intervals cannot be used for mean calculation

    Answer: A — Grouping data loses information about actual individual values; the midpoint assumption introduces approximation error, making the grouped mean different from the ungrouped mean, as shown in the chapter's Example 1 vs Table 13.3 (59.3 vs 62).

    Q2. In the assumed mean method, a student chooses the class mark a = 92.5 (near the upper end) instead of a central value like 47.5. How does this choice affect the validity of the formula x̄ = a + d̄?

    • A. The formula becomes invalid because 'a' must always be the midpoint of all class marks
    • B. The formula remains valid, but calculations become more tedious due to larger deviations (di) ✓
    • C. The formula is valid only if frequencies are symmetrically distributed around a = 92.5
    • D. The formula cannot be applied when a is not centrally located

    Answer: B — The assumed mean method works for any choice of 'a' because the mathematical relationship x̄ = a + d̄ holds universally; however, choosing 'a' near the center minimizes absolute deviations and reduces computational effort, not because the formula requires it.

    Q3. Assertion (A): When calculating the mean of grouped data using the direct method, multiplying each frequency by its corresponding class mark is mathematically equivalent to assuming all data in that class is concentrated at the class mark. Reason (R): The class mark represents the average of the upper and lower class limits. Choose the correct option:

    • A. Both A and R are true and R is the correct explanation of A
    • B. Both A and R are true but R is not the correct explanation of A ✓
    • C. A is true but R is false
    • D. A is false but R is true

    Answer: B — A is true (the method assumes concentration at class mark), and R is true (class mark is defined as the average of limits), but R does not explain WHY we make this assumption—we do so for computational convenience and to represent the class centrally, not because of how class mark is calculated.

    Q4. A student observes that in Table 13.4, deviations (di) are positive for classes above 47.5 and negative for classes below it. She concludes that the assumed mean must always be chosen as the middle class mark. Is this conclusion justified?

    • A. Yes, because deviations must balance around the assumed mean
    • B. No, the sign pattern of deviations depends on which value is chosen as 'a', not on whether 'a' is the middle class mark ✓
    • C. Yes, because the formula requires a = (upper class mark + lower class mark)/2
    • D. No, because deviations can be positive or negative regardless of the choice of 'a'

    Answer: B — The sign pattern of di naturally mirrors which class marks are above or below the chosen 'a'; choosing the middle value is a convenience to reduce calculation work, not a requirement for the method to function correctly.

    Q5. Assertion (A): If two datasets with the same frequencies but different class intervals are grouped and their means calculated, the grouped means will always be different. Reason (R): Different class intervals produce different class marks, which affect the calculation of the mean. Choose the correct option:

    • A. Both A and R are true and R is the correct explanation of A
    • B. Both A and R are true but R is not the correct explanation of A
    • C. A is true but R is false
    • D. A is false but R is true ✓

    Answer: D. A is false because "always" overgeneralizes: two different sets of class marks can still give the same weighted mean (for example, frequencies 1 and 1 on class marks 5 and 15, or on class marks 8 and 12, both give a mean of 10). R is true, since different class intervals do produce different class marks and these enter the mean calculation.

    Q6. When using the assumed mean method with di = xi – a, a student obtains Σ(fidi) = 0. What does this result tell us about the relationship between 'a' and the true mean x̄?

    • A. It means a = x̄, because zero deviation from the mean occurs only when the assumed mean equals the true mean ✓
    • B. It means the data is symmetric, so the median and mode must also equal a
    • C. It is inconclusive; zero mean deviation does not guarantee a = x̄
    • D. It means all frequencies are equal

    Answer: A — From the formula x̄ = a + d̄, if Σ(fidi) = 0, then d̄ = 0, which implies x̄ = a + 0 = a, making the assumed mean equal to the true mean.

    Q7. Assertion (A): The assumed mean method and the direct method always produce identical means for the same grouped dataset. Reason (R): Both methods use the same class marks and the mathematical formula for mean. Choose the correct option:

    • A. Both A and R are true and R is the correct explanation of A ✓
    • B. Both A and R are true but R is not the correct explanation of A
    • C. A is true but R is false
    • D. A is false but R is true

    Answer: A — Both A and R are true: the assumed mean method is an algebraic rearrangement of the direct method (x̄ = a + Σ(fidi)/Σfi is derived from x̄ = Σ(fixi)/Σfi), so they must yield identical results, and this is explained by the fact that both use the same underlying formula applied to the same class marks.

    Q8. A dataset is grouped into class intervals of width 10. A student argues that using class marks (midpoints) introduces less error than using the lower class limit as the representative value. Evaluate this claim.

    • A. The claim is incorrect; both methods introduce the same magnitude of error on average
    • B. The claim is correct; class marks are centrally located, so deviations from actual values are generally smaller and more balanced ✓
    • C. The claim is correct only if the data within each class is uniformly distributed
    • D. The claim is incorrect; the lower class limit is always more representative

    Answer: B — Using the class mark minimizes error because it balances deviations above and below the representative point, whereas the lower limit would systematically underestimate for most classes.

    Q9. Assertion (A): In the assumed mean method, if Σ(fidi) is negative and large in magnitude, the true mean x̄ will be significantly less than the assumed mean 'a'. Reason (R): The formula x̄ = a + d̄ shows that x̄ is obtained by adding the mean of the deviations (d̄) to the assumed mean. Choose the correct option:

    • A. Both A and R are true and R is the correct explanation of A ✓
    • B. Both A and R are true but R is not the correct explanation of A
    • C. A is true but R is false
    • D. A is false but R is true

    Answer: A. A is true: if Σ(fidi) < 0, then d̄ < 0, so x̄ = a + d̄ < a; R is true and correctly explains A because the formula directly shows how a negative mean of the deviations pulls the true mean below the assumed mean.

    Q10. A teacher presents two grouped frequency tables for the same raw data with different class widths. A student claims that the table with smaller class width will always produce a more accurate mean. Is this claim universally valid?

    • A. Yes, because smaller class width means class marks are closer to actual individual values
    • B. No, accuracy also depends on how data is distributed within each class; a smaller width only guarantees reduced range within a class, not reduced error ✓
    • C. Yes, because the assumed mean method becomes more efficient with smaller widths
    • D. No, because smaller class widths increase computational difficulty

    Answer: B — While smaller class width generally improves approximation by bringing class marks closer to data, the actual error reduction depends on the distribution of data within classes; a small width alone does not ensure accuracy if data is clustered away from midpoints.

    Flashcards

    What is the formula for mean of grouped data using direct method?

    Mean = Σ(fi × xi) ÷ Σfi, where fi is frequency and xi is class mark.

    How do you find the class mark of a class interval?

    Class mark = (Upper class limit + Lower class limit) ÷ 2.

    What is the assumed mean method and when is it used?

    Choose a central class mark as 'a', find deviations di = xi − a, then mean = a + Σ(fi × di) ÷ Σfi; used when xi values are large.

    What is the step deviation method formula?

    ui = (xi − a) ÷ h, then mean = a + h × Σ(fi × ui) ÷ Σfi, where h is class size.

    Why is the mean of grouped data approximate, not exact?

    Because grouped data assumes all frequencies in a class are concentrated at the class mark, not at actual individual values.

    If a student scores exactly at the upper class limit, which class does the score belong to?

    The score belongs to the next higher class interval, not the current one.

    Does the choice of assumed mean 'a' affect the final mean value?

    No, the final mean is the same regardless of which xi is chosen as 'a'; Activity 1 illustrates this.

    Why do we divide di by class size h in the step deviation method?

    To convert all deviations to smaller numbers that are multiples of the class size, reducing calculation effort.

    What is Σ notation and what does it represent?

    Σ (Greek letter sigma) represents summation; Σfi × xi means add up all products of fi and xi.

    How does grouped data differ from ungrouped data in finding mean?

    Ungrouped data uses actual individual values; grouped data uses class marks to represent all values in each class interval.

    Important Board Questions

    Define class mark. How is it calculated for a class interval 50−70? [2 marks]

    State that class mark is the midpoint of a class interval; use the formula (upper limit + lower limit) / 2 and substitute 50 and 70 to get 60.

    A frequency distribution has class intervals 0−10, 10−20, 20−30 with frequencies 5, 8, 7 respectively. Using the direct method, find the mean. [3 marks]

    Find class marks (5, 15, 25), calculate fi × xi for each class (25, 120, 175), sum to get Σ(fi × xi) = 320 and Σfi = 20, then apply mean = 320/20 = 16.

    A dataset of 30 observations has class intervals 0−5, 5−10, 10−15, 15−20, 20−25 with frequencies 4, 6, 8, 7, 5. Using the assumed mean method with a = 12.5, find the mean and explain why this method reduces calculation effort compared to the direct method. [5 marks]

    Calculate class marks (2.5, 7.5, 12.5, 17.5, 22.5), find deviations di from a = 12.5 (−10, −5, 0, 5, 10), multiply by frequencies to get fi × di (−40, −30, 0, 35, 50) and Σ(fi × di) = 15, then apply mean = a + Σ(fi × di)/Σfi = 12.5 + 15/30 = 13; explain that the deviations are smaller numbers than the xi values, so the products are easier to compute than in the direct method.
