Organisation of Data

NCERT Class 11 · Economics Based on NCERT Class 11 Economics textbook · Free CBSE study kit

Chapter Notes

ORGANISATION OF DATA: COMPREHENSIVE CHAPTER NOTES

INTRODUCTION AND PURPOSE OF DATA CLASSIFICATION

**Data classification** is the systematic arrangement of raw data into logical groups or classes based on specific criteria. This process brings order to disorganised data, making it easier to analyse and draw meaningful conclusions.

**Why Classification is Essential:**

Raw data are highly disorganised and difficult to interpret

Large datasets (like Census 2001's 20 crore observations) become comprehensible only when classified

Allows for easy location of information, comparison, and inference

Reduces the effort and time required to extract meaningful information

Enables application of statistical methods

**Real-life Example:** A kabadiwallah (junk dealer) classifies waste materials into categories like newspapers, glass, metals (iron, copper, aluminium, brass), plastics, etc. This organisation allows him to quickly find specific items when customers demand them. Similarly, students classify textbooks by subject—"History," "Mathematics," "Science"—to easily locate required books without searching the entire collection.

**Key Principle:** Classification must follow logical criteria. If a book on geography were placed in the "History" section, the classification system loses its purpose. Every observation must fit into only one class.

---

RAW DATA: CHARACTERISTICS AND CHALLENGES

**Raw Data Definition:** Unclassified or ungrouped data collected through census, surveys, or experiments before any organisation or analysis.

**Characteristics of Raw Data:**

Large and cumbersome to handle

Disorganised and presented in random order

Difficult to interpret and draw conclusions from

Do not lend themselves easily to statistical methods

Require summarisation before analysis

**Real-life Example:** Table 3.1 (marks of 100 students in mathematics) shows raw data where numbers appear in random order: 47, 45, 10, 60, 51, 56, 66... Finding the highest mark, average performance, or distribution is tedious and prone to error.

Table 3.2 (monthly household expenditure of 50 families) similarly presents data in an unorganised manner, making it impossible at a glance to determine average expenditure or distribution patterns.

**Challenges with Raw Data:**

For 100 observations, manual arrangement is time-consuming

With thousands or millions of observations (like Census data), analysis becomes nearly impossible without classification

Cannot yield meaningful statistical conclusions without organisation

Comparisons and inferences require extensive manual calculations

---

CLASSIFICATION OF DATA: TYPES AND METHODS

Data classification depends on the **nature of the variable** being studied and the **purpose of analysis**. Different classification methods serve different analytical needs.

CHRONOLOGICAL CLASSIFICATION

**Definition:** Data arranged in order of time (ascending or descending) with reference to years, quarters, months, weeks, or days.

**Characteristics:**

Used for **Time Series Data**—values of variables recorded at different time intervals

Shows trends, patterns, and changes over time

Particularly useful for identifying growth, decline, or cyclical patterns

**Example:** Population of India (Time Series)

| Year | Population (Crores) |
|------|-------------------|
| 1951 | 35.7 |
| 1961 | 43.8 |
| 1971 | 54.6 |
| 1981 | 68.4 |
| 1991 | 81.8 |
| 2001 | 102.7 |
| 2011 | 121.0 |

The series shows that India's population rose at every census over these 60 years, from 35.7 crore in 1951 to 121.0 crore in 2011.
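
The growth pattern in the table can be checked with a few lines of Python. This is a minimal sketch: the figures are copied from the table above, and the helper name `decadal_changes` is only an illustrative choice.

```python
# Census population of India in crores, copied from the table above.
population = {
    1951: 35.7, 1961: 43.8, 1971: 54.6, 1981: 68.4,
    1991: 81.8, 2001: 102.7, 2011: 121.0,
}


def decadal_changes(series):
    """Return (start_year, end_year, change) for each pair of consecutive years."""
    years = sorted(series)  # chronological order
    return [
        (start, end, round(series[end] - series[start], 1))
        for start, end in zip(years, years[1:])
    ]


changes = decadal_changes(population)
for start, end, change in changes:
    print(f"{start} to {end}: {change:+.1f} crore")

# True only if the population rose between every pair of censuses.
always_rising = all(change > 0 for _, _, change in changes)
```

Sorting by year before comparing is what makes this a chronological classification: the same figures in random order would not show the trend.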

**Board Exam Important:** Time series data is essential for policy analysis, planning, and forecasting in Indian economic context.

---

SPATIAL CLASSIFICATION

**Definition:** Data classified with reference to geographical locations—countries, states, cities, districts, or regions.

**Characteristics:**

Used for **Cross-sectional Data**—observations at same time but different locations

Enables comparison between regions

Useful for assessing regional disparities and differences

**Example:** Wheat Yield Across Countries (2013)

| Country | Yield (kg/hectare) |
|---------|------------------|
| Canada | 3594 |
| China | 5055 |
| France | 7254 |
| Germany | 7998 |
| India | 3154 |
| Pakistan | 2787 |

This classification shows that Germany has the highest yield (7998) and Pakistan the lowest (2787) among the listed countries. India's yield (3154) is slightly above Pakistan's, somewhat below Canada's (3594), and far below those of China, France, and Germany, which points to an agricultural productivity gap relevant to India's economic development.
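
The same comparison can be reproduced in code. The sketch below uses only the six values from the table; the variable names are illustrative assumptions.

```python
# Wheat yield in kg per hectare (2013), copied from the table above.
wheat_yield = {
    "Canada": 3594, "China": 5055, "France": 7254,
    "Germany": 7998, "India": 3154, "Pakistan": 2787,
}

# Rank the countries from highest to lowest yield.
ranked = sorted(wheat_yield.items(), key=lambda item: item[1], reverse=True)
highest, lowest = ranked[0], ranked[-1]

# Express India's yield as a percentage of the highest yield in the list.
india_share = round(wheat_yield["India"] / highest[1] * 100, 1)

print("Highest:", highest)
print("Lowest:", lowest)
print("India as % of highest:", india_share)
```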

**Board Exam Important:** Spatial classification of Indian data (state-wise, region-wise) is crucial for understanding regional development disparities.

---

QUALITATIVE CLASSIFICATION

**Definition:** Classification based on **qualities or attributes**—characteristics that cannot be measured numerically but can be categorised as present or absent.

**Examples of Attributes:**

Gender (Male/Female)

Literacy status (Literate/Illiterate)

Marital status (Married/Unmarried)

Religion, nationality, occupation category

**Key Feature:** Data can be classified at multiple levels (hierarchical classification).

**Example:** Population Classification by Gender and Marital Status

```
Population
├── Male
│   ├── Married
│   └── Unmarried
└── Female
    ├── Married
    └── Unmarried
```

First classification level: presence/absence of "maleness" (male or not male/female)

Second classification level: presence/absence of "married" status

**Board Exam Important:** Qualitative classification is essential for Census data analysis, employment statistics, and demographic studies relevant to Indian development.

---

QUANTITATIVE CLASSIFICATION

**Definition:** Classification of data on measurable characteristics (quantitative variables) into numerical classes or intervals.

**Examples of Quantitative Variables:**

Height, weight, age, income, marks, temperature, distance, time

**Characteristics:**

Based on numerical values and intervals

Requires formation of class intervals

Most useful for continuous and discrete quantitative variables

Forms the basis for frequency distribution

**Example:** Frequency Distribution of Mathematics Marks (100 Students)

| Marks | Frequency |
|-------|-----------|
| 0–10 | 1 |
| 10–20 | 8 |
| 20–30 | 6 |
| 30–40 | 7 |
| 40–50 | 21 |
| 50–60 | 23 |
| 60–70 | 19 |
| 70–80 | 6 |
| 80–90 | 5 |
| 90–100 | 4 |
| Total | 100 |

This classification shows that the largest concentration (23 students) lies in the 50–60 class, and only 1 student scored below 10 marks.
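
As a rough illustration of how raw marks turn into such a table, the sketch below counts values into classes of width 10. The function name and the rule that a mark of exactly 100 goes into the last class are assumptions made for this example; the sample uses only the seven marks quoted earlier from Table 3.1, not the full dataset.

```python
def frequency_distribution(values, width=10, lowest=0, highest=100):
    """Count values in exclusive classes [lower, lower + width)."""
    n_classes = (highest - lowest) // width
    counts = [0] * n_classes
    for value in values:
        index = int((value - lowest) // width)
        # Assumption: a value equal to `highest` goes into the last class.
        index = min(index, n_classes - 1)
        counts[index] += 1
    # Key each class by its (lower limit, upper limit) pair.
    return {
        (lowest + i * width, lowest + (i + 1) * width): count
        for i, count in enumerate(counts)
    }


# First seven marks quoted from Table 3.1, used only as a small sample.
sample_marks = [47, 45, 10, 60, 51, 56, 66]
table = frequency_distribution(sample_marks)
for (lower, upper), count in table.items():
    print(f"{lower}-{upper}: {count}")
```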

---

VARIABLES: CONTINUOUS AND DISCRETE

Understanding variable types is fundamental to determining appropriate classification and statistical methods.

CONTINUOUS VARIABLES

**Definition:** Variables that can take **any numerical value** within a range—whole numbers, fractions, or irrational numbers.

**Characteristics:**

Can assume infinite number of values within given range

Takes values like: 90cm, 90.5cm, 90.85cm, 90.999cm

Intermediate values between two points are always possible

Values can be broken into infinite gradations

**Examples:**

Height: 90cm to 150cm (can take 90.5cm, 102.34cm, 149.99cm)

Weight: measured in decimals

Time: can be measured in seconds, milliseconds

Distance, temperature, income (in decimals)

Age: 25 years, 25.5 years, 25.75 years

**Mathematical Property:** For any two values a and b (where a < b) of a continuous variable, there exists another value c such that a < c < b.

**Board Exam Important:** Continuous variables require class intervals for frequency distribution. Default assumption is that data is continuous unless stated otherwise.

---

DISCRETE VARIABLES

**Definition:** Variables that can take **only specific/certain values** with finite jumps between them; cannot take intermediate values.

**Characteristics:**

Changes by finite jumps only

Jumps from one value to another without taking intermediate values

Often (but not always) whole numbers only

Fractional values possible only if they form specific sequence

**Examples:**

Number of students in class: can be 25 or 26, but NOT 25.5 ("half a student" is illogical)

Number of children in family: 1, 2, 3 (not 2.5)

Number of vehicles on road: whole numbers only

Number of books: cannot be fractional

Rolling a dice: shows 1, 2, 3, 4, 5, or 6 only

**Special Case:** Variable taking fractional values (1/8, 1/16, 1/32, 1/64...) can still be discrete if no values exist between adjacent terms. It jumps from 1/8 to 1/16, not taking intermediate values.

**Board Exam Important:** Discrete and continuous variable identification determines whether inclusive or exclusive class intervals should be used.

---

FREQUENCY DISTRIBUTION: DEFINITION AND COMPONENTS

**Frequency Distribution Definition:** A comprehensive method of classifying and presenting raw quantitative data showing:

How different values of variable are distributed

Across different class intervals

With corresponding frequency (count) in each class

**Class Frequency:** Number of observations falling within a particular class interval.

**Example:** In class 30–40 of the raw data (Table 3.1), the values are 30, 37, 34, 30, 35, 39, 32, so the class frequency is 7.

KEY TERMINOLOGIES IN FREQUENCY DISTRIBUTION

**Class Limits:**

Two boundary values that define a class

Lower Class Limit: smallest value in class (e.g., 60 in class 60–70)

Upper Class Limit: largest value in class (e.g., 70 in class 60–70)

**Class Interval (Class Width):**

Difference between upper and lower class limits

Formula: **Class Interval = Upper Class Limit − Lower Class Limit**

For class 60–70: Class Interval = 70 − 60 = 10

**Class Mark (Class Mid-Point):**

Middle value of a class, representing entire class in calculations

Formula: **Class Mark = (Upper Class Limit + Lower Class Limit) / 2**

For class 60–70: Class Mark = (60 + 70) / 2 = 65

Used for all calculations after data is grouped (individual values not used)

**Table Illustrating Components:**

| Class | Frequency | Lower Limit | Upper Limit | Class Mark |
|-------|-----------|------------|------------|-----------|
| 0–10 | 1 | 0 | 10 | 5 |
| 10–20 | 8 | 10 | 20 | 15 |
| 40–50 | 21 | 40 | 50 | 45 |
| 50–60 | 23 | 50 | 60 | 55 |
| 60–70 | 19 | 60 | 70 | 65 |
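
The two formulas above (class interval and class mark) can be expressed as small functions. This is a minimal sketch; the function names are illustrative, and the classes are the five listed in the table.

```python
def class_width(lower, upper):
    """Class interval = upper class limit - lower class limit."""
    return upper - lower


def class_mark(lower, upper):
    """Class mark = (upper class limit + lower class limit) / 2."""
    return (upper + lower) / 2


# The five classes shown in the table above.
classes = [(0, 10), (10, 20), (40, 50), (50, 60), (60, 70)]
widths = [class_width(lower, upper) for lower, upper in classes]
marks = [class_mark(lower, upper) for lower, upper in classes]

for (lower, upper), width, mark in zip(classes, widths, marks):
    print(f"{lower}-{upper}: width {width}, class mark {mark}")
```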

---

FREQUENCY CURVE

**Frequency Curve Definition:** Graphic/diagrammatic representation of frequency distribution showing relationship between class values and their frequencies.

**Construction:**

X-axis: Class marks (mid-values)

Y-axis: Frequencies

Plot points at (class mark, frequency) coordinates

Connect points to form curve

**Interpretation:**

Shows concentration of data

Reveals distribution shape (symmetrical, skewed, etc.)

Identifies classes with maximum and minimum frequencies

Example: Maximum frequency 23 at class mark 55 (class 50–60) indicates concentration

Minimum frequency 1 at class mark 5 (class 0–10) indicates few poor performers

**Board Exam Important:** Students must be able to plot and interpret frequency curves from given data.
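
The plotting points for the mathematics marks example can be derived as follows. This sketch only computes the (class mark, frequency) coordinates and the highest and lowest points; drawing the curve itself is left to graph paper or any plotting tool.

```python
# Frequencies of the ten classes 0-10, 10-20, ..., 90-100 from the marks example.
frequencies = [1, 8, 6, 7, 21, 23, 19, 6, 5, 4]
class_limits = [(lower, lower + 10) for lower in range(0, 100, 10)]

# Each point on the curve is (class mark, frequency).
points = [
    ((lower + upper) / 2, frequency)
    for (lower, upper), frequency in zip(class_limits, frequencies)
]

peak = max(points, key=lambda point: point[1])    # class with maximum frequency
trough = min(points, key=lambda point: point[1])  # class with minimum frequency

print("Points:", points)
print("Peak:", peak)
print("Trough:", trough)
```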

---

PREPARING FREQUENCY DISTRIBUTION: FIVE CRITICAL DECISIONS

Before constructing frequency distribution, five questions must be systematically addressed:

1. EQUAL VS UNEQUAL CLASS INTERVALS

**Equal-Sized Intervals** (Most Common):

Used when data is relatively uniformly spread

All classes have same width

Easier to compare and analyse

Used in most situations unless specified otherwise

**Unequal-Sized Intervals:**

Used in two situations:

**Situation 1: High Range Variables**

Variables with very large range (e.g., income per day: ₹0 to ₹1,00,00,000)

Equal intervals would create problems:

(i) Moderate equal intervals = too many classes (unwieldy)

(ii) Large equal intervals = loss of information about very low or very high values

Solution: Use smaller intervals for concentration areas, larger for sparse areas

**Situation 2: Concentrated Data**

When many values cluster in narrow range (e.g., most incomes between ₹50,000–₹1,00,000 but few above ₹5,00,000)

Equal intervals would waste classes on empty regions

Solution: Use detailed intervals where data concentrates, broader intervals elsewhere

**Board Exam Important:** Exams may ask students to justify choice between equal and unequal intervals for specific datasets.

---

2. NUMBER OF CLASSES

**Standard Rule:** Number of classes usually between **6 and 15**

**Why These Limits?**

Fewer than 6 classes: too much data compression, loss of information

More than 15 classes: too detailed, defeats purpose of classification, still cumbersome

**Formula for Equal-Sized Intervals:**

**Number of Classes = Range / Class Interval**

Where: **Range = Largest Value − Smallest Value**

**Example Calculation:**

From Table 3.1 (mathematics marks):

Largest value: 100

Smallest value: 0

Range = 100 − 0 = 100

If class interval = 10, then Number of Classes = 100/10 = 10 classes ✓

---

3. CLASS INTERVAL (CLASS WIDTH)

**Relationship with Number of Classes:**

These two decisions are **interlinked**—cannot determine one without the other.

**Decision Process:**

1. Determine Range = Largest − Smallest value

2. Decide approximate number of classes (6–15)

3. Calculate Class Interval = Range / Number of Classes

4. Round to convenient number (usually 5, 10, 20, 50, 100, etc.)

**In the mathematics marks example:** Range = 100, desired classes = 10, therefore Class Interval = 100/10 = 10 ✓

**Board Exam Tip:** All classes should have the same interval in an equal-sized classification. If the chosen classes do not cover the whole range (for example, 9 classes of width 10 span only 90 marks when the range is 100), add a class or widen the interval so that every observation fits into a class.
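
The decision process above can be sketched in code. The list of "convenient" widths and the choice to pick the convenient width closest to the raw quotient are assumptions made for illustration, not a rule from the textbook.

```python
import math


def suggest_class_width(smallest, largest, desired_classes,
                        convenient=(1, 2, 5, 10, 20, 50, 100)):
    """Return (class width, number of classes) for equal-sized intervals."""
    value_range = largest - smallest               # Step 1: range
    raw_width = value_range / desired_classes      # Step 3: range / classes
    # Step 4 (assumed rule): take the convenient width nearest the raw width.
    width = min(convenient, key=lambda w: abs(w - raw_width))
    # Recompute the class count so the classes cover the whole range.
    n_classes = math.ceil(value_range / width)
    return width, n_classes


print(suggest_class_width(0, 100, 10))  # marks example: range 100, 10 classes
print(suggest_class_width(0, 100, 9))   # 9 desired classes still needs width 10
```

The second call shows why the two decisions are interlinked: asking for 9 classes leads to a rounded width of 10, which in turn needs 10 classes to cover a range of 100.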

---

4. DETERMINING CLASS LIMITS

**Principles for Class Limits:**

Must be **definite and clearly stated**

Should not be open-ended (avoid "70 and above" or "less than 10" if possible)

**Lower and upper limits should allow frequencies to concentrate in class middle** (not all data at boundaries)

Must be mutually exclusive (each observation fits exactly one class)

**Two Types of Class Interval Systems:**

**Inclusive Class Intervals:**

Values equal to lower AND upper limits are **both included** in the class

Example: Class 20–29 includes 20, 21, 22, ..., 29 (all included)

Appropriate for: discrete variables, whole numbers

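Values in the class: 20, 21, 22, ..., 29 (ten whole-number values)

Width: 10 when measured between the adjusted boundaries 19.5 and 29.5; the raw difference 29 − 20 is only 9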

**Exclusive Class Intervals:**

Value equal to upper limit is **excluded** from class; included in next class

Value equal to lower limit is **included**

Example: 20–30 includes 20, 21, 22, ..., 29.99... but NOT 30

30 is included in next class (30–40)

Appropriate for: continuous variables

Class boundaries are continuous: 20–30, 30–40, 40–50

No gaps or overlaps

**Real-world Application:**

For **Discrete Data (marks in full numbers only):**

```
Inclusive: 0–9, 10–19, 20–29, 30–39, ...
Mark 9 in first class, mark 10 in second class
```

For **Continuous Data (height, weight, income with decimals):**

```
Exclusive: 0–10, 10–20, 20–30, ...
Value 9.5 in first class (0–10)
Value 10.0 exactly in second class (10–20)
```

**Board Exam Critical:** Students must identify variable type and select appropriate interval system. If data shows decimal/fractional values, use exclusive intervals. If whole numbers only, can use either but inclusive is simpler.
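
The difference between the two systems comes down to how the upper limit is treated. The sketch below shows both membership tests, plus the usual continuity adjustment for inclusive classes (subtract half the gap from each lower limit and add it to each upper limit). The function names are illustrative, and the adjustment is a standard textbook step that these notes do not spell out.

```python
def in_exclusive_class(value, lower, upper):
    """Exclusive method: lower limit included, upper limit excluded."""
    return lower <= value < upper


def in_inclusive_class(value, lower, upper):
    """Inclusive method: both limits included."""
    return lower <= value <= upper


def adjust_inclusive_limits(classes):
    """Turn inclusive limits such as (20, 29), (30, 39) into continuous ones."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


print(in_exclusive_class(30, 20, 30))   # 30 is not in 20-30
print(in_exclusive_class(30, 30, 40))   # 30 is in 30-40
print(in_inclusive_class(29, 20, 29))   # 29 is in 20-29
print(adjust_inclusive_limits([(20, 29), (30, 39)]))
```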

---

5. FINDING CLASS FREQUENCIES: TALLY MARKING METHOD

**Purpose:** Systematic and efficient method to count frequency of each class from raw data without missing or double-counting observations.

**Steps:**

1. **Prepare Class Structure:** List all classes in order

2. **Go Through Raw Data Sequentially:** Examine each observation once

3. **Place Tally Marks:** For each observation, place one mark in corresponding class

Single marks: |

Five marks: |||| (four vertical plus one diagonal)

Continue: |||| |||| (10), |||| |||| || (12), etc.

4. **Count Tallies:** Convert tally groups to frequency numbers

5. **Total Check:** Sum all frequencies = total observations (100 in mathematics example)

**Advantages:**

Prevents observation being counted twice

Prevents observations being missed

Visual representation of concentration

Quick and systematic

**Board Exam Important:** Questions may provide raw data and ask to prepare frequency distribution using tally method.
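
The tally steps can be imitated in a short script. This is a sketch under stated assumptions: the string `||||/` stands in for a bundle of five (four bars crossed by a diagonal), the helper names are illustrative, and the sample is the seven marks quoted earlier from Table 3.1 rather than the full table.

```python
from collections import Counter


def tally_string(count):
    """Render a count as bundles of five plus any leftover single bars."""
    groups, rest = divmod(count, 5)
    parts = ["||||/"] * groups  # assumption: '/' marks the diagonal fifth stroke
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


def tally_table(values, width=10):
    """Map each class's lower limit to its frequency, visiting each value once."""
    counts = Counter((value // width) * width for value in values)
    return {lower: counts[lower] for lower in sorted(counts)}


sample_marks = [47, 45, 10, 60, 51, 56, 66]
table = tally_table(sample_marks)
for lower, count in table.items():
    print(f"{lower}-{lower + 10}: {tally_string(count)} ({count})")

# Step 5, total check: frequencies must add up to the number of observations.
assert sum(table.values()) == len(sample_marks)
```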

---

UNIVARIATE AND BIVARIATE FREQUENCY DISTRIBUTIONS

UNIVARIATE FREQUENCY DISTRIBUTION

**Definition:** Frequency distribution of **single variable** showing how observations are distributed across classes of one variable.

**Example:** Frequency distribution of mathematics marks (Table 3.4):

Single variable: Marks in Mathematics

Shows: How 100 students are distributed across different mark ranges

Classes: 0–10, 10–20, ..., 90–100

Frequencies: 1, 8, 6, 7, 21, 23, 19, 6, 5, 4

**Table Structure:**

| Marks | Frequency |
|-------|-----------|
| 0–10 | 1 |
| 10–20 | 8 |
| ... | ... |

---

BIVARIATE FREQUENCY DISTRIBUTION

**Definition:** Frequency distribution showing relationship between **two variables** simultaneously. Shows how observations are distributed across classes of both variables.

**Example:** Distribution of students by gender AND marks:

| Marks | Male | Female | Total |
|-------|------|--------|-------|
| 0–20 | 2 | 7 | 9 |
| 20–40 | 5 | 8 | 13 |
| 40–60 | 20 | 24 | 44 |
| 60–80 | 18 | 10 | 28 |
| 80–100 | 6 | 4 | 10 |
| Total | 51 | 53 | 104 |

**Information from Bivariate Distribution:**

Of 51 males: 2 scored 0–20, 5 scored 20–40, 20 scored 40–60, etc.

Of 53 females: 7 scored 0–20, 8 scored 20–40, etc.

Shows relationship between gender and academic performance

Reveals gender-based performance differences

**Advantages:**

Simultaneously analyses two variables

Identifies correlation or association between variables

Useful for policy analysis (e.g., examining gender disparities in education)

**Board Exam Important:** Indian economic data often requires bivariate analysis—employment by gender and sector, income distribution by caste and region, etc.
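
A bivariate table is built by counting pairs of values rather than single values. The sketch below cross-tabulates a handful of hypothetical (gender, marks) records; the records, the function name, and the rule that a mark of 100 falls in the last class are all assumptions made for illustration.

```python
from collections import Counter


def cross_tabulate(records, width=20, highest=100):
    """Count records by (lower limit of marks class, gender)."""
    table = Counter()
    for gender, marks in records:
        lower = (marks // width) * width
        # Assumption: a mark equal to `highest` is placed in the last class.
        lower = min(lower, highest - width)
        table[(lower, gender)] += 1
    return table


# Hypothetical records, not taken from the table above.
records = [
    ("Male", 35), ("Female", 62), ("Female", 18),
    ("Male", 75), ("Female", 64), ("Male", 100),
]
table = cross_tabulate(records)

for (lower, gender), count in sorted(table.items()):
    print(f"{lower}-{lower + 20}, {gender}: {count}")
```

Each cell of the printed result corresponds to one cell of a table like the one above; summing over gender gives the univariate distribution of marks.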

---

RELATIVE FREQUENCY AND PERCENTAGE FREQUENCY

**Definition:** Expressing class frequency as **proportion or percentage** of total frequency.

**Formulas:**

**Relative Frequency = Class Frequency / Total Frequency**

**Percentage Frequency = (Class Frequency / Total Frequency) × 100**

**Example from Mathematics Distribution:**

| Marks | Frequency | Relative Frequency | Percentage Frequency |
|-------|-----------|-------------------|-----------|
| 0–10 | 1 | 1/100 = 0.01 | 1% |
| 10–20 | 8 | 8/100 = 0.08 | 8% |
| 40–50 | 21 | 21/100 = 0.21 | 21% |
| 50–60 | 23 | 23/100 = 0.23 | 23% |
| Total | 100 | 1.00 | 100% |

**Interpretation:**

1% of students scored 0–10 (poorest performers)

23% concentrated in 50–60 (largest group)

57% scored 50 or above (23 + 19 + 6 + 5 + 4 in the full distribution)

9% scored 0–20 (very poor)

**Board Exam Important:** Expressing data as percentages enables comparison between datasets of different sizes and is essential for policy analysis and data interpretation questions.
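
Both formulas can be applied to the full mathematics distribution in a few lines. This is a minimal sketch using the ten class frequencies given earlier in the notes; the variable names are illustrative.

```python
# Lower limits and frequencies of the classes 0-10, 10-20, ..., 90-100.
class_lowers = list(range(0, 100, 10))
frequencies = [1, 8, 6, 7, 21, 23, 19, 6, 5, 4]

total = sum(frequencies)
relative = [frequency / total for frequency in frequencies]
percentage = [100 * frequency / total for frequency in frequencies]

for lower, rel, pct in zip(class_lowers, relative, percentage):
    print(f"{lower}-{lower + 10}: relative {rel:.2f}, percentage {pct:.0f}%")

# Combined share of the two lowest classes (0-10 and 10-20).
share_below_20 = percentage[0] + percentage[1]
print("Scored 0-20:", share_below_20, "%")
```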

---

KEY EXAM PRACTICE POINTS

1. **Variable Identification:** Before classification, identify whether data is discrete or continuous, qualitative or quantitative.

2. **Frequency Distribution Construction:** Follow systematic 5-step decision process regarding intervals, limits, and frequency counting.

3. **Class Interval Selection:** Use exclusive intervals for continuous data, inclusive for discrete.

4. **Range Calculation:** Always calculate Range = Max − Min to determine appropriate class size.

5. **Tally Method Application:** Practice systematic tally marking to avoid errors in frequency counting.

6. **Data Interpretation:** Express findings as percentages; identify maximum and minimum frequencies; discuss distribution patterns.

7. **Indian Context:** Apply classification concepts to Census data, income distribution, regional disparities, employment statistics—key for board examination scenario-based questions.

This chapter forms the foundation for all subsequent statistical analysis and is essential for both quantitative problem-solving and data interpretation in board examinations.

MCQs — 10 Questions with Answers

Q1. Raw data in Table 3.1 showing marks of 100 students is difficult to analyse because:

A. The marks are presented in ascending order
B. The data is large, unorganised, and does not show any pattern easily ✓
C. The data contains only numerical values
D. The number of students is exactly 100

Answer: B — Raw data is unclassified and disorganised, making it tedious to draw meaningful conclusions without systematic classification and grouping.

Q2. What is the primary purpose of classifying data in the context of the Census 2001?

A. To reduce the number of observations
B. To arrange 20 crore observations by gender, education, occupation so meaningful insights become visible ✓
C. To collect data only from urban areas
D. To ignore data that does not fit into predetermined classes

Answer: B — Classification of massive Census data by variables like gender and occupation converts fragmented raw data into understandable population structure.

Q3. Which statement about classification of data is CORRECT?

A. Classification can be done in arbitrary manner without any criteria
B. Classification must group similar characteristics while excluding items that do not fit the criteria ✓
C. Classification is done only for numerical data, never for qualitative data
D. Classification reduces the total number of observations in the dataset

Answer: B — Classification follows logical rules—items with similar characteristics go together, and unrelated items are excluded from that group.

Q4. In a frequency distribution table for marks of students, what does the 'frequency' column represent?

A. The average marks obtained by students
B. The highest marks in each class interval
C. The number of students whose marks fall within each class interval ✓
D. The difference between the highest and lowest marks

Answer: C — Frequency in a class is the count of how many observations belong to that specific class or group.

Q5. The kabadiwallah classifies junk into newspapers, glass, metals, and plastics. This is an example of:

A. Chronological classification (organised by time)
B. Quantitative classification (organised by numbers)
C. Qualitative classification (organised by type or quality) ✓
D. Bivariate classification (comparing two variables)

Answer: C — The kabadiwallah groups items by their type or material (newspaper, glass, metal, plastic), which is qualitative classification.

Q6. Which of the following is NOT a requirement for a good classification?

A. Classes must be mutually exclusive (no overlap)
B. All data must be included in some class (exhaustive)
C. Classes must be arranged in alphabetical order always ✓
D. Classification must follow a single clear principle

Answer: C — Alphabetical order is not mandatory—classification can be by time, quantity, type, or any logical principle; what matters is clarity and consistency.

Q7. Monthly household expenditure data of 50 families (Table 3.2) is difficult to analyse because it:

A. Contains only even numbers
B. Is presented as a raw, unorganised list where average expenditure cannot be found easily ✓
C. Has values that are all above ₹1000
D. Shows different currency units for different families

Answer: B — Raw unclassified expenditure data in Table 3.2 is jumbled; without grouping into class intervals, calculating average or finding patterns is tedious.

Q8. Assertion: Chronological classification arranges data by time periods. Reason: Raw data must always be classified by time before any statistical analysis. Which is true?

A. Both Assertion and Reason are correct, and Reason explains Assertion
B. Both Assertion and Reason are correct, but Reason does not explain Assertion
C. Assertion is correct, but Reason is incorrect—data can be classified by type, time, or quantity depending on purpose ✓
D. Both Assertion and Reason are incorrect

Answer: C — Chronological classification (arranging by year/month/week) is one method, but data can also be classified qualitatively or quantitatively based on research purpose.

Q9. A researcher collects data on students' marks (ranging 0–100) for 200 students. If she forms class intervals of 0–10, 10–20, 20–30, etc., what is the primary advantage of this classification?

A. It increases the total number of observations in the dataset
B. It converts 200 unorganised individual values into 10 manageable groups showing frequency distribution ✓
C. It removes all marks below 50 from the dataset
D. It ensures every student has the same marks in their class interval

Answer: B — Class intervals group 200 scattered marks into 10 ranges, making it easy to see how many students fall in each interval and identify patterns.

Q10. A bivariate frequency distribution differs from a univariate distribution in that it:

A. Uses a larger sample size of data
B. Studies the relationship between two variables instead of one variable only ✓
C. Is only used for Census data and never for other surveys
D. Always produces a bell-shaped curve in the frequency table

Answer: B — Bivariate distribution examines two variables together (e.g., marks AND attendance), while univariate examines only one variable (e.g., marks alone).

Flashcards

What is raw data?

Raw data is unclassified, unorganised data collected directly from surveys or observations before any statistical processing.

Define classification of data.

Classification is arranging or organising data into groups or classes based on some logical criteria so similar items are grouped together.

What is the purpose of classifying raw data?

Classification brings order to raw data, making it easy to apply statistical methods, draw comparisons, and reach conclusions.

What is a frequency distribution table?

A frequency distribution table shows how many observations fall into each class or group, displaying the pattern of data.

What are tally marks?

Tally marks are a manual counting method using small marks (often in groups of 5) to record how many observations belong to each class.

What is Chronological Classification?

Chronological Classification arranges data in order of time (by years, months, weeks, or days) to show trends over a period.

Differentiate between univariate and bivariate frequency distribution.

Univariate distribution studies one variable only, while bivariate distribution studies two variables and their relationship together.

Why is proper organisation of Census data important?

Census data on 20 crore people is so large that without classification by gender, education, occupation, etc., no meaningful conclusions can be drawn.

What criteria must a good classification follow?

A good classification must be exhaustive (cover all data), mutually exclusive (no overlap between classes), and based on a single clear principle.

How does the kabadiwallah example relate to data classification?

Just as the kabadiwallah groups junk by type (glass, metals, newspapers) to find items easily, data classification groups similar observations so patterns become clear.

Important Board Questions

Define raw data and explain why classification of raw data is necessary before statistical analysis. [2 marks]

State that raw data is unclassified and unorganised observations from surveys/census. Explain that large raw datasets are tedious to analyse directly and hide meaningful patterns until classified into groups.

With reference to Table 3.1 (100 students' mathematics marks), explain the process of converting raw data into a frequency distribution table. What role do tally marks play in this process? [5 marks]

Step 1: Identify the range (lowest to highest marks). Step 2: Form class intervals (e.g., 0–10, 10–20). Step 3: Use tally marks to count how many students fall in each class. Step 4: Record frequencies. Explain that tally marks (||||) in groups of 5 make manual counting quick and error-free.

Why did the Government of India need to classify Census 2001 data (20 crore observations) by gender, education, marital status, and occupation? Explain how classification transforms raw Census data into useful information for policy planning. [6 marks]

Show that 20 crore unclassified observations are impossible to understand. Classification reveals population structure: gender ratios (sex-ratio), education levels (literacy), employment patterns (occupation distribution). This enables government to design targeted policies for education, healthcare, employment, and social welfare. Use examples: if female literacy is low, education schemes can be designed; if unemployment is high in a region, industry can be developed there.
