
What Will You Learn?
By the end of this lesson, you will be able to:
- Distinguish between qualitative and quantitative data
- Identify different subtypes of data (nominal, ordinal, discrete, continuous)
- Explain how data is acquired from various sources
- Apply methods to process and clean raw data
- Interpret data correctly for AI applications
Imagine you’re conducting a survey about your school’s canteen. You might ask questions like:
- “How would you rate the food quality?” (Excellent, Good, Average, Poor)
- “What’s your favorite dish?” (Samosa, Sandwich, Biryani, Pasta)
- “How many times do you eat at the canteen per week?” (1, 2, 3, 4, 5)
- “How much do you spend on average?” (₹20, ₹35, ₹50, ₹75)
Notice that some answers are categories (words), while others are numbers. Some numbers can only be whole values, while others could be decimals. These differences matter a lot in AI.
Understanding types of data is like understanding different ingredients in cooking. You wouldn’t treat salt the same way as flour, even though both are white. Similarly, AI treats different types of data differently.
Let’s explore this fascinating world of data types.
The Two Main Types of Data
All data falls into one of two broad categories:
| Type | Also Called | What It Is | Example |
|---|---|---|---|
| Qualitative | Categorical | Describes qualities or categories of the data | Colors, names, opinions |
| Quantitative | Numerical | Describes quantities with numbers | Age, height, temperature |
Think of it this way:
- Qualitative answers “What kind?” or “Which category?”
- Quantitative answers “How many?” or “How much?”
💡 Key Insight
The type of data determines what analysis methods you can use. You can calculate the average height (quantitative), but you can’t calculate the “average color” (qualitative).
Qualitative Data: Categories and Qualities
Qualitative data describes characteristics that cannot be measured with numbers. It categorizes or labels items, focusing on attributes, perceptions, and descriptive qualities that help you understand the nature of what you’re studying. This type of data often captures opinions, behaviors, or categories that add context to numerical findings. It is especially useful when you want to explore motivations, patterns, or meanings behind an outcome.
Types of Qualitative Data
1. Nominal Data
What it is: Categories with no natural order or ranking.
Characteristics:
- Just labels or names
- No category is “higher” or “better” than another
- Cannot be arranged in meaningful order
Examples:
| Category | Possible Values |
|---|---|
| Gender | Male, Female, Other |
| Blood type | A, B, AB, O |
| Favorite color | Red, Blue, Green, Yellow |
| City of residence | Delhi, Mumbai, Chennai, Kolkata |
| Programming language | Python, Java, C++, JavaScript |
What you CAN do: Count occurrences, find the mode (most common)
What you CANNOT do: Calculate mean, find median, arrange in order
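The two valid operations for nominal data can be tried in a few lines of Python. This is a minimal sketch using only the standard library; the list `favorite_colors` is made-up sample data, not real survey results.

```python
from collections import Counter

# Hypothetical survey answers (nominal data: labels with no order)
favorite_colors = ["Red", "Blue", "Green", "Blue", "Yellow", "Blue", "Red"]

# Counting occurrences is valid for nominal data
counts = Counter(favorite_colors)

# The mode is the most common category
mode_color, mode_count = counts.most_common(1)[0]

print(counts["Red"])           # 2
print(mode_color, mode_count)  # Blue 3
```

Asking Python for the mean of these labels (for example with `statistics.mean`) fails with an error, which matches the rule above: labels can be counted, but not averaged.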
2. Ordinal Data
What it is: Categories that have a natural order or ranking.
Characteristics:
- Categories can be ranked (1st, 2nd, 3rd…)
- The gaps between ranks may not be equal
- Order matters, but differences can’t be measured precisely
Examples:
| Category | Ordered Values |
|---|---|
| Education level | Primary < Secondary < Graduate < Postgraduate |
| Customer satisfaction | Very Dissatisfied < Dissatisfied < Neutral < Satisfied < Very Satisfied |
| T-shirt size | XS < S < M < L < XL < XXL |
| Star rating | ⭐ < ⭐⭐ < ⭐⭐⭐ < ⭐⭐⭐⭐ < ⭐⭐⭐⭐⭐ |
| Spice level | Mild < Medium < Hot < Extra Hot |
What you CAN do: Count, find mode, find median, compare rankings
What you CANNOT do: Calculate mean (the “average” of Good and Excellent isn’t meaningful)
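Finding the median of ordinal data only needs the order of the categories, not their distances. The sketch below assumes a four-step rating scale (`SCALE`) and a small made-up list of responses.

```python
# Assumed order of the rating scale, lowest to highest
SCALE = ["Poor", "Average", "Good", "Excellent"]

# Hypothetical responses (an odd number, so there is one middle value)
ratings = ["Good", "Excellent", "Poor", "Good", "Average"]

# Sort by position on the scale, not alphabetically
ordered = sorted(ratings, key=SCALE.index)

# The median is the middle response once they are in order
median_rating = ordered[len(ordered) // 2]

print(ordered)        # ['Poor', 'Average', 'Good', 'Good', 'Excellent']
print(median_rating)  # Good
```

With an even number of responses there are two middle values; one simple option is to report both, or the lower of the two.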
🧪 Think About It
Is the difference between “Satisfied” and “Very Satisfied” the same as between “Neutral” and “Satisfied”? We don’t know — that’s why ordinal data is tricky!
Quantitative Data: Numbers and Measurements
Quantitative data represents quantities that can be measured and expressed as numbers.
Types of Quantitative Data
1. Discrete Data
What it is: Countable values, usually whole numbers.
Characteristics:
- Can only take specific, separate values
- Usually obtained by counting
- Cannot have fractions or decimals (in context)
- There are gaps between possible values
Examples:
| Measurement | Why It’s Discrete |
|---|---|
| Number of students in class | Can’t have 32.5 students |
| Number of cars in parking | Can’t have 2.7 cars |
| Goals scored in a match | Can’t score 3.5 goals |
| Number of siblings | Can’t have 1.5 siblings |
| Eggs in a basket | Can’t have half an egg (in counting) |
What you CAN do: Count, mean, median, mode, range, all mathematical operations
2. Continuous Data
What it is: Measurable values that can take any value within a range.
Characteristics:
- Can take any value (including decimals)
- Usually obtained by measuring
- Infinite possible values between any two points
- No gaps between possible values
Examples:
| Measurement | Why It’s Continuous |
|---|---|
| Height | Can be 165.7 cm, 165.73 cm, etc. |
| Weight | Can be 58.5 kg, 58.52 kg, etc. |
| Temperature | Can be 28.3°C, 28.35°C, etc. |
| Time taken | Can be 45.7 seconds |
| Distance | Can be 5.25 km |
What you CAN do: All mathematical operations, precise measurements, detailed analysis
Quick Comparison: All Four Types
| Type | Category | Order | Math Operations | Example |
|---|---|---|---|---|
| Nominal | Qualitative | No order | Count, mode only | Blood type (A, B, O) |
| Ordinal | Qualitative | Has order | Count, mode, median | Rating (Poor to Excellent) |
| Discrete | Quantitative | Has order | All operations | Number of children |
| Continuous | Quantitative | Has order | All operations | Height in cm |
Visual Summary
                      DATA
                        │
            ┌───────────┴───────────┐
            │                       │
       QUALITATIVE            QUANTITATIVE
      (Categories)              (Numbers)
            │                       │
      ┌─────┴─────┐           ┌─────┴─────┐
      │           │           │           │
   NOMINAL     ORDINAL    DISCRETE   CONTINUOUS
 (No order)   (Ordered)   (Counted)  (Measured)
Identifying Data Types: Practice
Let’s practice identifying data types:
| Data Example | Type | Reasoning |
|---|---|---|
| PIN code: 110001 | Nominal | Numbers used as labels, not for math |
| Rank in class: 1st, 2nd, 3rd | Ordinal | Ordered categories |
| Marks obtained: 78, 85, 92 | Discrete | Countable, whole numbers |
| Body temperature: 98.6°F | Continuous | Measurable, can have decimals |
| Movie genre: Action, Comedy | Nominal | Categories, no order |
| Pain level: 1-10 scale | Ordinal | Ordered rating scale |
| Number of pages in book | Discrete | Countable |
| Time to complete task | Continuous | Measurable duration |
⚠️ Common Confusion
Just because something is a number doesn’t make it quantitative! Phone numbers, PIN codes, and jersey numbers are actually nominal — they’re labels, not quantities. You wouldn’t calculate the “average phone number”!
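One practical way to respect this in code is to store label-like numbers as text. The sketch below is an illustration with made-up values: PIN codes are kept as strings, while heights are kept as numbers.

```python
import statistics

# Hypothetical records: PIN codes are labels, heights are quantities
pin_codes = ["110001", "400001", "600001"]   # stored as text on purpose
heights_cm = [165.7, 170.2, 158.4]

# Averaging a real quantity is meaningful
average_height = statistics.mean(heights_cm)

# Grouping or filtering by a label is meaningful; averaging it is not
starts_with_1 = [p for p in pin_codes if p.startswith("1")]

print(round(average_height, 1))  # 164.8
print(starts_with_1)             # ['110001']
```

Keeping PIN codes as text also prevents anyone from accidentally adding or averaging them later.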
Data Acquisition: Gathering Your Data
Now that we understand data types, let’s learn how to acquire (collect) data.
Primary Data Collection Methods
| Method | Description | Best For | Data Types |
|---|---|---|---|
| Surveys/Questionnaires | Asking people questions | Opinions, preferences | All types |
| Observations | Watching and recording | Behavior, events | All types |
| Experiments | Controlled testing | Cause-effect relationships | Quantitative |
| Interviews | In-depth conversations | Detailed insights | Mostly qualitative |
| Sensors/Devices | Automatic measurement | Physical measurements | Continuous |
Secondary Data Sources
| Source | Examples | Considerations |
|---|---|---|
| Government portals | data.gov.in, census data | Reliable, may be outdated |
| Research databases | Kaggle, UCI Repository | Clean, documented |
| Company records | Sales data, HR records | Need permission |
| Published reports | Industry reports, studies | May have bias |
| Web scraping | Social media, websites | Legal and ethical concerns |
Designing Good Survey Questions
| Question Type | Example | Data Type Generated |
|---|---|---|
| Multiple choice (single) | “What is your gender?” | Nominal |
| Multiple choice (ranked) | “Rate your satisfaction (1-5)” | Ordinal |
| Numeric input | “How old are you?” | Discrete |
| Scale/slider | “Rate your pain level” | Ordinal/Continuous |
| Open-ended | “Describe your experience” | Qualitative (text) |
Tips for good questions:
- Be clear and specific
- Avoid leading questions
- Provide appropriate options
- Consider the data type you need
The Three Strategies for Data Acquisition
Beyond knowing where to collect data (primary or secondary), it is equally important to understand how data acquisition happens in practice. The CBSE AI curriculum identifies three distinct strategies:
1. Discovery
Discovery means finding and using data that already exists. You search for datasets that have already been collected — in government portals, research repositories, company databases, or published studies — and use them directly for your AI project.
Example: An AI project to predict crop yield could use rainfall and temperature data already published by the India Meteorological Department, rather than setting up new weather stations.
When to use it: When sufficient data already exists and collecting new data would be expensive or time-consuming.
2. Augmentation
Augmentation means enhancing an existing dataset to make it larger, more balanced, or more representative. You take data that already exists and systematically expand it.
Example: If you only have 200 photos of diseased leaves for training an AI plant disease detector, you can augment the dataset by rotating, flipping, and slightly adjusting the brightness of each image — creating 2,000 training examples from the original 200.
When to use it: When your existing data is too small, unbalanced, or lacks variety — and collecting entirely new data is not feasible.
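To see what flipping, rotating, and brightening actually do, here is a small sketch that treats an image as a grid of brightness numbers. The 2×3 `image` and the helper function names are hypothetical; real projects usually rely on an image library instead of plain lists.

```python
# A tiny 2x3 "image" as a grid of pixel brightness values (hypothetical)
image = [
    [10, 20, 30],
    [40, 50, 60],
]

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def brighten(img, amount, max_value=255):
    """Add `amount` to every pixel, capped at max_value."""
    return [[min(p + amount, max_value) for p in row] for row in img]

# One original image becomes four training examples
augmented = [
    image,
    flip_horizontal(image),
    rotate_90_clockwise(image),
    brighten(image, 25),
]

print(flip_horizontal(image))      # [[30, 20, 10], [60, 50, 40]]
print(rotate_90_clockwise(image))  # [[40, 10], [50, 20], [60, 30]]
```

Each variation shows the same leaf, so the label stays the same while the model sees more variety.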
3. Generation
Generation means creating entirely new, artificial (synthetic) data that did not previously exist. This is especially useful when real data is scarce, sensitive, or impossible to collect.
Example: Medical AI systems often lack enough examples of rare diseases. Researchers can generate synthetic patient records that mimic real data patterns but contain no actual patient information — solving the privacy problem while increasing training data.
When to use it: When real data cannot be obtained due to privacy concerns, cost, rarity of events, or ethical restrictions.
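As a very simplified illustration of generation, the sketch below creates artificial patient records from random numbers. The field names and value ranges are assumptions chosen for this example; real synthetic-data tools typically learn their patterns from real datasets rather than from hand-picked ranges.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def make_synthetic_patient(rng):
    """Create one artificial record; the ranges are illustrative assumptions."""
    return {
        "age": rng.randint(20, 80),                # whole years
        "systolic_bp": round(rng.gauss(120, 15)),  # centred on 120
        "smoker": rng.choice(["Yes", "No"]),
    }

# Five records that belong to no real person
synthetic_patients = [make_synthetic_patient(rng) for _ in range(5)]

for patient in synthetic_patients:
    print(patient)
```

No record here corresponds to an actual patient, which is the privacy benefit described above.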
| Strategy | What You Do | Example |
|---|---|---|
| Discovery | Find and use existing data | Government agricultural databases |
| Augmentation | Expand and enhance existing data | Image flipping and rotation |
| Generation | Create synthetic data from scratch | Simulated patient records |
Data Processing: Cleaning and Preparing
Raw data is rarely ready for use. Data processing transforms raw data into a usable format.
Common Data Problems
| Problem | Example | Solution |
|---|---|---|
| Missing values | Age: , 25, 30, , 28 | Fill with average, remove, or flag |
| Inconsistent formats | “Male”, “M”, “male”, “MALE” | Standardize to one format |
| Outliers | Heights: 165, 170, 168, 950, 172 | Investigate: error or genuine? |
| Duplicates | Same person entered twice | Remove duplicates |
| Wrong data types | Age stored as “Twenty-five” | Convert to number (25) |
| Invalid values | Age: -5 or 500 | Validate against possible range |
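Invalid values and suspicious outliers from the table can be flagged with a simple range check. This is a minimal sketch; the plausible ranges (`AGE_RANGE`, `HEIGHT_RANGE_CM`) are assumptions that should be adjusted for the data being collected.

```python
# Sample values based on the table above
ages = [25, -5, 30, 500, 28]
heights_cm = [165, 170, 168, 950, 172]

# Assumed plausible ranges for this example
AGE_RANGE = (0, 120)
HEIGHT_RANGE_CM = (50, 250)

def out_of_range(values, low, high):
    """Return the values that fall outside the range low..high."""
    return [v for v in values if not (low <= v <= high)]

invalid_ages = out_of_range(ages, *AGE_RANGE)
suspect_heights = out_of_range(heights_cm, *HEIGHT_RANGE_CM)

print(invalid_ages)     # [-5, 500]
print(suspect_heights)  # [950]
```

The check only flags values. Deciding whether 950 is a typing error (perhaps 150 or 95.0) or a genuine reading still needs a human to investigate.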
Data Processing Steps
Step 1: COLLECT raw data
↓
Step 2: INSPECT for problems
↓
Step 3: CLEAN (fix errors, fill gaps)
↓
Step 4: TRANSFORM (convert formats, create categories)
↓
Step 5: VALIDATE (check everything is correct)
↓
Step 6: STORE in organized format
Example: Processing Survey Data
Raw data collection
| Name | Age | City | Rating |
|---|---|---|---|
| Rahul | 15 | delhi | Good |
| Priya | sixteen | Delhi | EXCELLENT |
| Amit | 14 | Delhi | good |
| Rahul | 15 | delhi | Good |
| Neha | | Mumbai | Average |
Problems identified:
- Inconsistent capitalization (delhi/Delhi, Good/good/EXCELLENT)
- Age as text (“sixteen”)
- Missing value (Neha’s age)
- Duplicate entry (Rahul appears twice)
After processing:
| Name | Age | City | Rating |
|---|---|---|---|
| Rahul | 15 | Delhi | Good |
| Priya | 16 | Delhi | Excellent |
| Amit | 14 | Delhi | Good |
| Neha | 15* | Mumbai | Average |
*Filled with average age
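The same clean-up can be sketched in Python. The code below is one possible approach, not the only one: it assumes the raw rows are stored as dictionaries, uses `None` for the missing age, and relies on a small hypothetical lookup table (`WORD_TO_NUMBER`) for ages written as words.

```python
# Raw rows from the example above; None marks the missing age
raw_rows = [
    {"name": "Rahul", "age": "15", "city": "delhi", "rating": "Good"},
    {"name": "Priya", "age": "sixteen", "city": "Delhi", "rating": "EXCELLENT"},
    {"name": "Amit", "age": "14", "city": "Delhi", "rating": "good"},
    {"name": "Rahul", "age": "15", "city": "delhi", "rating": "Good"},
    {"name": "Neha", "age": None, "city": "Mumbai", "rating": "Average"},
]

# Assumed lookup for ages written as words; extend as needed
WORD_TO_NUMBER = {"fourteen": 14, "fifteen": 15, "sixteen": 16}

def parse_age(value):
    """Return the age as an int, or None if it is missing."""
    if value is None:
        return None
    text = value.strip().lower()
    return WORD_TO_NUMBER[text] if text in WORD_TO_NUMBER else int(text)

cleaned, seen = [], set()
for row in raw_rows:
    record = {
        "name": row["name"],
        "age": parse_age(row["age"]),
        "city": row["city"].strip().title(),      # delhi -> Delhi
        "rating": row["rating"].strip().title(),  # EXCELLENT -> Excellent
    }
    key = tuple(record.values())
    if key not in seen:  # skip exact duplicates
        seen.add(key)
        cleaned.append(record)

# Fill missing ages with the average of the known ages
known_ages = [r["age"] for r in cleaned if r["age"] is not None]
average_age = round(sum(known_ages) / len(known_ages))
for r in cleaned:
    if r["age"] is None:
        r["age"] = average_age

for r in cleaned:
    print(r)
```

The result has four rows, matching the processed table: the duplicate Rahul row is gone, "sixteen" has become 16, and Neha's age is filled with the average (15).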
Three Qualities of Well-Processed Data: Structure, Cleanliness, and Accuracy
Once data has been collected, processing it is not just about fixing obvious errors. Good data processing ensures three fundamental qualities that make data genuinely ready for AI training.
Structure
Structured data is organised in a consistent, machine-readable format. Every column has one type of value, every row represents one record, and the format is uniform throughout.
Example of unstructured raw response: “My age is fifteen and I live in Mumbai.”
After structuring: Age = 15, City = Mumbai (two separate, typed fields).
Many AI models cannot learn directly from disorganised text that mixes different kinds of information. Structure is the foundation.
Cleanliness
Clean data is free from errors, duplicates, missing values, and noise. A single dirty entry can skew the AI’s learning if not corrected.
Signs of unclean data: Missing values, inconsistent spelling (“Delhi” vs “delhi”), impossible values (age = 500), duplicate rows.
Cleanliness is not just about removing errors — it is about ensuring every data point truly represents reality.
Accuracy
Accurate data correctly reflects the real-world facts it is supposed to capture. Data can be structured and clean but still inaccurate — for example, if survey respondents answered dishonestly, or if sensors were incorrectly calibrated.
Example: A dataset of students’ study hours might be structured and clean, but if students overreported their hours, the data is inaccurate and will produce a misleading model.
| Quality | Question to Ask | What Goes Wrong Without It |
|---|---|---|
| Structure | Is the data organised consistently? | AI cannot parse or learn from it |
| Cleanliness | Is the data free of errors and noise? | Errors bias the model |
| Accuracy | Does the data reflect reality? | Model learns wrong patterns |
Independent and Dependent Features
When preparing data for an AI model, it is essential to understand the two types of features (columns/variables) in your dataset:
Independent features are the input variables — the pieces of information the AI uses to make a prediction. They are independent because they are the starting conditions, not determined by the output.
The dependent feature is the output variable: the thing the AI is trying to predict or classify. It is dependent because its value depends on the pattern of the input features.
Example: Predicting Student Performance
| Independent Features (Inputs) | Dependent Feature (Output) |
|---|---|
| Study hours per day | Final exam score |
| Attendance percentage | |
| Assignment submission rate | |
| Previous test scores | |
The AI is given the independent features and must learn to predict the dependent feature. Getting this separation right is critical: if you accidentally include the output as one of the inputs, the AI will appear to perform perfectly during training (because you gave it the answer) but will fail in the real world.
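In code, this separation usually means splitting each record into inputs and an output before training. The sketch below uses made-up student records and hypothetical column names; `X` and `y` are the conventional names for inputs and outputs in machine learning.

```python
# Hypothetical student records
students = [
    {"study_hours": 2.0, "attendance": 90, "previous_score": 72, "final_score": 78},
    {"study_hours": 1.0, "attendance": 75, "previous_score": 60, "final_score": 64},
    {"study_hours": 3.5, "attendance": 95, "previous_score": 85, "final_score": 91},
]

TARGET = "final_score"  # dependent feature (output)

# Independent features: every column except the target
feature_names = [name for name in students[0] if name != TARGET]

X = [[row[name] for name in feature_names] for row in students]  # inputs
y = [row[TARGET] for row in students]                            # outputs

# Guard against giving the model the answer as one of its inputs
assert TARGET not in feature_names

print(feature_names)  # ['study_hours', 'attendance', 'previous_score']
print(y)              # [78, 64, 91]
```

The final check is a small safeguard against the mistake described above, where the output accidentally ends up among the inputs.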
Another example — Disease Prediction:
- Independent features: Age, blood pressure, cholesterol, weight, smoking habits
- Dependent feature: Has heart disease (Yes/No)
💡 Key Insight
In machine learning, independent features are also called predictors or input variables, and the dependent feature is also called the label, target, or output variable. Understanding which is which is the first step in setting up any AI model correctly.
Data Interpretation: Making Sense of Data
Data interpretation is the process of extracting meaning from processed data.
Textual, Tabular, and Graphical Interpretation
Textual interpretation means expressing findings in written sentences and paragraphs. It is best for summary conclusions, narrative explanations, or when the insight is qualitative.
Example: “The survey shows that most students prefer samosas, with biryani as a close second. Students from higher grades tend to spend more per visit.”
Tabular interpretation means organising data into rows and columns for precise comparison. Tables are best when you need to show exact numbers across multiple categories and allow the reader to look up specific values.
Example:
| Dish | Votes | Percentage |
|---|---|---|
| Samosa | 80 | 40% |
| Biryani | 60 | 30% |
| Sandwich | 40 | 20% |
| Pasta | 20 | 10% |
Graphical interpretation means representing data visually — through bar charts, pie charts, line graphs, scatter plots, and other visuals. Graphs are best when you want to show trends, comparisons, or distributions at a glance, without requiring the reader to study numbers closely.
| Form | Best Used For | Limitation |
|---|---|---|
| Textual | Summaries, qualitative insights, narrative conclusions | Hard to compare many values |
| Tabular | Precise comparisons, looking up specific values | Takes effort to spot trends |
| Graphical | Trends, distributions, patterns at a glance | Less precise than a table |
A well-prepared data report typically combines all three: a graph to show the pattern, a table for precise numbers, and text to explain what it means.
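The three forms can all be produced from the same numbers. The sketch below uses the dish votes from the table above and builds a one-sentence summary, a small table, and a simple text bar chart (one `#` per 10 votes, a scale chosen just for this example).

```python
# Votes from the table above
votes = {"Samosa": 80, "Biryani": 60, "Sandwich": 40, "Pasta": 20}
total = sum(votes.values())
percentages = {dish: 100 * n / total for dish, n in votes.items()}

# Textual: one sentence that states the main finding
top_dish = max(votes, key=votes.get)
summary = f"{top_dish} is the most popular dish with {percentages[top_dish]:.0f}% of votes."

# Tabular: exact numbers in aligned columns
table_lines = [f"{dish:<10}{n:>5}{percentages[dish]:>6.0f}%" for dish, n in votes.items()]

# Graphical: a text bar chart, one # per 10 votes
chart_lines = [f"{dish:<10}{'#' * (n // 10)}" for dish, n in votes.items()]

print(summary)
print("\n".join(table_lines))
print("\n".join(chart_lines))
```

The bar chart makes the ranking obvious at a glance, while the table is the place to look up the exact count for any dish.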
Interpreting Different Data Types
| Data Type | Interpretation Methods |
|---|---|
| Nominal | Frequency counts, mode, percentage distribution |
| Ordinal | Median, percentiles, ranking analysis |
| Discrete | Mean, median, mode, frequency distribution |
| Continuous | Mean, median, standard deviation, range |
Example: Interpreting Survey Results
Question: “How satisfied are you with school facilities?”
Results (200 students):
| Rating | Count | Percentage |
|---|---|---|
| Very Dissatisfied | 10 | 5% |
| Dissatisfied | 30 | 15% |
| Neutral | 40 | 20% |
| Satisfied | 80 | 40% |
| Very Satisfied | 40 | 20% |
Interpretation:
- Mode: Satisfied (most common response)
- Median: Satisfied (middle value when ordered)
- Positive responses: 60% (Satisfied + Very Satisfied)
- Negative responses: 20% (Dissatisfied + Very Dissatisfied)
- Insight: Most students are satisfied, but 20% are unhappy — worth investigating why.
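The figures in this interpretation can be checked with a short script. This sketch takes the counts from the results table and lists the ratings from lowest to highest so that the median can be found by walking up the scale.

```python
# Counts from the results table, listed from lowest to highest rating
results = [
    ("Very Dissatisfied", 10),
    ("Dissatisfied", 30),
    ("Neutral", 40),
    ("Satisfied", 80),
    ("Very Satisfied", 40),
]
counts = dict(results)
total = sum(counts.values())

# Mode: the rating with the highest count
mode_rating = max(counts, key=counts.get)

# Median: walk up the scale until half of the responses are covered
running = 0
for rating, count in results:
    running += count
    if running >= total / 2:
        median_rating = rating
        break

positive_pct = 100 * (counts["Satisfied"] + counts["Very Satisfied"]) / total
negative_pct = 100 * (counts["Very Dissatisfied"] + counts["Dissatisfied"]) / total

print(mode_rating, median_rating)  # Satisfied Satisfied
print(positive_pct, negative_pct)  # 60.0 20.0
```

Notice that the script never converts the ratings to numbers and averages them, because the data is ordinal.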
Avoiding Interpretation Mistakes
| Mistake | Example | Problem |
|---|---|---|
| Treating ordinal as continuous | “Average satisfaction is 3.7” | Gaps between categories aren’t equal |
| Ignoring sample size | “100% satisfied!” (based on 2 responses) | Too small to be meaningful |
| Confusing correlation with causation | “Ice cream sales and drowning both increase in summer, so ice cream causes drowning” | Both caused by a third factor (heat) |
| Cherry-picking data | Showing only favorable results | Misleading conclusions |
Data Types in AI Applications
Different AI applications need different data types:
| AI Application | Primary Data Types | Example Data |
|---|---|---|
| Image Classification | Nominal (labels) + Continuous (pixels) | “Cat” or “Dog” labels on images |
| Sentiment Analysis | Ordinal (sentiment scores) | Negative/Neutral/Positive ratings |
| Price Prediction | Continuous | House prices, stock prices |
| Customer Segmentation | Mixed | Demographics (nominal) + Spending (continuous) |
| Recommendation Systems | Ordinal (ratings) + Nominal (categories) | Movie ratings, genre preferences |
| Medical Diagnosis | Mixed | Symptoms (nominal), test results (continuous) |
How AI Handles Different Data Types
| Data Type | AI Treatment |
|---|---|
| Nominal | One-hot encoding (converting to binary columns) |
| Ordinal | Label encoding (converting to ordered numbers) |
| Discrete | Direct use or normalization |
| Continuous | Normalization or standardization |
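The treatments for ordinal and continuous data in this table can be sketched with plain Python. The heights and ratings are made-up sample values, and the formulas shown are the common textbook versions: min-max normalization rescales values to the 0 to 1 range, and standardization subtracts the mean and divides by the standard deviation.

```python
import statistics

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical continuous data

# Min-max normalization: rescale so the smallest is 0 and the largest is 1
low, high = min(heights_cm), max(heights_cm)
normalized = [(h - low) / (high - low) for h in heights_cm]

# Standardization: subtract the mean, divide by the standard deviation
mean = statistics.mean(heights_cm)
std = statistics.pstdev(heights_cm)
standardized = [(h - mean) / std for h in heights_cm]

# Label encoding for ordinal data: replace each category with its rank
ORDER = ["Poor", "Average", "Good", "Excellent"]  # assumed scale
encoded = [ORDER.index(r) + 1 for r in ["Good", "Poor", "Excellent"]]

print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(encoded)     # [3, 1, 4]
```

After standardization the middle height (170 cm, the mean) becomes 0, shorter heights become negative, and taller heights become positive.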
Example: One-Hot Encoding for Colors
Original: Color = [Red, Blue, Green, Red, Blue]
Encoded:
| Is_Red | Is_Blue | Is_Green |
|---|---|---|
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 1 |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
This allows AI to work with categorical data mathematically.
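The encoded table above can be reproduced with a few lines of Python. This is a minimal hand-written sketch for illustration; machine learning libraries provide their own ready-made encoders.

```python
colors = ["Red", "Blue", "Green", "Red", "Blue"]

# Unique categories, kept in order of first appearance
categories = list(dict.fromkeys(colors))

# One row per item, one column per category: 1 if it matches, else 0
one_hot = [[1 if color == c else 0 for c in categories] for color in colors]

print(categories)  # ['Red', 'Blue', 'Green']
for row in one_hot:
    print(row)
```

Each row contains exactly one 1, so no color is treated as larger or smaller than another.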
Activity: Classify and Plan
Part A: Data Type Classification
Classify each as Nominal, Ordinal, Discrete, or Continuous:
- Number of WhatsApp messages sent today
- Your blood group
- Temperature in your city
- Your position in a race (1st, 2nd, 3rd)
- Number of pets you have
- Your favorite sport
- Your height
- Customer review stars (1-5)
- Number of Instagram followers
- Your mood today (Happy, Sad, Neutral)
Part B: Data Collection Planning
You want to understand Class 9 students’ study habits. Design a survey with:
- 2 nominal questions
- 2 ordinal questions
- 2 discrete questions
- 1 continuous question
(Answers in Answer Key)
Quick Recap
- Qualitative data describes categories (Nominal: no order; Ordinal: has order).
- Quantitative data describes numbers (Discrete: counted; Continuous: measured).
- Nominal data includes categories like colors, names, and types — no natural order.
- Ordinal data includes rankings and ratings — order matters but gaps aren’t equal.
- Discrete data includes countable values like number of students — whole numbers.
- Continuous data includes measurements like height and temperature — any value possible.
- Data acquisition involves collecting data through surveys, observations, experiments, or secondary sources.
- Data processing cleans and prepares raw data by fixing errors, filling gaps, and standardizing formats.
- Data interpretation extracts meaning using appropriate methods for each data type.
- AI handles different data types differently — nominal needs encoding, continuous needs normalization.
EXERCISES
A. Fill in the Blanks
- Data that describes categories or qualities is called __________________________ data.
- Data that describes quantities with numbers is called __________________________ data.
- Nominal data has categories with __________________________ natural order.
- Ordinal data has categories that can be ____________________________.
- Discrete data is obtained by __________________________ (counting/measuring).
- Continuous data is obtained by __________________________ (counting/measuring).
- Phone numbers and PIN codes are examples of __________________________ data, not quantitative.
- The process of fixing errors and standardizing formats is called data __________________________.
- Converting categorical data into numerical format for AI is called __________________________.
- ____________________________.gov.in is India’s official open data portal.
B. Multiple Choice Questions
1. Which is an example of qualitative data?
(a) Height: 165 cm
(b) Age: 15 years
(c) Favorite color: Blue
(d) Temperature: 28°C
2. Ordinal data differs from nominal data because:
(a) It uses numbers
(b) It has a natural order
(c) It can be measured
(d) It has no categories
3. “Number of students in a class” is what type of data?
(a) Nominal
(b) Ordinal
(c) Discrete
(d) Continuous
4. Body temperature (98.6°F) is an example of:
(a) Nominal data
(b) Ordinal data
(c) Discrete data
(d) Continuous data
5. Which operation is NOT valid for nominal data?
(a) Counting occurrences
(b) Finding the mode
(c) Calculating the mean
(d) Finding percentages
6. A customer satisfaction rating of “Excellent, Good, Average, Poor” is:
(a) Nominal
(b) Ordinal
(c) Discrete
(d) Continuous
7. Which is a primary data collection method?
(a) Using government databases
(b) Conducting surveys
(c) Downloading from Kaggle
(d) Reading research reports
8. One-hot encoding is used for:
(a) Continuous data
(b) Discrete data
(c) Nominal data
(d) Ordinal data
9. “Age: Twenty-five” instead of “25” is an example of:
(a) Missing value
(b) Wrong data type
(c) Duplicate entry
(d) Outlier
10. Jersey numbers on sports uniforms are:
(a) Nominal data
(b) Ordinal data
(c) Discrete data
(d) Continuous data
C. True or False
- Qualitative data can always be measured with numbers. (__)
- Ordinal data has categories that can be ranked. (__)
- You can calculate the average of nominal data. (__)
- Discrete data can have decimal values. (__)
- Height and weight are examples of continuous data. (__)
- PIN codes are quantitative data because they contain numbers. (__)
- Data processing includes cleaning and standardizing data. (__)
- The mode can be found for all types of data. (__)
- Surveys can collect both qualitative and quantitative data. (__)
- Correlation always means causation. (__)
D. Define the Following (30-40 words each)
- Qualitative Data
- Quantitative Data
- Nominal Data
- Ordinal Data
- Discrete Data
- Continuous Data
- Data Processing
E. Very Short Answer Questions (40-50 words each)
- What is the main difference between qualitative and quantitative data?
- Explain the difference between nominal and ordinal data with examples.
- How is discrete data different from continuous data?
- Why are phone numbers considered nominal data even though they contain digits?
- What are three common problems found in raw data?
- Name three primary methods of data collection.
- What is one-hot encoding and why is it used?
- Why can’t you calculate the mean of ordinal data?
- Give two examples each of discrete and continuous data.
- What should you check when interpreting data to avoid mistakes?
F. Long Answer Questions (75-100 words each)
- Explain the four types of data (nominal, ordinal, discrete, continuous) with two examples each.
- You’re collecting data about students’ mobile phone usage. What type of data would each of the following generate: (a) Brand of phone, (b) Hours of daily usage, (c) Number of apps installed, (d) Satisfaction rating?
- Describe the steps involved in data processing. Why is each step important?
- What are the different methods of data acquisition? Compare primary and secondary data sources.
- Explain how AI handles different types of data. Why does nominal data need special treatment?
- A survey about canteen food collected these responses for “food quality”: Excellent, Good, Good, Average, Excellent, Poor, Good. Analyze this data appropriately.
- Design a data collection plan to understand exercise habits of Class 9 students. Include questions that generate all four data types.
ANSWER KEY
A. Fill in the Blanks – Answers
- qualitative — Qualitative data describes categories.
- quantitative — Quantitative data describes numbers.
- no — Nominal categories have no natural order.
- ranked — Ordinal categories can be ordered.
- counting — Discrete data is counted.
- measuring — Continuous data is measured.
- nominal — Phone numbers are labels, not quantities.
- processing/cleaning — Processing fixes data issues.
- encoding — Encoding converts categories to numbers.
- data — data.gov.in is India’s open data portal.
B. Multiple Choice Questions – Answers
- (c) Favorite color: Blue — Colors are categories, not numbers.
- (b) It has a natural order — Ordinal data can be ranked.
- (c) Discrete — Students are counted as whole numbers.
- (d) Continuous data — Temperature can have any decimal value.
- (c) Calculating the mean — Can’t average categories.
- (b) Ordinal — Ratings have a natural order.
- (b) Conducting surveys — Primary = collecting yourself.
- (c) Nominal data — Converts categories to binary columns.
- (b) Wrong data type — Text instead of number.
- (a) Nominal data — Jersey numbers are labels, not quantities.
C. True or False – Answers
- False — Qualitative data describes qualities, not measured numbers.
- True — Ordinal categories have a natural ranking.
- False — Cannot calculate average of categories like colors.
- False — Discrete data is whole numbers only.
- True — Both can have any decimal value.
- False — PIN codes are labels (nominal), not quantities.
- True — Processing includes cleaning and standardizing.
- True — Mode (most common) works for all types.
- True — Surveys can include various question types.
- False — Correlation does not imply causation.
D. Definitions – Answers
1. Qualitative Data: Data that describes qualities or characteristics using categories rather than numbers. It answers “what kind?” and includes types like colors, names, and opinions.
2. Quantitative Data: Data that describes quantities using numbers and measurements. It answers “how many?” or “how much?” and includes values like height, age, and count.
3. Nominal Data: A type of qualitative data where categories have no natural order or ranking. Examples include blood type, gender, and favorite color.
4. Ordinal Data: A type of qualitative data where categories have a natural order but the gaps between them aren’t necessarily equal. Examples include satisfaction ratings and education levels.
5. Discrete Data: A type of quantitative data with countable, separate values, usually whole numbers. Cannot have fractions in context. Examples: number of children, goals scored.
6. Continuous Data: A type of quantitative data that can take any value within a range, including decimals. Obtained by measuring. Examples: height, weight, temperature.
7. Data Processing: The steps of transforming raw data into a usable format, including cleaning errors, handling missing values, standardizing formats, removing duplicates, and validating accuracy.
E. Very Short Answer Questions – Answers
1. Qualitative vs quantitative difference:
Qualitative data describes categories or qualities (colors, names, opinions) answering “what kind?” Quantitative data describes quantities with numbers (height, age, count) answering “how many?” or “how much?”
2. Nominal vs ordinal with examples:
Nominal: categories with no order (blood types A, B, O — no type is “higher”). Ordinal: categories with order (education levels: Primary < Secondary < Graduate — there’s a ranking but gaps aren’t equal).
3. Discrete vs continuous difference:
Discrete data is countable with separate whole values (students in class: 32, not 32.5). Continuous data is measurable with any value possible (height: 165.7 cm, can be any decimal).
4. Phone numbers as nominal:
Phone numbers use digits but are labels/identifiers, not quantities. You wouldn’t add phone numbers or calculate their average. The digits don’t represent amounts — they’re just identification codes.
5. Three raw data problems:
Missing values (empty cells), inconsistent formats (Male/M/male), and outliers (height: 950 cm — likely error). Others include duplicates and wrong data types.
6. Three primary collection methods:
Surveys/questionnaires (asking questions), observations (watching and recording), and experiments (controlled testing). Also interviews and sensor measurements.
7. One-hot encoding:
Converting nominal categories into binary columns. Example: Color (Red, Blue) becomes Is_Red (1/0) and Is_Blue (1/0). Used because AI algorithms need numerical inputs to perform calculations.
8. Why no mean for ordinal:
The gaps between ordinal categories aren’t equal. The difference between “Good” and “Excellent” may not equal the difference between “Poor” and “Average.” Mean assumes equal spacing.
9. Discrete and continuous examples:
Discrete: Number of siblings (0, 1, 2…), goals in a match (0, 1, 2…). Continuous: Height (165.5 cm), weight (58.3 kg), temperature (28.7°C).
10. Avoiding interpretation mistakes:
Check sample size (is it large enough?), don’t confuse correlation with causation, use appropriate methods for each data type, don’t cherry-pick favorable results, consider context.
F. Long Answer Questions – Answers
1. Four data types with examples:
Nominal: Categories without order — Blood type (A, B, O, AB), Favorite color (Red, Blue, Green). Ordinal: Ranked categories — Education (Primary < Secondary < Graduate), Star rating (1-5 stars). Discrete: Countable whole numbers — Number of siblings (0, 1, 2), Books read this year (5, 10, 15). Continuous: Measurable any-value — Height (165.7 cm), Temperature (28.5°C). Each type requires different analysis methods.
2. Mobile phone usage data types:
(a) Brand of phone: Nominal — categories like Apple, Samsung, OnePlus with no natural order. (b) Hours of daily usage: Continuous — can be 2.5 hours, 3.7 hours, any decimal value. (c) Number of apps installed: Discrete — whole numbers only (25, 30, 45 apps). (d) Satisfaction rating: Ordinal — ranked categories (Very Satisfied > Satisfied > Neutral, etc.).
3. Data processing steps:
Collect: Gather raw data from sources. Inspect: Check for problems (missing values, errors, duplicates). Clean: Fix errors, fill gaps, remove duplicates. Transform: Standardize formats, convert types, create categories. Validate: Verify everything is correct and consistent. Store: Organize the data in a proper format. Each step matters because an error introduced at any stage carries through to the final analysis.
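The clean, transform, and validate steps above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sample rows, the `GENDER_LABELS` mapping, the choice to treat rows with the same `id` as duplicates, and the choice to fill a missing height with the median of the known heights.

```python
from statistics import median

# Hypothetical raw rows (invented for this example).
raw_rows = [
    {"id": 1, "gender": "Male", "height_cm": 170.0},
    {"id": 2, "gender": "M", "height_cm": None},
    {"id": 2, "gender": "M", "height_cm": None},      # duplicate row
    {"id": 3, "gender": "female", "height_cm": 160.0},
]

# Assumed mapping from messy spellings to one standard label.
GENDER_LABELS = {"male": "Male", "m": "Male", "female": "Female", "f": "Female"}

def clean(rows):
    # Clean: remove duplicates, keeping the first row for each id.
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))  # copy so the raw data stays untouched
    # Transform: standardize the gender labels.
    for row in unique:
        row["gender"] = GENDER_LABELS[row["gender"].strip().lower()]
    # Clean: fill missing heights with the median of the known heights.
    known = [r["height_cm"] for r in unique if r["height_cm"] is not None]
    fill_value = median(known)
    for row in unique:
        if row["height_cm"] is None:
            row["height_cm"] = fill_value
    return unique

def validate(rows):
    # Validate: every row has a standard label and a height.
    return all(
        r["gender"] in {"Male", "Female"} and r["height_cm"] is not None
        for r in rows
    )

cleaned = clean(raw_rows)
print(cleaned)
print("Valid:", validate(cleaned))
```

Filling gaps with the median is only one option. Depending on the project, it may be better to leave the value empty or to drop the row.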
4. Data acquisition methods:
Primary sources: Data you collect yourself through surveys, observations, experiments, or interviews. It is tailored to your needs but time-consuming to gather. Secondary sources: Existing data from government portals, research databases, or company records. It saves time but may not fit your needs perfectly. Primary collection gives you control over quality; secondary sources provide larger datasets quickly.
5. AI and data types:
AI algorithms work with numbers, so categorical data needs conversion. Nominal data uses one-hot encoding (color → Is_Red, Is_Blue columns with 0/1). Ordinal data uses label encoding (Poor=1, Average=2, Good=3). Continuous data often needs normalization (scaling to 0-1 range). Discrete data may be used directly or normalized. Without proper handling, AI can’t process categories mathematically.
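As a rough illustration of the ordinal and continuous cases, the sketch below applies label encoding and 0-1 normalization in plain Python. The function names, the `RATING_ORDER` mapping, and the sample values are hypothetical.

```python
# Label encoding for ordinal data: the numbers keep the ranking.
RATING_ORDER = {"Poor": 1, "Average": 2, "Good": 3}

def label_encode(ratings):
    return [RATING_ORDER[rating] for rating in ratings]

# Min-max normalization for continuous data: rescale values to the 0-1 range.
def min_max_scale(values):
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are equal, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]

ratings = ["Good", "Poor", "Average"]
heights_cm = [150.0, 165.0, 180.0]

encoded_ratings = label_encode(ratings)
scaled_heights = min_max_scale(heights_cm)

print(encoded_ratings)
print(scaled_heights)
```

Label encoding suits ordinal data because the order is real. Using it on nominal data would invent a ranking that does not exist.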
6. Canteen survey analysis:
Data: Excellent, Good, Good, Average, Excellent, Poor, Good (n=7, ordinal data). Frequency: Excellent-2, Good-3, Average-1, Poor-1. Mode: Good (most common). Median: Good (the 4th of 7 values when ordered from Poor to Excellent). Percentage: Positive (Excellent+Good): 5 of 7, about 71%; Negative (Poor): 1 of 7, about 14%. Interpretation: Most students find the food quality acceptable, but the one unhappy respondent is worth investigating. The mean cannot be calculated because the gaps between ordinal categories aren’t equal.
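If you want to check these numbers yourself, the sketch below repeats the analysis in Python using the seven responses from the question. The variable names are made up, and the median shortcut assumes an odd number of responses.

```python
from collections import Counter

# Ranking of the ordinal categories, from lowest to highest.
ORDER = ["Poor", "Average", "Good", "Excellent"]
responses = ["Excellent", "Good", "Good", "Average", "Excellent", "Poor", "Good"]

# Frequency and mode.
counts = Counter(responses)
mode = counts.most_common(1)[0][0]

# Median: sort by rank, then take the middle value (n is odd here).
ranked = sorted(responses, key=ORDER.index)
median_rating = ranked[len(ranked) // 2]

# Percentages, rounded to the nearest whole number.
total = len(responses)
positive_pct = round(100 * (counts["Excellent"] + counts["Good"]) / total)
negative_pct = round(100 * counts["Poor"] / total)

print("Frequency:", dict(counts))
print("Mode:", mode)
print("Median:", median_rating)
print("Positive %:", positive_pct, " Negative %:", negative_pct)
```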
7. Exercise habits data collection:
Nominal questions: “What type of exercise do you prefer?” (Running, Swimming, Gym, Yoga), “Where do you exercise?” (Home, Park, Gym, School). Ordinal questions: “How would you rate your fitness level?” (Poor to Excellent), “How motivated are you to exercise?” (1-5 scale). Discrete questions: “How many days per week do you exercise?” (0-7), “How many push-ups can you do?” (number). Continuous question: “How many minutes do you exercise per session?” (can be 25.5 minutes).
Activity Answers
Part A: Data Type Classification
- Number of WhatsApp messages — Discrete (countable whole numbers)
- Blood group — Nominal (categories, no order)
- Temperature — Continuous (measurable, decimals possible)
- Position in race — Ordinal (ranked categories)
- Number of pets — Discrete (countable whole numbers)
- Favorite sport — Nominal (categories, no order)
- Height — Continuous (measurable, decimals possible)
- Customer review stars — Ordinal (ranked scale)
- Instagram followers — Discrete (countable whole numbers)
- Mood today — Nominal (categories, no inherent order) or Ordinal (if treated as scale)
Part B: Survey Design (Sample)
Nominal questions:
- “What is your preferred study location?” (Home, Library, Classroom, Café)
- “Which subject do you find most interesting?” (Math, Science, English, Social Studies)
Ordinal questions:
- “How would you rate your study habits?” (Excellent, Good, Average, Poor)
- “How stressed do you feel about exams?” (Not at all, Slightly, Moderately, Very, Extremely)
Discrete questions:
- “How many full hours do you study on a typical weekday?” (0, 1, 2, 3, 4, 5+)
- “How many subjects do you need extra help with?” (0, 1, 2, 3, 4+)
Continuous question:
- “On average, how many minutes do you spend on homework daily?” (Open numeric response)
