What Will You Learn?

By the end of this lesson, you will be able to:

  • Distinguish between qualitative and quantitative data
  • Identify different subtypes of data (nominal, ordinal, discrete, continuous)
  • Acquire data from various sources
  • Process and clean raw data
  • Interpret data correctly for AI applications

Imagine you’re conducting a survey about your school’s canteen. You might ask questions like:

  • “How would you rate the food quality?” (Excellent, Good, Average, Poor)
  • “What’s your favorite dish?” (Samosa, Sandwich, Biryani, Pasta)
  • “How many times do you eat at the canteen per week?” (1, 2, 3, 4, 5)
  • “How much do you spend on average?” (₹20, ₹35, ₹50, ₹75)

Notice that some answers are categories (words), while others are numbers. Some numbers can only be whole values, while others could be decimals. These differences matter a lot in AI.

Understanding types of data is like understanding different ingredients in cooking. Salt and flour are both white, but you wouldn’t use them the same way. Similarly, AI treats different types of data differently.

Let’s explore this fascinating world of data types.


The Two Main Types of Data

All data falls into one of two broad categories:

Type | Also Called | What It Is | Example
Qualitative | Categorical | Describes qualities or categories of the data | Colors, names, opinions
Quantitative | Numerical | Describes quantities with numbers | Age, height, temperature

Think of it this way:

  • Qualitative answers “What kind?” or “Which category?”
  • Quantitative answers “How many?” or “How much?”

💡 Key Insight

The type of data determines what analysis methods you can use. You can calculate the average height (quantitative), but you can’t calculate the “average color” (qualitative).


Qualitative Data: Categories and Qualities

Qualitative data describes characteristics that cannot be measured with numbers. It categorizes or labels items, focusing on attributes, perceptions, and descriptive qualities that help you understand the nature of what you’re studying. This type of data often captures opinions, behaviors, or categories that add context to numerical findings. It is especially useful when you want to explore motivations, patterns, or meanings behind an outcome.

Types of Qualitative Data

1. Nominal Data

What it is: Categories with no natural order or ranking.

Characteristics:

  • Just labels or names
  • No category is “higher” or “better” than another
  • Cannot be arranged in meaningful order

Examples:

Category | Possible Values
Gender | Male, Female, Other
Blood type | A, B, AB, O
Favorite color | Red, Blue, Green, Yellow
City of residence | Delhi, Mumbai, Chennai, Kolkata
Programming language | Python, Java, C++, JavaScript

What you CAN do: Count occurrences, find the mode (most common)
What you CANNOT do: Calculate mean, find median, arrange in order
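
As a minimal sketch of what counting and finding the mode look like for nominal data, the snippet below uses Python's standard `collections.Counter`. The list of blood types is made up for illustration and does not come from a real survey.

```python
from collections import Counter

# Hypothetical responses: blood types are labels with no natural order.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]

# Count how often each label occurs.
counts = Counter(blood_types)

# The mode is the most frequent label; most_common(1) returns [(label, count)].
mode, mode_count = counts.most_common(1)[0]

print(dict(counts))
print("Mode:", mode, "appears", mode_count, "times")
```

Notice that nothing here adds or averages the labels. Counting is the only arithmetic that makes sense for nominal values.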

2. Ordinal Data

What it is: Categories that have a natural order or ranking.

Characteristics:

  • Categories can be ranked (1st, 2nd, 3rd…)
  • The gaps between ranks may not be equal
  • Order matters, but differences can’t be measured precisely

Examples:

Category | Ordered Values
Education level | Primary < Secondary < Graduate < Postgraduate
Customer satisfaction | Very Dissatisfied < Dissatisfied < Neutral < Satisfied < Very Satisfied
T-shirt size | XS < S < M < L < XL < XXL
Star rating | ⭐ < ⭐⭐ < ⭐⭐⭐ < ⭐⭐⭐⭐ < ⭐⭐⭐⭐⭐
Spice level | Mild < Medium < Hot < Extra Hot

What you CAN do: Count, find mode, find median, compare rankings
What you CANNOT do: Calculate mean (the “average” of Good and Excellent isn’t meaningful)

🧪 Think About It

Is the difference between “Satisfied” and “Very Satisfied” the same as between “Neutral” and “Satisfied”? We don’t know — that’s why ordinal data is tricky!
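
One way to find the median of ordinal data without pretending the gaps are equal is to sort the responses by rank and pick the middle one. The sketch below assumes the satisfaction scale from the table above. The helper name `ordinal_median` and the sample responses are illustrative only.

```python
# Assumed ranking for the satisfaction scale, lowest to highest.
ORDER = ["Very Dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very Satisfied"]

def ordinal_median(responses, order):
    """Return the middle response after sorting by rank.

    For an even number of responses this returns the lower of the two
    middle values, because ordinal categories cannot be averaged.
    """
    rank = {label: position for position, label in enumerate(order)}
    ranked = sorted(responses, key=rank.__getitem__)
    return ranked[(len(ranked) - 1) // 2]

# Hypothetical survey responses.
responses = ["Satisfied", "Neutral", "Very Satisfied", "Dissatisfied", "Satisfied"]
median_response = ordinal_median(responses, ORDER)
print(median_response)
```

The ranks are used only for sorting. They are never added or averaged, which is why the median is safe for ordinal data while the mean is not.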


Quantitative Data: Numbers and Measurements

Quantitative data represents quantities that can be measured and expressed as numbers.

Types of Quantitative Data

1. Discrete Data

What it is: Countable values, usually whole numbers.

Characteristics:

  • Can only take specific, separate values
  • Usually obtained by counting
  • Cannot have fractions or decimals (in context)
  • There are gaps between possible values

Examples:

Measurement | Why It’s Discrete
Number of students in class | Can’t have 32.5 students
Number of cars in parking | Can’t have 2.7 cars
Goals scored in a match | Can’t score 3.5 goals
Number of siblings | Can’t have 1.5 siblings
Eggs in a basket | Can’t have half an egg (in counting)

What you CAN do: Count, mean, median, mode, range, all mathematical operations

2. Continuous Data

What it is: Measurable values that can take any value within a range.

Characteristics:

  • Can take any value (including decimals)
  • Usually obtained by measuring
  • Infinitely many possible values between any two points
  • No gaps between possible values

Examples:

Measurement | Why It’s Continuous
Height | Can be 165.7 cm, 165.73 cm, etc.
Weight | Can be 58.5 kg, 58.52 kg, etc.
Temperature | Can be 28.3°C, 28.35°C, etc.
Time taken | Can be 45.7 seconds
Distance | Can be 5.25 km

What you CAN do: All mathematical operations, precise measurements, detailed analysis


Quick Comparison: All Four Types

Type | Category | Order | Math Operations | Example
Nominal | Qualitative | No order | Count, mode only | Blood type (A, B, O)
Ordinal | Qualitative | Has order | Count, mode, median | Rating (Poor to Excellent)
Discrete | Quantitative | Has order | All operations | Number of children
Continuous | Quantitative | Has order | All operations | Height in cm

Visual Summary

                    DATA
                      │
          ┌───────────┴───────────┐
          │                       │
     QUALITATIVE             QUANTITATIVE
     (Categories)              (Numbers)
          │                       │
     ┌────┴────┐             ┌────┴────┐
     │         │             │         │
  NOMINAL   ORDINAL      DISCRETE  CONTINUOUS
  (No order) (Ordered)   (Counted)  (Measured)

Identifying Data Types: Practice

Let’s practice identifying data types:

Data Example | Type | Reasoning
PIN code: 110001 | Nominal | Numbers used as labels, not for math
Rank in class: 1st, 2nd, 3rd | Ordinal | Ordered categories
Marks obtained: 78, 85, 92 | Discrete | Countable, whole numbers
Body temperature: 98.6°F | Continuous | Measurable, can have decimals
Movie genre: Action, Comedy | Nominal | Categories, no order
Pain level: 1-10 scale | Ordinal | Ordered rating scale
Number of pages in book | Discrete | Countable
Time to complete task | Continuous | Measurable duration

⚠️ Common Confusion

Just because something is a number doesn’t make it quantitative! Phone numbers, PIN codes, and jersey numbers are actually nominal — they’re labels, not quantities. You wouldn’t calculate the “average phone number”!


Data Acquisition: Gathering Your Data

Now that we understand data types, let’s learn how to acquire (collect) data.

Primary Data Collection Methods

Method | Description | Best For | Data Types
Surveys/Questionnaires | Asking people questions | Opinions, preferences | All types
Observations | Watching and recording | Behavior, events | All types
Experiments | Controlled testing | Cause-effect relationships | Quantitative
Interviews | In-depth conversations | Detailed insights | Mostly qualitative
Sensors/Devices | Automatic measurement | Physical measurements | Continuous

Secondary Data Sources

Source | Examples | Considerations
Government portals | data.gov.in, census data | Reliable, may be outdated
Research databases | Kaggle, UCI Repository | Clean, documented
Company records | Sales data, HR records | Need permission
Published reports | Industry reports, studies | May have bias
Web scraping | Social media, websites | Legal and ethical concerns

Designing Good Survey Questions

Question Type | Example | Data Type Generated
Multiple choice (single) | “What is your gender?” | Nominal
Multiple choice (ranked) | “Rate your satisfaction (1-5)” | Ordinal
Numeric input | “How old are you?” | Discrete
Scale/slider | “Rate your pain level” | Ordinal/Continuous
Open-ended | “Describe your experience” | Qualitative (text)

Tips for good questions:

  • Be clear and specific
  • Avoid leading questions
  • Provide appropriate options
  • Consider the data type you need

The Three Strategies for Data Acquisition

Beyond knowing where to collect data (primary or secondary), it is equally important to understand how data acquisition happens in practice. The CBSE AI curriculum identifies three distinct strategies:

1. Discovery

Discovery means finding and using data that already exists. You search for datasets that have already been collected — in government portals, research repositories, company databases, or published studies — and use them directly for your AI project.

Example: An AI project to predict crop yield could use rainfall and temperature data already published by the India Meteorological Department, rather than setting up new weather stations.

When to use it: When sufficient data already exists and collecting new data would be expensive or time-consuming.

2. Augmentation

Augmentation means enhancing an existing dataset to make it larger, more balanced, or more representative. You take data that already exists and systematically expand it.

Example: If you only have 200 photos of diseased leaves for training an AI plant disease detector, you can augment the dataset by rotating, flipping, and slightly adjusting the brightness of each image — creating 2,000 training examples from the original 200.

When to use it: When your existing data is too small, unbalanced, or lacks variety — and collecting entirely new data is not feasible.
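
To make augmentation concrete, here is a small sketch that treats an image as a grid of brightness values (0 to 255) and produces flipped, rotated, and brightened copies. Real projects usually rely on an image library for this. The tiny 2×3 "image" and the function names are assumptions made for illustration.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the order of the rows, so the top row becomes the bottom row."""
    return [row[:] for row in image[::-1]]

def rotate_90_clockwise(image):
    """Rotate the grid 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]

def adjust_brightness(image, delta):
    """Add delta to every pixel, clamped to the 0-255 range."""
    return [[max(0, min(255, pixel + delta)) for pixel in row] for row in image]

# A made-up 2x3 grayscale "image".
original = [
    [0, 50, 100],
    [150, 200, 250],
]

# One original picture becomes five training examples.
augmented = [
    original,
    flip_horizontal(original),
    flip_vertical(original),
    rotate_90_clockwise(original),
    adjust_brightness(original, 20),
]
print(len(augmented), "examples from 1 original")
```

Each variant shows the same leaf from a slightly different view, so the label ("diseased" or "healthy") stays the same while the dataset grows.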

3. Generation

Generation means creating entirely new, artificial (synthetic) data that did not previously exist. This is especially useful when real data is scarce, sensitive, or impossible to collect.

Example: Medical AI systems often lack enough examples of rare diseases. Researchers can generate synthetic patient records that mimic real data patterns but contain no actual patient information — solving the privacy problem while increasing training data.

When to use it: When real data cannot be obtained due to privacy concerns, cost, rarity of events, or ethical restrictions.
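
The idea of generation can be sketched with Python's standard `random` module. The field names, value ranges, and the `SYN-` ID prefix below are all assumptions chosen for illustration. Real synthetic-data tools learn these patterns from actual records rather than from hand-picked ranges.

```python
import random

def generate_synthetic_patients(n, seed=0):
    """Create n made-up patient records from assumed value ranges."""
    rng = random.Random(seed)  # fixed seed so the same records are produced each run
    records = []
    for i in range(n):
        records.append({
            "patient_id": f"SYN-{i:04d}",                  # clearly marked as synthetic
            "age": rng.randint(18, 90),                    # assumed range, in years
            "systolic_bp": round(rng.gauss(125, 15), 1),   # assumed mean and spread
            "smoker": rng.choice(["Yes", "No"]),
        })
    return records

patients = generate_synthetic_patients(5)
for patient in patients:
    print(patient)
```

No record here belongs to a real person, which is the privacy benefit described above. The trade-off is that the data is only as realistic as the assumptions used to create it.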

Strategy | What You Do | Example
Discovery | Find and use existing data | Government agricultural databases
Augmentation | Expand and enhance existing data | Image flipping and rotation
Generation | Create synthetic data from scratch | Simulated patient records


Data Processing: Cleaning and Preparing

Raw data is rarely ready for use. Data processing transforms raw data into a usable format.

Common Data Problems

Problem | Example | Solution
Missing values | Age: (blank), 25, 30, (blank), 28 | Fill with average, remove, or flag
Inconsistent formats | “Male”, “M”, “male”, “MALE” | Standardize to one format
Outliers | Heights: 165, 170, 168, 950, 172 | Investigate: error or genuine?
Duplicates | Same person entered twice | Remove duplicates
Wrong data types | Age stored as “Twenty-five” | Convert to number (25)
Invalid values | Age: -5 or 500 | Validate against possible range
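
Two of the problems in the table, outliers and invalid values, can often be caught with a simple range check. The sketch below assumes plausible limits of 50 to 250 cm for height and 0 to 120 years for age. Those limits and the function name `find_invalid` are illustrative choices, not fixed rules.

```python
def find_invalid(values, low, high):
    """Return the values that fall outside the inclusive range [low, high]."""
    return [value for value in values if not (low <= value <= high)]

heights_cm = [165, 170, 168, 950, 172]
ages = [15, -5, 14, 500, 16]

suspect_heights = find_invalid(heights_cm, 50, 250)  # assumed plausible heights
invalid_ages = find_invalid(ages, 0, 120)            # assumed plausible ages

print("Heights to investigate:", suspect_heights)
print("Ages to fix or remove:", invalid_ages)
```

A range check only flags values for review. Someone still has to decide whether 950 was a typing error (perhaps 150 or 195) or something else before changing it.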

Data Processing Steps

Step 1: COLLECT raw data
               ↓
Step 2: INSPECT for problems
               ↓
Step 3: CLEAN (fix errors, fill gaps)
               ↓
Step 4: TRANSFORM (convert formats, create categories)
               ↓
Step 5: VALIDATE (check everything is correct)
               ↓
Step 6: STORE in organized format

Example: Processing Survey Data

Raw data collection

Name | Age | City | Rating
Rahul | 15 | delhi | Good
Priya | sixteen | Delhi | EXCELLENT
Amit | 14 | Delhi | good
Rahul | 15 | delhi | Good
Neha | (blank) | Mumbai | Average

Problems identified:

  1. Inconsistent capitalization (delhi/Delhi, Good/good/EXCELLENT)
  2. Age as text (“sixteen”)
  3. Missing value (Neha’s age)
  4. Duplicate entry (Rahul appears twice)

After processing:

Name | Age | City | Rating
Rahul | 15 | Delhi | Good
Priya | 16 | Delhi | Excellent
Amit | 14 | Delhi | Good
Neha | 15* | Mumbai | Average

*Filled with average age
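
The cleaning steps in this example can be sketched in plain Python. The rows below copy the raw survey table. The `WORD_TO_NUMBER` lookup and the function names are hypothetical helpers written for this example, and a real project would need a fuller word-to-number mapping.

```python
raw_rows = [
    {"name": "Rahul", "age": "15", "city": "delhi", "rating": "Good"},
    {"name": "Priya", "age": "sixteen", "city": "Delhi", "rating": "EXCELLENT"},
    {"name": "Amit", "age": "14", "city": "Delhi", "rating": "good"},
    {"name": "Rahul", "age": "15", "city": "delhi", "rating": "Good"},
    {"name": "Neha", "age": "", "city": "Mumbai", "rating": "Average"},
]

# Tiny lookup for ages written as words (illustrative, not complete).
WORD_TO_NUMBER = {"fourteen": 14, "fifteen": 15, "sixteen": 16}

def parse_age(text):
    """Return the age as an int, or None when it is missing or unreadable."""
    text = text.strip().lower()
    if text.isdigit():
        return int(text)
    return WORD_TO_NUMBER.get(text)

def clean(rows):
    cleaned, seen = [], set()
    for row in rows:
        # Standardize capitalization and convert the age to a number.
        record = {
            "name": row["name"].strip().title(),
            "age": parse_age(row["age"]),
            "city": row["city"].strip().title(),
            "rating": row["rating"].strip().capitalize(),
        }
        key = tuple(record.values())
        if key in seen:  # skip exact duplicates after standardizing
            continue
        seen.add(key)
        cleaned.append(record)
    # Fill missing ages with the rounded average of the known ages
    # (this assumes at least one age is known).
    known = [r["age"] for r in cleaned if r["age"] is not None]
    average_age = round(sum(known) / len(known))
    for r in cleaned:
        if r["age"] is None:
            r["age"] = average_age
    return cleaned

processed = clean(raw_rows)
for row in processed:
    print(row)
```

The duplicate is removed before the average is calculated, so Rahul's age is counted only once. Filling a gap with the average is one option among several. Flagging the row or leaving it out are equally valid choices depending on the project.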

Three Qualities of Well-Processed Data: Structure, Cleanliness, and Accuracy

Once data has been collected, processing it is not just about fixing obvious errors. Good data processing ensures three fundamental qualities that make data genuinely ready for AI training.

Structure

Structured data is organised in a consistent, machine-readable format. Every column has one type of value, every row represents one record, and the format is uniform throughout.

Example of unstructured raw response: “My age is fifteen and I live in Mumbai.”

After structuring: Age = 15, City = Mumbai (two separate, typed fields).
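
As a rough sketch of structuring, the snippet below pulls the age and city out of a sentence shaped exactly like the example, using Python's standard `re` module. The pattern, the `WORD_TO_NUMBER` lookup, and the function name are assumptions for illustration. Free-form answers in a real survey vary far more and need more robust handling.

```python
import re

# Small illustrative lookup for ages written as words.
WORD_TO_NUMBER = {"thirteen": 13, "fourteen": 14, "fifteen": 15, "sixteen": 16}

def structure_response(text):
    """Extract age and city from a sentence shaped like the example.

    Returns None when the sentence does not match the expected pattern.
    """
    match = re.search(r"age is (\w+) and I live in (\w+)", text)
    if match is None:
        return None
    age_text, city = match.groups()
    if age_text.isdigit():
        age = int(age_text)
    else:
        age = WORD_TO_NUMBER.get(age_text.lower())
    return {"age": age, "city": city}

record = structure_response("My age is fifteen and I live in Mumbai.")
print(record)
```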

Models that learn from rows and columns cannot use free text that mixes different kinds of information until it is split into separate fields. Structure is the foundation.

Cleanliness

Clean data is free from errors, duplicates, missing values, and noise. A single dirty entry can skew the AI’s learning if not corrected.

Signs of unclean data: Missing values, inconsistent spelling (“Delhi” vs “delhi”), impossible values (age = 500), duplicate rows.

Cleanliness is not just about removing errors — it is about ensuring every data point truly represents reality.

Accuracy

Accurate data correctly reflects the real-world facts it is supposed to capture. Data can be structured and clean but still inaccurate — for example, if survey respondents answered dishonestly, or if sensors were incorrectly calibrated.

Example: A dataset of students’ study hours might be structured and clean, but if students overreported their hours, the data is inaccurate and will produce a misleading model.

Quality | Question to Ask | What Goes Wrong Without It
Structure | Is the data organised consistently? | AI cannot parse or learn from it
Cleanliness | Is the data free of errors and noise? | Errors bias the model
Accuracy | Does the data reflect reality? | Model learns wrong patterns

Independent and Dependent Features

When preparing data for an AI model, it is essential to understand the two types of features (columns/variables) in your dataset:

Independent features are the input variables — the pieces of information the AI uses to make a prediction. They are independent because they are the starting conditions, not determined by the output.

The dependent feature is the output variable: the thing the AI is trying to predict or classify. It is dependent because its value depends on the pattern of the input features.

Example: Predicting Student Performance

Independent Features (Inputs) | Dependent Feature (Output)
Study hours per day | Final exam score
Attendance percentage |
Assignment submission rate |
Previous test scores |

The AI is given the independent features and must learn to predict the dependent feature. Getting this separation right is critical: if you accidentally include the output as one of the inputs, the AI will appear to perform perfectly during training (because you gave it the answer) but will fail in the real world.
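
In code, this separation usually means splitting each record into inputs and an output before training. The sketch below uses two invented student records. The column names and the helper `split_features_target` are illustrative and not part of any particular library.

```python
# Hypothetical student records; every value here is made up.
students = [
    {"study_hours": 2.5, "attendance_pct": 92, "submission_rate": 0.9,
     "previous_score": 78, "final_score": 81},
    {"study_hours": 1.0, "attendance_pct": 75, "submission_rate": 0.6,
     "previous_score": 60, "final_score": 58},
]

TARGET = "final_score"  # the dependent feature we want to predict

def split_features_target(rows, target):
    """Separate each row into independent features (X) and the dependent feature (y)."""
    X = [{name: value for name, value in row.items() if name != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y

X, y = split_features_target(students, TARGET)

# Guard against the mistake described above: the output must not appear among the inputs.
assert all(TARGET not in row for row in X)

print(X[0])
print(y)
```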

Another example — Disease Prediction:

  • Independent features: Age, blood pressure, cholesterol, weight, smoking habits
  • Dependent feature: Has heart disease (Yes/No)

💡 Key Insight

In machine learning, independent features are also called predictors or input variables, and the dependent feature is also called the label, target, or output variable. Understanding which is which is the first step in setting up any AI model correctly.



Data Interpretation: Making Sense of Data

Data interpretation is the process of extracting meaning from processed data.

Textual, Tabular, and Graphical Interpretation

Textual interpretation means expressing findings in written sentences and paragraphs. It is best for summary conclusions, narrative explanations, or when the insight is qualitative.

Example: “The survey shows that most students prefer samosas, with biryani as a close second. Students from higher grades tend to spend more per visit.”

Tabular interpretation means organising data into rows and columns for precise comparison. Tables are best when you need to show exact numbers across multiple categories and allow the reader to look up specific values.

Example:

Dish | Votes | Percentage
Samosa | 80 | 40%
Biryani | 60 | 30%
Sandwich | 40 | 20%
Pasta | 20 | 10%

Graphical interpretation means representing data visually — through bar charts, pie charts, line graphs, scatter plots, and other visuals. Graphs are best when you want to show trends, comparisons, or distributions at a glance, without requiring the reader to study numbers closely.

Form | Best Used For | Limitation
Textual | Summaries, qualitative insights, narrative conclusions | Hard to compare many values
Tabular | Precise comparisons, looking up specific values | Takes effort to spot trends
Graphical | Trends, distributions, patterns at a glance | Less precise than a table

A well-prepared data report typically combines all three: a graph to show the pattern, a table for precise numbers, and text to explain what it means.
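
The three forms can be produced from the same numbers. The sketch below takes the dish votes from the table above and prints a one-line text summary, a table with percentages, and a simple bar chart made of `#` characters. The function names and the chart width are illustrative choices.

```python
votes = {"Samosa": 80, "Biryani": 60, "Sandwich": 40, "Pasta": 20}

def summarize(votes):
    """Return (dish, votes, percentage) rows."""
    total = sum(votes.values())
    return [(dish, n, 100 * n / total) for dish, n in votes.items()]

def text_bar_chart(rows, width=20):
    """Draw one bar per dish; a full-width bar would mean 100%."""
    lines = []
    for dish, n, pct in rows:
        bar = "#" * round(width * pct / 100)
        lines.append(f"{dish:<10}{bar} {pct:.0f}%")
    return "\n".join(lines)

rows = summarize(votes)

# Textual: one sentence that states the main finding.
top_dish = max(rows, key=lambda row: row[1])[0]
print(f"Most students chose {top_dish}.")

# Tabular: exact numbers for each dish.
for dish, n, pct in rows:
    print(f"{dish:<10}{n:>5}{pct:>6.0f}%")

# Graphical: relative sizes at a glance.
print(text_bar_chart(rows))
```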


Interpreting Different Data Types

Data Type | Interpretation Methods
Nominal | Frequency counts, mode, percentage distribution
Ordinal | Median, percentiles, ranking analysis
Discrete | Mean, median, mode, frequency distribution
Continuous | Mean, median, standard deviation, range

Example: Interpreting Survey Results

Question: “How satisfied are you with school facilities?”

Results (200 students):

Rating | Count | Percentage
Very Dissatisfied | 10 | 5%
Dissatisfied | 30 | 15%
Neutral | 40 | 20%
Satisfied | 80 | 40%
Very Satisfied | 40 | 20%

Interpretation:

  • Mode: Satisfied (most common response)
  • Median: Satisfied (middle value when ordered)
  • Positive responses: 60% (Satisfied + Very Satisfied)
  • Negative responses: 20% (Dissatisfied + Very Dissatisfied)
  • Insight: Most students are satisfied, but 20% are unhappy — worth investigating why.
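
The figures in this interpretation can be checked with a short script. The counts are taken from the results table above, and the helper `median_category` is an illustrative name. It finds the median by walking through the ranked categories until the running total reaches the middle position.

```python
# Counts from the results table, listed from lowest to highest rating.
counts = {
    "Very Dissatisfied": 10,
    "Dissatisfied": 30,
    "Neutral": 40,
    "Satisfied": 80,
    "Very Satisfied": 40,
}

total = sum(counts.values())
percentages = {label: 100 * n / total for label, n in counts.items()}

# Mode: the category with the highest count.
mode = max(counts, key=counts.get)

def median_category(counts):
    """Return the category that contains the middle response.

    The dictionary must list categories in ranked order.
    """
    middle = (sum(counts.values()) + 1) / 2
    running = 0
    for label, n in counts.items():
        running += n
        if running >= middle:
            return label

median = median_category(counts)
positive = percentages["Satisfied"] + percentages["Very Satisfied"]
negative = percentages["Very Dissatisfied"] + percentages["Dissatisfied"]

print(f"Mode: {mode}, Median: {median}")
print(f"Positive: {positive:.0f}%, Negative: {negative:.0f}%")
```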

Avoiding Interpretation Mistakes

Mistake | Example | Problem
Treating ordinal as continuous | “Average satisfaction is 3.7” | Gaps between categories aren’t equal
Ignoring sample size | “100% satisfied!” (based on 2 responses) | Too small to be meaningful
Confusing correlation with causation | “Ice cream sales and drowning both increase in summer, so ice cream causes drowning” | Both caused by a third factor (heat)
Cherry-picking data | Showing only favorable results | Misleading conclusions

Data Types in AI Applications

Different AI applications need different data types:

AI Application | Primary Data Types | Example Data
Image Classification | Nominal (labels) + Continuous (pixels) | “Cat” or “Dog” labels on images
Sentiment Analysis | Ordinal (sentiment scores) | Positive/Negative/Neutral ratings
Price Prediction | Continuous | House prices, stock prices
Customer Segmentation | Mixed | Demographics (nominal) + Spending (continuous)
Recommendation Systems | Ordinal (ratings) + Nominal (categories) | Movie ratings, genre preferences
Medical Diagnosis | Mixed | Symptoms (nominal), test results (continuous)

How AI Handles Different Data Types

Data Type | AI Treatment
Nominal | One-hot encoding (converting to binary columns)
Ordinal | Label encoding (converting to ordered numbers)
Discrete | Direct use or normalization
Continuous | Normalization or standardization
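
Two of these treatments are easy to sketch by hand. Label encoding maps ordered categories to ordered numbers, and min-max normalization rescales values to the 0 to 1 range. The rating order, the sample values, and the function names below are assumptions for illustration.

```python
# Assumed ranking, lowest to highest, so Poor=1 and Excellent=4.
RATING_ORDER = ["Poor", "Average", "Good", "Excellent"]

def label_encode(values, order):
    """Replace each ordinal label with its position in the ranking (starting at 1)."""
    codes = {label: position + 1 for position, label in enumerate(order)}
    return [codes[value] for value in values]

def min_max_normalize(values):
    """Rescale values so the smallest becomes 0 and the largest becomes 1.

    This assumes the values are not all identical.
    """
    low, high = min(values), max(values)
    return [(value - low) / (high - low) for value in values]

ratings = ["Good", "Poor", "Excellent", "Average"]
heights_cm = [150.0, 160.0, 170.0, 175.0]

encoded_ratings = label_encode(ratings, RATING_ORDER)
scaled_heights = min_max_normalize(heights_cm)

print(encoded_ratings)
print(scaled_heights)
```

The encoded numbers preserve the order of the ratings, but they should still not be read as equal steps, for the reasons given in the ordinal data section.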

Example: One-Hot Encoding for Colors

Original: Color = [Red, Blue, Green, Red, Blue]

Encoded:

Is_Red | Is_Blue | Is_Green
1 | 0 | 0
0 | 1 | 0
0 | 0 | 1
1 | 0 | 0
0 | 1 | 0

This allows AI to work with categorical data mathematically.
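
A minimal hand-written version of this encoding is sketched below. Data libraries provide ready-made functions for the same job. The helper `one_hot_encode` is an illustrative name, and it orders the columns by the first appearance of each color.

```python
def one_hot_encode(values):
    """Return (column_names, rows) where each row has a single 1 marking its category."""
    categories = list(dict.fromkeys(values))  # unique values, in first-seen order
    columns = [f"Is_{category}" for category in categories]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return columns, rows

colors = ["Red", "Blue", "Green", "Red", "Blue"]
columns, encoded = one_hot_encode(colors)

print(columns)
for row in encoded:
    print(row)
```

Every row contains exactly one 1, so no color is treated as larger or smaller than another. That is why one-hot encoding suits nominal data.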


Activity: Classify and Plan

Part A: Data Type Classification

Classify each as Nominal, Ordinal, Discrete, or Continuous:

  1. Number of WhatsApp messages sent today
  2. Your blood group
  3. Temperature in your city
  4. Your position in a race (1st, 2nd, 3rd)
  5. Number of pets you have
  6. Your favorite sport
  7. Your height
  8. Customer review stars (1-5)
  9. Number of Instagram followers
  10. Your mood today (Happy, Sad, Neutral)

Part B: Data Collection Planning

You want to understand Class 9 students’ study habits. Design a survey with:

  • 2 nominal questions
  • 2 ordinal questions
  • 2 discrete questions
  • 1 continuous question

(Answers in Answer Key)


Quick Recap

  • Qualitative data describes categories (Nominal: no order; Ordinal: has order).
  • Quantitative data describes numbers (Discrete: counted; Continuous: measured).
  • Nominal data includes categories like colors, names, and types — no natural order.
  • Ordinal data includes rankings and ratings — order matters but gaps aren’t equal.
  • Discrete data includes countable values like number of students — whole numbers.
  • Continuous data includes measurements like height and temperature — any value possible.
  • Data acquisition involves collecting data through surveys, observations, experiments, or secondary sources.
  • Data processing cleans and prepares raw data by fixing errors, filling gaps, and standardizing formats.
  • Data interpretation extracts meaning using appropriate methods for each data type.
  • AI handles different data types differently — nominal needs encoding, continuous needs normalization.

Next Lesson: Data Visualization with Tableau: How to Create Interactive Charts and Dashboards

Previous Lesson: Data Literacy for Beginners: Data Pyramid, Data Privacy and Cyber Security


EXERCISES

A. Fill in the Blanks

  1. Data that describes categories or qualities is called __________________________ data.
  2. Data that describes quantities with numbers is called __________________________ data.
  3. Nominal data has categories with __________________________ natural order.
  4. Ordinal data has categories that can be ____________________________.
  5. Discrete data is obtained by __________________________ (counting/measuring).
  6. Continuous data is obtained by __________________________ (counting/measuring).
  7. Phone numbers and PIN codes are examples of __________________________ data, not quantitative.
  8. The process of fixing errors and standardizing formats is called data __________________________.
  9. Converting categorical data into numerical format for AI is called __________________________.
  10. ____________________________.gov.in is India’s official open data portal.

B. Multiple Choice Questions

1. Which is an example of qualitative data?

(a) Height: 165 cm
(b) Age: 15 years
(c) Favorite color: Blue
(d) Temperature: 28°C

2. Ordinal data differs from nominal data because:

(a) It uses numbers
(b) It has a natural order
(c) It can be measured
(d) It has no categories

3. “Number of students in a class” is what type of data?

(a) Nominal
(b) Ordinal
(c) Discrete
(d) Continuous

4. Body temperature (98.6°F) is an example of:

(a) Nominal data
(b) Ordinal data
(c) Discrete data
(d) Continuous data

5. Which operation is NOT valid for nominal data?

(a) Counting occurrences
(b) Finding the mode
(c) Calculating the mean
(d) Finding percentages

6. A customer satisfaction rating of “Excellent, Good, Average, Poor” is:

(a) Nominal
(b) Ordinal
(c) Discrete
(d) Continuous

7. Which is a primary data collection method?

(a) Using government databases
(b) Conducting surveys
(c) Downloading from Kaggle
(d) Reading research reports

8. One-hot encoding is used for:

(a) Continuous data
(b) Discrete data
(c) Nominal data
(d) Ordinal data

9. “Age: Twenty-five” instead of “25” is an example of:

(a) Missing value
(b) Wrong data type
(c) Duplicate entry
(d) Outlier

10. Jersey numbers on sports uniforms are:

(a) Nominal data
(b) Ordinal data
(c) Discrete data
(d) Continuous data


C. True or False

  1. Qualitative data can always be measured with numbers. (__)
  2. Ordinal data has categories that can be ranked. (__)
  3. You can calculate the average of nominal data. (__)
  4. Discrete data can have decimal values. (__)
  5. Height and weight are examples of continuous data. (__)
  6. PIN codes are quantitative data because they contain numbers. (__)
  7. Data processing includes cleaning and standardizing data. (__)
  8. The mode can be found for all types of data. (__)
  9. Surveys can collect both qualitative and quantitative data. (__)
  10. Correlation always means causation. (__)

D. Define the Following (30-40 words each)

  1. Qualitative Data
  2. Quantitative Data
  3. Nominal Data
  4. Ordinal Data
  5. Discrete Data
  6. Continuous Data
  7. Data Processing

E. Very Short Answer Questions (40-50 words each)

  1. What is the main difference between qualitative and quantitative data?
  2. Explain the difference between nominal and ordinal data with examples.
  3. How is discrete data different from continuous data?
  4. Why are phone numbers considered nominal data even though they contain digits?
  5. What are three common problems found in raw data?
  6. Name three primary methods of data collection.
  7. What is one-hot encoding and why is it used?
  8. Why can’t you calculate the mean of ordinal data?
  9. Give two examples each of discrete and continuous data.
  10. What should you check when interpreting data to avoid mistakes?

F. Long Answer Questions (75-100 words each)

  1. Explain the four types of data (nominal, ordinal, discrete, continuous) with two examples each.
  2. You’re collecting data about students’ mobile phone usage. What type of data would each of the following generate: (a) Brand of phone, (b) Hours of daily usage, (c) Number of apps installed, (d) Satisfaction rating?
  3. Describe the steps involved in data processing. Why is each step important?
  4. What are the different methods of data acquisition? Compare primary and secondary data sources.
  5. Explain how AI handles different types of data. Why does nominal data need special treatment?
  6. A survey about canteen food collected these responses for “food quality”: Excellent, Good, Good, Average, Excellent, Poor, Good. Analyze this data appropriately.
  7. Design a data collection plan to understand exercise habits of Class 9 students. Include questions that generate all four data types.

ANSWER KEY

A. Fill in the Blanks – Answers

  1. qualitative — Qualitative data describes categories.
  2. quantitative — Quantitative data describes numbers.
  3. no — Nominal categories have no natural order.
  4. ranked — Ordinal categories can be ordered.
  5. counting — Discrete data is counted.
  6. measuring — Continuous data is measured.
  7. nominal — Phone numbers are labels, not quantities.
  8. processing/cleaning — Processing fixes data issues.
  9. encoding — Encoding converts categories to numbers.
  10. data — data.gov.in is India’s open data portal.

B. Multiple Choice Questions – Answers

  1. (c) Favorite color: Blue — Colors are categories, not numbers.
  2. (b) It has a natural order — Ordinal data can be ranked.
  3. (c) Discrete — Students are counted as whole numbers.
  4. (d) Continuous data — Temperature can have any decimal value.
  5. (c) Calculating the mean — Can’t average categories.
  6. (b) Ordinal — Ratings have a natural order.
  7. (b) Conducting surveys — Primary = collecting yourself.
  8. (c) Nominal data — Converts categories to binary columns.
  9. (b) Wrong data type — Text instead of number.
  10. (a) Nominal data — Jersey numbers are labels, not quantities.

C. True or False – Answers

  1. False — Qualitative data describes qualities, not measured numbers.
  2. True — Ordinal categories have a natural ranking.
  3. False — Cannot calculate average of categories like colors.
  4. False — Discrete data is whole numbers only.
  5. True — Both can have any decimal value.
  6. False — PIN codes are labels (nominal), not quantities.
  7. True — Processing includes cleaning and standardizing.
  8. True — Mode (most common) works for all types.
  9. True — Surveys can include various question types.
  10. False — Correlation does not imply causation.

D. Definitions – Answers

1. Qualitative Data: Data that describes qualities or characteristics using categories rather than numbers. It answers “what kind?” and includes types like colors, names, and opinions.

2. Quantitative Data: Data that describes quantities using numbers and measurements. It answers “how many?” or “how much?” and includes values like height, age, and count.

3. Nominal Data: A type of qualitative data where categories have no natural order or ranking. Examples include blood type, gender, and favorite color.

4. Ordinal Data: A type of qualitative data where categories have a natural order but the gaps between them aren’t necessarily equal. Examples include satisfaction ratings and education levels.

5. Discrete Data: A type of quantitative data with countable, separate values, usually whole numbers. Cannot have fractions in context. Examples: number of children, goals scored.

6. Continuous Data: A type of quantitative data that can take any value within a range, including decimals. Obtained by measuring. Examples: height, weight, temperature.

7. Data Processing: The steps of transforming raw data into a usable format, including cleaning errors, handling missing values, standardizing formats, removing duplicates, and validating accuracy.


E. Very Short Answer Questions – Answers

1. Qualitative vs quantitative difference:
Qualitative data describes categories or qualities (colors, names, opinions) answering “what kind?” Quantitative data describes quantities with numbers (height, age, count) answering “how many?” or “how much?”

2. Nominal vs ordinal with examples:
Nominal: categories with no order (blood types A, B, O — no type is “higher”). Ordinal: categories with order (education levels: Primary < Secondary < Graduate — there’s a ranking but gaps aren’t equal).

3. Discrete vs continuous difference:
Discrete data is countable with separate whole values (students in class: 32, not 32.5). Continuous data is measurable with any value possible (height: 165.7 cm, can be any decimal).

4. Phone numbers as nominal:
Phone numbers use digits but are labels/identifiers, not quantities. You wouldn’t add phone numbers or calculate their average. The digits don’t represent amounts — they’re just identification codes.

5. Three raw data problems:
Missing values (empty cells), inconsistent formats (Male/M/male), and outliers (height: 950 cm — likely error). Others include duplicates and wrong data types.

6. Three primary collection methods:
Surveys/questionnaires (asking questions), observations (watching and recording), and experiments (controlled testing). Also interviews and sensor measurements.

7. One-hot encoding:
Converting nominal categories into binary columns. Example: Color (Red, Blue) becomes Is_Red (1/0) and Is_Blue (1/0). Used because AI algorithms need numerical inputs to perform calculations.

8. Why no mean for ordinal:
The gaps between ordinal categories aren’t equal. The difference between “Good” and “Excellent” may not equal the difference between “Poor” and “Average.” Mean assumes equal spacing.

9. Discrete and continuous examples:
Discrete: Number of siblings (0, 1, 2…), goals in a match (0, 1, 2…). Continuous: Height (165.5 cm), weight (58.3 kg), temperature (28.7°C).

10. Avoiding interpretation mistakes:
Check sample size (is it large enough?), don’t confuse correlation with causation, use appropriate methods for each data type, don’t cherry-pick favorable results, consider context.


F. Long Answer Questions – Answers

1. Four data types with examples:
Nominal: Categories without order — Blood type (A, B, O, AB), Favorite color (Red, Blue, Green). Ordinal: Ranked categories — Education (Primary < Secondary < Graduate), Star rating (1-5 stars). Discrete: Countable whole numbers — Number of siblings (0, 1, 2), Books read this year (5, 10, 15). Continuous: Measurable any-value — Height (165.7 cm), Temperature (28.5°C). Each type requires different analysis methods.

2. Mobile phone usage data types:
(a) Brand of phone: Nominal — categories like Apple, Samsung, OnePlus with no natural order. (b) Hours of daily usage: Continuous — can be 2.5 hours, 3.7 hours, any decimal value. (c) Number of apps installed: Discrete — whole numbers only (25, 30, 45 apps). (d) Satisfaction rating: Ordinal — ranked categories (Very Satisfied > Satisfied > Neutral, etc.).

3. Data processing steps:
Collect: Gather raw data from sources. Inspect: Check for problems (missing values, errors, duplicates). Clean: Fix errors, fill gaps, remove duplicates. Transform: Standardize formats, convert types, create categories. Validate: Verify everything is correct and consistent. Store: Organize in proper format. Each step is important because errors at any stage corrupt final analysis.

4. Data acquisition methods:
Primary sources: Collecting yourself through surveys, observations, experiments, interviews — tailored to your needs but time-consuming. Secondary sources: Using existing data from government portals, research databases, company records — saves time but may not perfectly fit needs. Primary gives control over quality; secondary provides larger datasets quickly.

5. AI and data types:
AI algorithms work with numbers, so categorical data needs conversion. Nominal data uses one-hot encoding (color → Is_Red, Is_Blue columns with 0/1). Ordinal data uses label encoding (Poor=1, Average=2, Good=3). Continuous data often needs normalization (scaling to 0-1 range). Discrete data may be used directly or normalized. Without proper handling, AI can’t process categories mathematically.

6. Canteen survey analysis:
Data: Excellent, Good, Good, Average, Excellent, Poor, Good (n=7, ordinal data). Frequency: Excellent-2, Good-3, Average-1, Poor-1. Mode: Good (most common). Median: Good (middle value when ordered). Percentage: Positive (Excellent+Good): 71%, Negative (Poor): 14%. Interpretation: Most students find the food quality acceptable, but the one Poor rating is worth investigating. The mean cannot be calculated because ordinal gaps aren’t equal.

7. Exercise habits data collection:
Nominal questions: “What type of exercise do you prefer?” (Running, Swimming, Gym, Yoga), “Where do you exercise?” (Home, Park, Gym, School). Ordinal questions: “How would you rate your fitness level?” (Poor to Excellent), “How motivated are you to exercise?” (1-5 scale). Discrete questions: “How many days per week do you exercise?” (0-7), “How many push-ups can you do?” (number). Continuous question: “How many minutes do you exercise per session?” (can be 25.5 minutes).


Activity Answers

Part A: Data Type Classification

  1. Number of WhatsApp messages — Discrete (countable whole numbers)
  2. Blood group — Nominal (categories, no order)
  3. Temperature — Continuous (measurable, decimals possible)
  4. Position in race — Ordinal (ranked categories)
  5. Number of pets — Discrete (countable whole numbers)
  6. Favorite sport — Nominal (categories, no order)
  7. Height — Continuous (measurable, decimals possible)
  8. Customer review stars — Ordinal (ranked scale)
  9. Instagram followers — Discrete (countable whole numbers)
  10. Mood today — Nominal (categories, no inherent order) or Ordinal (if treated as scale)

Part B: Survey Design (Sample)

Nominal questions:

  • “What is your preferred study location?” (Home, Library, Classroom, Café)
  • “Which subject do you find most interesting?” (Math, Science, English, Social Studies)

Ordinal questions:

  • “How would you rate your study habits?” (Excellent, Good, Average, Poor)
  • “How stressed do you feel about exams?” (Not at all, Slightly, Moderately, Very, Extremely)

Discrete questions:

  • “How many days per week do you study outside school?” (0, 1, 2, 3, 4, 5, 6, 7)
  • “How many subjects do you need extra help with?” (0, 1, 2, 3, 4+)

Continuous question:

  • “On average, how many minutes do you spend on homework daily?” (Open numeric response)
