What is a Dataset
In machine learning, a dataset is a collection of data that we use to train and test our models.
Think of it like an Excel sheet:
- Rows → each row is one data point (called a sample).
- Columns → each column contains information about that sample.
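To make the spreadsheet picture concrete, here is a minimal sketch in plain Python. The column names and values are made up purely for illustration; the point is only that rows are samples and columns are pieces of information about each sample.

```python
# Hypothetical toy data, only to illustrate the row/column layout.
columns = ["height_cm", "weight_kg"]
rows = [
    [170, 65],   # sample 1
    [158, 52],   # sample 2
    [181, 80],   # sample 3
]

n_samples = len(rows)      # number of rows = number of samples
n_columns = len(columns)   # number of columns = pieces of information per sample

# The row index picks the sample; the column index picks the piece of information.
second_sample_weight = rows[1][columns.index("weight_kg")]

print(n_samples, n_columns)   # 3 2
print(second_sample_weight)   # 52
```

In practice a library such as pandas is often used for this, but a list of rows is enough to show the idea.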
Features (Inputs)
- Features are the independent variables: the input data we provide to the ML model.
- They describe the properties of each sample.
- Example:
  - Age
  - Salary
  - Height
  - Exam scores

Features are like the ingredients we give to a recipe.
Labels (Outputs / Targets)
- The label is what we want the model to predict.
- It is also called the dependent variable or target.
- Example:
  - Will the student pass or fail? (Yes/No)
  - Price of a house
  - Disease diagnosis (Positive/Negative)

Labels are like the final dish that we expect after cooking with the ingredients.
Example Dataset
| Age | Salary | Hours Studied | Passed (Label) |
| --- | --- | --- | --- |
| 20 | 30,000 | 5 | Yes |
| 22 | 25,000 | 2 | No |
| 19 | 40,000 | 6 | Yes |
| 21 | 35,000 | 1 | No |
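
The table above can be split into features and labels in code. This is a minimal sketch in plain Python: the variable names (`dataset`, `FEATURE_COLUMNS`, `LABEL_COLUMN`, `X`, `y`) and the choice to encode Yes/No as 1/0 are assumptions made for illustration, not part of the dataset itself.

```python
# Each dict is one sample (one row of the example table).
dataset = [
    {"age": 20, "salary": 30000, "hours_studied": 5, "passed": "Yes"},
    {"age": 22, "salary": 25000, "hours_studied": 2, "passed": "No"},
    {"age": 19, "salary": 40000, "hours_studied": 6, "passed": "Yes"},
    {"age": 21, "salary": 35000, "hours_studied": 1, "passed": "No"},
]

FEATURE_COLUMNS = ["age", "salary", "hours_studied"]  # inputs
LABEL_COLUMN = "passed"                               # what we want to predict

# X: one list of feature values per sample.
X = [[row[col] for col in FEATURE_COLUMNS] for row in dataset]

# y: one label per sample, encoded as 1 for "Yes" and 0 for "No" (an assumed encoding).
y = [1 if row[LABEL_COLUMN] == "Yes" else 0 for row in dataset]

print(X[0])  # [20, 30000, 5]
print(y)     # [1, 0, 1, 0]
```

By convention the features are often called `X` and the labels `y`; the model learns to map each row of `X` to the matching entry in `y`.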