Chapter 5: Unsupervised Learning
🧠 In Unsupervised Learning, the model discovers patterns or structures from data without any labels.
No answers are given — the algorithm finds hidden relationships on its own.
🔍 Key Idea:
“Group similar things together or reduce complexity without knowing the correct output.”
✳️ Types of Unsupervised Learning
| Type | Description | Use Case |
|---|---|---|
| Clustering | Group similar data points | Customer Segmentation |
| Dimensionality Reduction | Reduce features while preserving information | Data Visualization |
| Association Rule Learning | Find rules between variables | Market Basket Analysis |
🔹 1. Clustering
Grouping data points such that those in the same group are more similar to each other than to those in other groups.
a. K-Means Clustering
- Partitions data into K clusters
- Iteratively assigns each point to the nearest centroid, then updates the cluster centers (centroids)
📌 Example: Grouping customers by spending habits
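The assign-then-update loop is handled for you by scikit-learn's `KMeans`. A minimal sketch on synthetic spending data (the two customer groups and all numbers are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two made-up customer groups: low spenders and high spenders, columns = [income, spending]
low = rng.normal(loc=[20, 30], scale=5, size=(50, 2))
high = rng.normal(loc=[80, 85], scale=5, size=(50, 2))
X = np.vstack([low, high])

# K-Means with K=2; n_init restarts the algorithm from several random seeds
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # one centroid per cluster
print(kmeans.labels_[:5])       # cluster assignment of the first 5 customers
```

In practice, K is chosen by inspecting the data (e.g. with the elbow method on inertia) rather than known in advance.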
b. Hierarchical Clustering
- Builds a tree (dendrogram) of clusters
- Doesn't require pre-specifying the number of clusters
📌 Example: Document or gene clustering
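A sketch of agglomerative hierarchical clustering with SciPy, on synthetic data invented for illustration. `linkage` builds the dendrogram structure; `fcluster` cuts the tree at a chosen number of clusters after the fact:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two synthetic blobs of 20 points each
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

# Ward linkage merges clusters so as to minimize within-cluster variance
Z = linkage(X, method="ward")  # Z encodes the full dendrogram
# Cut the tree into 2 flat clusters (could pick any number later)
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

`scipy.cluster.hierarchy.dendrogram(Z)` can be used to plot the tree itself.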
🔹 2. Dimensionality Reduction
Reducing the number of features (columns) in a dataset while retaining the important information.
a. PCA (Principal Component Analysis)
- Projects high-dimensional data onto fewer dimensions (principal components) that capture the most variance
📌 Used for: Visualization, noise removal, speeding up algorithms
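A minimal PCA sketch with scikit-learn, on synthetic 5-dimensional data where most of the variance is deliberately concentrated in two directions (all numbers invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 200 samples, 5 features; inflate two features so they dominate the variance
X = rng.normal(size=(200, 5))
X[:, 0] *= 10
X[:, 1] *= 5

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)            # data projected onto the top 2 components
print(X2.shape)                      # (200, 2)
print(pca.explained_variance_ratio_) # variance fraction kept by each component
```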
b. t-SNE
- Better suited for visualizing high-dimensional data
- Preserves local structure (similar points stay close together)
📌 Often used to visualize clusters in NLP or image features
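A t-SNE sketch with scikit-learn, embedding synthetic 10-dimensional data (two made-up groups, invented for illustration) into 2D. Note that `perplexity` must be smaller than the number of samples:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
# Two well-separated synthetic groups of 30 points in 10 dimensions
X = np.vstack([rng.normal(0, 1, (30, 10)), rng.normal(8, 1, (30, 10))])

# perplexity ~ effective number of neighbors; 15 is a reasonable pick for 60 points
emb = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X)
print(emb.shape)  # (60, 2)
```

Unlike PCA, t-SNE has no `transform` for unseen data; it is a visualization tool, not a general-purpose projection.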
🔹 3. Association Rule Learning
Finds relationships between variables in large datasets.
a. Apriori Algorithm
- Discovers frequent itemsets and association rules
📌 Example: If a customer buys bread & butter, they’re likely to buy milk.
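Libraries such as mlxtend ship a ready-made `apriori`, but the core level-wise idea fits in a few lines of plain Python. A from-scratch sketch on a toy transaction set (the items and support threshold are invented for illustration):

```python
from itertools import combinations

# Toy transactions, invented for illustration
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
min_support = 0.6  # itemset must appear in at least 60% of transactions

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted({i for t in transactions for i in t})
frequent = {}
# Apriori: grow candidates level by level, pruning infrequent itemsets,
# since no superset of an infrequent itemset can be frequent
level = [frozenset([i]) for i in items]
while level:
    level = [s for s in level if support(s) >= min_support]
    frequent.update({s: support(s) for s in level})
    # join step: merge pairs of frequent k-itemsets into (k+1)-item candidates
    level = list({a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1})

for s, sup in frequent.items():
    print(set(s), sup)
```

Association rules (e.g. "bread & butter ⇒ milk") are then derived from the frequent itemsets by comparing their supports.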
b. Eclat Algorithm
- More memory-efficient than Apriori
- Uses intersections of transaction-ID sets to count support
📌 Used in: Market Basket Analysis, Recommender Systems
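The set-intersection idea can be sketched directly: store each item's "tidset" (the set of transaction IDs containing it), and the support of any itemset is the size of the intersection of its items' tidsets. A minimal from-scratch sketch on the same style of toy data (all items and thresholds invented for illustration):

```python
# Toy transactions, invented for illustration
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

# Vertical layout: item -> set of transaction IDs containing it
tids = {}
for tid, t in enumerate(transactions):
    for item in t:
        tids.setdefault(item, set()).add(tid)

min_count = 3  # must appear in at least 3 of the 5 transactions

def eclat(prefix, items, out):
    """Depth-first search: extend `prefix` with each remaining item."""
    while items:
        item, item_tids = items.pop()
        if len(item_tids) >= min_count:
            out[prefix | {item}] = len(item_tids)
            # intersect tidsets to get the supports of longer itemsets
            rest = [(i, t & item_tids) for i, t in items]
            eclat(prefix | {item}, rest, out)

result = {}
eclat(frozenset(), sorted(tids.items()), result)
for itemset, count in result.items():
    print(set(itemset), count)
```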
📏 Evaluation Metrics in Unsupervised Learning
Since there are no "correct" labels to compare against, evaluation metrics are indirect.
| Metric | Use |
|---|---|
| Silhouette Score | How well-separated and compact the clusters are |
| Inertia (K-Means) | Sum of squared distances of points to their centroids; lower means tighter clusters |
| Explained Variance (PCA) | Fraction of the data's variance each component retains |
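All three metrics are available in scikit-learn; a sketch on synthetic, well-separated data (invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
# Two clearly separated synthetic blobs in 3 dimensions
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(6, 1, (40, 3))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(silhouette_score(X, km.labels_))  # in [-1, 1]; closer to 1 = better separated
print(km.inertia_)                      # within-cluster sum of squares; lower = tighter

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)    # variance fraction kept by each component
```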
💻 Hands-On Projects
✅ Customer Segmentation (K-Means)
- Data: Age, Spending Score, Income
- Output: Group customers into clusters
✅ Market Basket Analysis (Apriori)
- Input: Transaction data
- Output: Discover item combinations often bought together
🧠 Summary of Chapter 5
| Concept | Summary |
|---|---|
| Unsupervised Learning | No labels; model finds patterns |
| Clustering | K-Means, Hierarchical |
| Dimensionality Reduction | PCA, t-SNE |
| Association Rules | Apriori, Eclat |
| Hands-on | Customer groups, Market rules |
✅ Mini Assignment:
- Use K-Means on a dataset with 2 features and visualize the clusters.
- Apply PCA to reduce a dataset to 2 dimensions.
- Try using the Apriori algorithm on transaction data.