Chapter 10: Computer Vision (CV)
"Teaching machines to see and interpret images/videos like humans."
🔹 1. What is Computer Vision?
Computer Vision (CV) is a subfield of AI that allows machines to analyze, process, and understand visual data (images, videos).
📷 Applications:
- Face recognition (e.g., iPhone Face ID)
- Object detection (e.g., self-driving cars)
- Medical imaging (e.g., tumor detection)
- OCR (Optical Character Recognition)
🔹 2. Basic Image Concepts
Term | Meaning |
---|---|
Pixel | Smallest unit of an image |
Grayscale Image | Single channel; each pixel is a shade of gray from black to white |
RGB Image | 3 channels (Red, Green, Blue) |
Resolution | Width × Height of an image, in pixels |
Image Matrix | Each pixel stored as a number, 0-255 for 8-bit images (one number per channel) |
📌 Images are represented as numerical arrays → can be processed with ML/DL.
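To make the table concrete, here is a minimal sketch of a tiny grayscale image and a tiny RGB image as nested Python lists. Plain lists are used only to keep the example dependency-free; in practice these would usually be NumPy arrays. The pixel values and the helper name `describe` are made up for illustration.

```python
# A grayscale image with 2 rows and 3 columns: one number (0-255) per pixel.
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# An RGB image of the same size: three numbers (R, G, B) per pixel.
rgb = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 255), (0, 0, 0), (128, 128, 128)],
]


def describe(image):
    """Return (height, width, channels) for a nested-list image."""
    height = len(image)        # number of rows
    width = len(image[0])      # number of columns
    first_pixel = image[0][0]
    channels = len(first_pixel) if isinstance(first_pixel, tuple) else 1
    return height, width, channels


print(describe(gray))  # (2, 3, 1)
print(describe(rgb))   # (2, 3, 3)
print(rgb[0][2])       # row 0, column 2 -> (0, 0, 255), pure blue
```

Note that resolution is usually quoted as width × height, while indexing into the grid goes row first, then column.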
🔹 3. Image Processing with OpenCV
OpenCV is a popular open-source library for computer vision tasks.
Common Operations:
🔹 4. Object Detection vs. Image Classification
Task | Goal | Example |
---|---|---|
Classification | Identify object | "This is a cat" |
Detection | Locate object + identify | "Cat is at (x1,y1,x2,y2)" |
Segmentation | Pixel-level classification | Separate each pixel of cat vs background |
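The three tasks differ mainly in the shape of their output. The sketch below uses a hand-written 4×5 mask (not a model prediction) to show a class label, a bounding box in the table's `(x1, y1, x2, y2)` form, and a per-pixel mask side by side. The helper `mask_to_box` is a hypothetical name introduced for illustration.

```python
# Segmentation output: one label per pixel (1 = cat, 0 = background).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def mask_to_box(mask):
    """Return the tightest (x1, y1, x2, y2) box around all non-zero pixels."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None  # no object pixels at all
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Classification output: a single label for the whole image.
label = "cat" if any(v for row in mask for v in row) else "background"
# Detection output: where the object is.
box = mask_to_box(mask)

print(label)  # cat
print(box)    # (1, 1, 3, 2)
```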
🔹 5. Deep Learning for CV
Most modern CV tasks are done using Convolutional Neural Networks (CNNs).
🔸 CNN (Convolutional Neural Network)
- Detect patterns (edges, corners, shapes)
- Layers:
  - Convolution Layer → Feature extractor
  - Pooling Layer → Downsampling
  - Fully Connected Layer → Classification
- Example Architecture:
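One way to read the layer list above is as a pipeline: Input → Convolution → Pooling → Flatten → Fully Connected → class scores. The sketch below walks a 6×6 toy image through that pipeline in plain Python so the shape changes are visible. The kernel and the dense weights are hand-picked for illustration; in a real CNN they are learned during training, and a framework such as Keras or PyTorch would replace these helper functions.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def dense(vector, weights, bias):
    """Fully connected layer: one weighted sum per output class."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


# 6x6 image: dark left half (0), bright right half (1), so one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = conv2d(image, kernel)            # 6x6 -> 4x4
pooled = max_pool(features)                 # 4x4 -> 2x2
flat = [v for row in pooled for v in row]   # 2x2 -> 4 values
# Two hypothetical classes ("edge", "no edge") with made-up weights.
scores = dense(flat, weights=[[1, 1, 1, 1], [-1, -1, -1, -1]], bias=[0, 0])

print(features[0])  # [0, 3, 3, 0]
print(pooled)       # [[3, 3], [3, 3]]
print(scores)       # [12, -12]
```

The convolution output is largest exactly where the window straddles the edge, pooling keeps that strong response while halving each dimension, and the final layer turns the remaining values into one score per class.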
Libraries:
- TensorFlow/Keras
- PyTorch
🔹 6. Image Classification Example with Keras
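A minimal sketch of an image classifier, assuming TensorFlow 2.x with its bundled Keras and the MNIST digits dataset available through `keras.datasets`. The layer sizes, batch size, epoch count, and the function names `build_model` and `train_and_evaluate` are illustrative choices rather than tuned values.

```python
def build_model(input_shape=(28, 28, 1), num_classes=10):
    """Small CNN: two conv/pool stages followed by a softmax classifier."""
    from tensorflow import keras  # imported lazily; requires TensorFlow
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),          # feature extraction
        layers.MaxPooling2D(2),                           # downsampling
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),  # classification
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integers 0-9
        metrics=["accuracy"],
    )
    return model


def train_and_evaluate(epochs=3):
    """Train on MNIST and return test accuracy (downloads the dataset on first use)."""
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    # Scale pixels from 0-255 to 0-1 and add a channel axis: (N, 28, 28) -> (N, 28, 28, 1).
    x_train = x_train[..., None].astype("float32") / 255.0
    x_test = x_test[..., None].astype("float32") / 255.0

    model = build_model()
    model.fit(x_train, y_train, epochs=epochs, batch_size=128, validation_split=0.1)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return accuracy
```

Calling `train_and_evaluate()` runs the whole pipeline; `build_model().summary()` prints the layer-by-layer output shapes without training anything.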
🔹 7. Advanced CV Techniques
🔸 Object Detection
Model | Use Case |
---|---|
YOLO (You Only Look Once) | Real-time detection |
SSD (Single Shot Detector) | Fast single-pass detection; typically less accurate than Faster R-CNN |
Faster R-CNN | Two-stage detector; high accuracy, slower inference |
🔸 Image Segmentation
Technique | Description |
---|---|
Semantic Segmentation | Classify each pixel by class |
Instance Segmentation | Separate individual object instances |
Mask R-CNN | Instance segmentation model: detects each object and predicts its pixel mask |
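The difference between semantic and instance output can be shown on a toy mask. In the sketch below, the semantic mask only says which pixels are "cat", while the instance map also numbers each separate cat. Connected-component labeling is used here purely as a stand-in to produce instance IDs from a hand-written mask; it is not how learned models such as Mask R-CNN work, and `label_instances` is a hypothetical helper name.

```python
from collections import deque

# Semantic mask: 1 = "cat" pixel, 0 = background. It contains two separate cats.
semantic = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]


def label_instances(mask):
    """Give each 4-connected blob of non-zero pixels its own ID (1, 2, ...)."""
    h, w = len(mask), len(mask[0])
    instances = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not instances[y][x]:
                next_id += 1
                instances[y][x] = next_id
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one blob
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not instances[ny][nx]:
                            instances[ny][nx] = next_id
                            queue.append((ny, nx))
    return instances, next_id


instances, count = label_instances(semantic)
print(count)  # 2
for row in instances:
    print(row)
# [1, 1, 0, 0, 0]
# [1, 1, 0, 0, 2]
# [0, 0, 0, 2, 2]
```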
🔸 Face Detection & Recognition
- Haar Cascades (OpenCV)
- FaceNet, Dlib, DeepFace (deep learning)
🔹 8. Projects in Computer Vision
Project | Tools Used |
---|---|
Face Mask Detector | CNN + OpenCV |
Number Plate Recognition | OCR with Tesseract |
Emotion Recognition | CNN + Facial landmarks |
Real-Time Object Detection | YOLO + Webcam |
AI Virtual Mouse | Hand tracking with MediaPipe |
🔹 9. Tools for Practice
Tool | Purpose |
---|---|
OpenCV | Image/video processing |
MediaPipe | Real-time hand/face tracking |
LabelImg | Annotating datasets |
Kaggle | Practice datasets |
Google Colab | Free GPU for training |
✅ Chapter Summary
Topic | Key Learning |
---|---|
CV Basics | Pixels, channels, image formats |
OpenCV | Preprocessing, transformations |
CNN | Standard architecture for most image tasks |
Detection Models | YOLO, SSD, Faster R-CNN |
Real-world Use | Face recognition, object detection |
💡 Mini Tasks:
- Build an MNIST digit classifier using CNN.
- Create a real-time face detector using OpenCV.
- Train a YOLOv5 model on custom images (with LabelImg).
- Try image segmentation using Mask R-CNN or DeepLab.