Chapter 11: Reinforcement Learning (RL)
“Learning by trial and error—just like humans do!”
🔹 1. What is Reinforcement Learning?
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment and receiving rewards or punishments.
🧠 Example: A robot learns to walk by trying, failing, adjusting, and trying again, receiving a reward each time it moves forward.
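At its core, RL is a loop: the agent observes a state, picks an action, and the environment returns a reward and the next state. Here is a minimal, runnable sketch of that loop; the `Corridor` environment and its reward scheme are invented purely for illustration:

```python
import random

# Toy 5-cell corridor: the agent starts at cell 0 and gets a reward
# of +1 only when it reaches the goal at cell 4. (Invented example.)
class Corridor:
    def reset(self):
        self.pos = 0
        return self.pos                        # initial state

    def step(self, action):                    # action: 0 = left, 1 = right
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 4
        return self.pos, (1.0 if done else 0.0), done

env = Corridor()
state, done, total_reward = env.reset(), False, 0.0
while not done:                                # the agent-environment loop
    action = random.choice([0, 1])             # a random (untrained) policy
    state, reward, done = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```

A real RL algorithm replaces the random choice with a policy that improves from the rewards it collects.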
🔹 2. Key Components of RL
| Component | Meaning |
|---|---|
| Agent | Learner or decision maker (e.g., robot, AI player) |
| Environment | The world the agent interacts with (e.g., game, real-world task) |
| State (S) | Current situation of the agent |
| Action (A) | Choice made by the agent |
| Reward (R) | Feedback from the environment |
| Policy (π) | Strategy that agent follows |
| Value Function (V) | How good a state is in terms of future reward |
| Q-Value (Q) | Expected cumulative future reward of taking action A in state S |
🔹 3. RL vs Supervised Learning
| Feature | Reinforcement Learning | Supervised Learning |
|---|---|---|
| Data | Agent learns through interaction | Labeled dataset |
| Feedback | Reward/Punishment | Correct answer (label) |
| Goal | Maximize reward over time | Minimize prediction error |
🔹 4. Types of RL
🔸 1. Model-Free vs Model-Based
- Model-Free: no knowledge of the environment's dynamics (e.g., Q-Learning, DQN)
- Model-Based: tries to learn a model of the environment
🔸 2. Exploration vs Exploitation
- Exploration: try new actions to discover better outcomes
- Exploitation: use known actions to maximize reward
🔁 Balance is key!
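A standard way to strike this balance is an ε-greedy policy: explore with a small probability ε, exploit otherwise. A minimal sketch (the example Q-values at the end are made up):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit: best action

# With epsilon = 0.1, the agent picks action 1 (highest Q) ~90% of the time.
print(epsilon_greedy([0.2, 0.8, 0.5], epsilon=0.1))
```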
🔹 5. Important Algorithms in RL
| Algorithm | Description |
|---|---|
| Q-Learning | Learns Q-values (the value of taking an action in a state) off-policy |
| SARSA | On-policy variant of Q-learning; updates using the action the agent actually takes next |
| DQN (Deep Q-Network) | Approximates Q-values with a deep neural network |
| Policy Gradient | Directly learn the policy |
| Actor-Critic | Combines value-based & policy-based methods |
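To make "directly learn the policy" concrete, here is a minimal REINFORCE-style policy-gradient sketch, assuming PyTorch; the network size, learning rate, and the dummy data at the end are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

# Tiny policy network: maps a 4-dim state to probabilities over 2 actions.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(),
                       nn.Linear(32, 2), nn.Softmax(dim=-1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

def reinforce_update(states, actions, returns):
    """One REINFORCE step: raise the log-probability of each action
    in proportion to the return that followed it."""
    probs = policy(torch.stack(states))                            # (batch, 2)
    chosen = probs.gather(1, torch.tensor(actions).unsqueeze(1)).squeeze(1)
    loss = -(torch.log(chosen) * torch.tensor(returns)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Dummy call with fake episode data, just to show the expected shapes:
reinforce_update(states=[torch.randn(4) for _ in range(3)],
                 actions=[0, 1, 0], returns=[1.0, 0.5, 0.2])
```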
🔹 6. Q-Learning Explained
Goal: learn Q(s, a), the expected future reward of taking action a in state s; the best action in a state is then the one with the highest Q-value.
Q-Learning update formula:

Q(s, a) ← Q(s, a) + α · [ r + γ · max_a' Q(s', a') − Q(s, a) ]
| Symbol | Meaning |
|---|---|
| α (alpha) | Learning rate |
| γ (gamma) | Discount factor |
| r | Reward |
| s' | Next state |
| max_a' Q(s', a') | Best Q-value achievable from the next state |
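Putting the update rule into code: a self-contained tabular Q-learning sketch on a made-up 5-state chain (all constants here are arbitrary illustration values):

```python
import random

# Tabular Q-learning on a toy 5-state chain (goal = state 4, invented example).
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
alpha, gamma, epsilon = 0.1, 0.9, 0.1          # learning rate, discount, exploration
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(s, a):                                 # action: 0 = left, 1 = right
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for episode in range(500):
    s, done = 0, False
    while not done:
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)     # explore
        else:
            best = max(Q[s])                    # exploit (break ties randomly)
            a = random.choice([i for i in range(N_ACTIONS) if Q[s][i] == best])
        s2, r, done = step(s, a)
        # The update formula above, line for line:
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([round(max(row), 2) for row in Q])        # values grow toward the goal
```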
🔹 7. Deep Q-Network (DQN)
Combines Q-Learning with Deep Neural Networks.
- Inputs: state (image, game info, etc.)
- Output: Q-values for each action (see the network sketch after the list below)
🏁 Used in:
- Atari game solvers
- CartPole balancing
- Self-driving simulation
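A minimal sketch of such a network, assuming PyTorch; the layer sizes match CartPole's 4-dim state and 2 actions purely for illustration, and a real DQN also needs a replay buffer and a target network, omitted here:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """State vector in, one Q-value per action out."""
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
state = torch.randn(1, 4)                      # a fake CartPole-style observation
action = q_net(state).argmax(dim=1).item()     # greedy action = highest Q-value
```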
🔹 8. Popular RL Environments
Use these to train/test RL algorithms:
| Platform | Games/Environments |
|---|---|
| OpenAI Gym | CartPole, MountainCar, LunarLander |
| Atari | Breakout, Pong, etc. |
| Unity ML-Agents | 3D games |
| PyBullet / MuJoCo | Physics-based environments |
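A typical interaction loop, assuming the `gymnasium` package (the maintained fork of OpenAI Gym; `pip install gymnasium`):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()          # random action, just for demo
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated              # episode ended or timed out
env.close()
```

Swapping `env.action_space.sample()` for a learned policy is all it takes to plug in any of the algorithms above.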
🔹 9. Applications of RL
| Area | Use Case |
|---|---|
| Robotics | Teaching robots to walk, pick objects |
| Games | AlphaGo, Dota 2 AI |
| Finance | Trading agents |
| Healthcare | Treatment strategy optimization |
| Self-Driving | Lane control, braking, steering |
🔹 10. Challenges in RL
- Delayed rewards
- Exploration vs. exploitation trade-off
- High computation cost
- Training instability
✅ Chapter Summary
| Key Concept | Meaning |
|---|---|
| Agent | Learns and acts |
| Environment | World agent interacts with |
| Reward | Signal of success |
| Policy | Agent’s strategy |
| Q-learning | Value-based learning |
| DQN | Neural network-based Q-learning |
💡 Mini Projects You Can Try:
- Balance the CartPole using Q-learning (OpenAI Gym; see the discretization sketch after this list)
- Train an AI to play Pong with a Deep Q-Network
- Simulate a stock trader using RL
- Create a smart taxi agent using SARSA
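For the first project, note that CartPole's observations are continuous, so tabular Q-learning needs a discretization step. One common trick is to bucket each observation dimension; the bin count and bounds below are rough, hand-picked values for illustration, not official constants:

```python
BINS = 6
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]  # rough guesses

def discretize(obs):
    """Map a 4-dim continuous observation to a tuple of bin indices,
    usable as a key into a tabular Q dictionary."""
    idxs = []
    for x, (lo, hi) in zip(obs, BOUNDS):
        x = min(max(x, lo), hi)                            # clip to bounds
        idxs.append(int((x - lo) / (hi - lo) * (BINS - 1)))
    return tuple(idxs)

print(discretize([0.1, -0.5, 0.02, 1.0]))   # -> (2, 2, 2, 3)
```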