Reinforcement Learning
Agents that learn from interaction
About this program
A six-month program on reinforcement learning, from multi-armed bandits to deep RL and modern offline methods. Covers Q-learning, policy gradients, actor-critic, PPO, model-based RL, and RLHF for language models. Heavy programming component using Gymnasium and CleanRL.
Student ratings
Excellent — 194 verified Canadian graduates rated this program 4.7/5. Reviews emphasize the applied capstone, instructor responsiveness, and career outcomes.
- 5★: 144
- 4★: 44
- 3★: 4
- 2★: 1
- 1★: 1
Who this program is for
- Practitioners already doing foundational ML work who want to go deeper
- Senior engineers, data scientists, and technical leads
- Canadian residents seeking a verifiable diploma credential
Topics you'll cover
6 modules across 6 months — 24 lessons in total.
Six-month syllabus
Module 1 · Month 1: RL Foundations
- L1 · MDPs and value functions
- L2 · Bandits and exploration
- L3 · Tabular Q-learning
- L4 · Lab: gridworlds
Module 2 · Month 2: Deep Q-Learning
- L1 · DQN and variants
- L2 · Experience replay
- L3 · Double and Dueling DQN
- L4 · Lab: Atari
Module 3 · Month 3: Policy Gradients
- L1 · REINFORCE
- L2 · Actor-critic
- L3 · Advantage estimation
- L4 · Lab: continuous control
Module 4 · Month 4: Modern Algorithms
- L1 · PPO in detail
- L2 · SAC and TD3
- L3 · Distributed RL
- L4 · Lab: PPO on MuJoCo
Module 5 · Month 5: Offline RL and RLHF
- L1 · Conservative Q-learning
- L2 · Decision transformers
- L3 · RLHF for LLMs
- L4 · DPO and alternatives
Module 6 · Month 6: Capstone
- L1 · Pick a domain
- L2 · Design environment
- L3 · Train and evaluate
- L4 · Workshop write-up
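Module 1 pairs tabular Q-learning with a gridworld lab. As a rough illustration of that kind of material (not the course's own lab code), here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The six-cell corridor environment, the +1 reward at the last cell, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for the example.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 1-D corridor
# of N_STATES cells. The agent starts in cell 0; reaching the last cell ends the
# episode with reward +1. Every other step gives reward 0.
N_STATES = 6
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target r + gamma * max_a' Q(s', a'); no bootstrap at terminal states.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
# Greedy policy for the non-terminal cells 0..4.
greedy = [("left", "right")[row.index(max(row))] for row in q[:-1]]
print(greedy)  # ['right', 'right', 'right', 'right', 'right']
```

Because the corridor is deterministic and the Q-table starts at zero, the estimates for "move right" approach `gamma ** (4 - s)` in cell `s`, the discounted value of walking straight to the goal, so the greedy policy moves right everywhere.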
What you'll be able to do
- ●Implement core RL algorithms from scratch
- ●Train PPO agents on custom environments
- ●Apply offline RL to logged data
- ●Use RLHF to align language models
- ●Read and reproduce modern RL papers
Career paths after graduation
Frequently asked questions
How much does the Reinforcement Learning program cost?
Tuition is $699 CAD. You can pay in full at checkout or choose an interest-free monthly plan. A 30-day refund window applies from your start date.
How long is the Reinforcement Learning program?
6 months, cohort-based and fully online. Expect roughly 13 hours per week, including live Thursday sessions at 7 pm ET.
What are the prerequisites?
Strong Python skills and a grounding in deep learning fundamentals.
Is the diploma recognized in Canada?
Yes. Graduates receive the Altaris AI Academy Diploma in Foundations — a verifiable credential with a unique certificate number you can publish on LinkedIn and that any employer can verify at smart-ai-future.lovable.app/verify.
What is the refund policy?
Full refund within 30 days of your cohort start date, no questions asked. After day 30, prorated refunds are available per our Refund Policy.
Who teaches the program?
Working Canadian AI practitioners — not academics. Each cohort has a lead instructor plus a 1:1 mentor pairing for the duration of the program.