Foundations

Reinforcement Learning

Agents that learn from interaction

4.7 · 194 student reviews
Level: Advanced
Duration: 6 months
Credits: 21
Tuition: $699 CAD
Lead instructor: Dr. Yusuf Adetola

About this program

Six months of reinforcement learning (RL), from multi-armed bandits to deep RL and modern offline methods. The program covers Q-learning, policy gradients, actor-critic, PPO, model-based RL, and RLHF for language models, with a heavy programming component built on Gymnasium and CleanRL.

Student ratings

Excellent — 194 verified Canadian graduates rated this program 4.7/5. Reviews emphasize the applied capstone, instructor responsiveness, and career outcomes.

Rating breakdown (194 reviews):
  • 5 stars: 144
  • 4 stars: 44
  • 3 stars: 4
  • 2 stars: 1
  • 1 star: 1

Who this program is for

  • Practitioners already shipping foundations work who want depth
  • Senior engineers, data scientists, and technical leads
  • Canadian residents seeking a verifiable diploma credential

Topics you'll cover

6 modules across 6 months — 24 lessons in total.

  • 01 · Month 1: RL Foundations
  • 02 · Month 2: Deep Q-Learning
  • 03 · Month 3: Policy Gradients
  • 04 · Month 4: Modern Algorithms
  • 05 · Month 5: Offline RL and RLHF
  • 06 · Month 6: Capstone

Six-month syllabus

Module 1 · Month 1 — RL Foundations
  • L1 · MDPs and value functions
  • L2 · Bandits and exploration
  • L3 · Tabular Q-learning
  • L4 · Lab: gridworlds
Module 2 · Month 2 — Deep Q-Learning
  • L1 · DQN and variants
  • L2 · Experience replay
  • L3 · Double and Dueling DQN
  • L4 · Lab: Atari
Module 3 · Month 3 — Policy Gradients
  • L1 · REINFORCE
  • L2 · Actor-critic
  • L3 · Advantage estimation
  • L4 · Lab: continuous control
Module 4 · Month 4 — Modern Algorithms
  • L1 · PPO in detail
  • L2 · SAC and TD3
  • L3 · Distributed RL
  • L4 · Lab: PPO on MuJoCo
Module 5 · Month 5 — Offline RL and RLHF
  • L1 · Conservative Q-learning
  • L2 · Decision transformers
  • L3 · RLHF for LLMs
  • L4 · DPO and alternatives
Module 6 · Month 6 — Capstone
  • L1 · Pick a domain
  • L2 · Design environment
  • L3 · Train and evaluate
  • L4 · Workshop write-up

What you'll be able to do

  • Implement core RL algorithms from scratch
  • Train PPO agents on custom environments
  • Apply offline RL to logged data
  • Use RLHF to align language models
  • Read and reproduce modern RL papers

Career paths after graduation

  • AI Literacy Lead
  • Junior AI Analyst
  • AI-Augmented Knowledge Worker

Frequently asked questions

How much does the Reinforcement Learning program cost?

Tuition is $699 CAD. You can pay in full at checkout or choose an interest-free monthly plan. A 30-day refund window applies from your start date.

How long is the Reinforcement Learning program?

6 months, cohort-based and fully online. Expect roughly 13 hours per week including live Thursday sessions at 7pm ET.

What are the prerequisites?

Strong Python skills and deep learning fundamentals.

Is the diploma recognized in Canada?

Yes. Graduates receive the Altaris AI Academy Diploma in Foundations — a verifiable credential with a unique certificate number you can publish on LinkedIn and that any employer can verify at smart-ai-future.lovable.app/verify.

What is the refund policy?

Full refund within 30 days of your cohort start date, no questions asked. After day 30, prorated refunds are available per our Refund Policy.

Who teaches the program?

Working Canadian AI practitioners — not academics. Each cohort has a lead instructor plus a 1:1 mentor pairing for the duration of the program.
