
Study Circle in Reinforcement Learning

NEWS: The project presentation on Friday May 3 has been postponed to Friday May 24, 10:15 - 12:00. One reason is the lack of finished projects to present; another is that Karl-Erik is away that day.


This is a graduate/PhD course on Reinforcement Learning (RL) given in study-circle form, i.e., the participants do most of the work.

We will mainly follow the Reinforcement Learning Course given by David Silver at UCL.

We will have course meetings once per week. Before each meeting the participants should have gone through the lecture slides and watched the corresponding lecture video.

The UCL course follows quite closely the standard textbook on RL, Reinforcement Learning: An Introduction by Sutton and Barto:

Most of the algorithms are available in Python in the following repos:

A new version of the course that combines Advanced NN and Tensorflow with RL can be found here

Neural-MMO - Multi-agent Reinforcement Learning Environment from OpenAI

DeepMind's Python tools for connecting to, training RL agents in, and playing StarCraft II

The background to the name Dynamic Programming is explained in Richard Bellman's acceptance speech for the Norbert Wiener Prize

Course responsible: Karl-Erik Årzén

Meetings (the default meeting room is the Seminar Room at Dept of Automatic Control, 2nd floor):

  • Meeting 1: January 25, 13:00 - 15:00. Introduction. Before the meeting each participant should have gone through Lecture 1 in the UCL course. Notes from Meeting 1.
    • Before Meeting 2: Watch Lecture 2 and work through the OpenAI Gym Tutorial from dennybritz
  • Meeting 2: Friday February 1, 13:15 - 15:00 Markov Decision Processes
  • Meeting 3: Monday February 11, 13:15 - 15:00 Planning by Dynamic Programming
    • Before Meeting 4: Watch Lecture 4 and do the following exercises from dennybritz
  • Meeting 4: Monday February 18, 13:15 - 15:00 Model-Free Prediction
    • Before Meeting 5: Watch Lecture 5 and do the following exercises from dennybritz
      • Implement the on-policy first-visit Monte Carlo Control algorithm
      • Implement the off-policy every-visit Monte Carlo Control algorithm using Weighted Importance Sampling
  • Meeting 5: Friday February 22, 10:15 - 12:00 Model-Free Control (OBS: In Lab F, First floor, M-building)
  • Meeting 6: Friday March 1, 10:15 - 12:00 Value Function Approximation
    • Before Meeting 7: Watch Lecture 7 and study the following exercises
  • Meeting 7: Friday March 8, 10:15 - 12:00 Policy Gradient Methods
    • Before Meeting 8: Watch Lecture 8 and study the following exercises (Note that these are exercises that are based on Lecture 6, there are no new exercises for Lecture 8)
      • Deep Q-Learning for Atari Games
      • Double Q-Learning
  • Meeting 8: Friday March 15, 10:15 - 12:00 Integrating Learning and Planning
    • Before Meeting 9: Watch Lecture 9
  • Meeting 9: Friday March 22, 10:15 - 12:00 Exploration and Exploitation
    • Before Meeting 10: Watch Lecture 10
    • Select a project for the extended version of the course, see below.
  • Meeting 10: Friday March 29, 10:15 - 12:00 Case Study: RL in Classic Games
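As a warm-up for the exercises before Meetings 4 and 5, the sketch below shows on-policy first-visit Monte Carlo control with an ε-greedy policy. The two-state MDP is invented purely for illustration (the actual exercises use the dennybritz Gym environments); the update logic is the standard algorithm from the lectures.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP for illustration: in state 0, action 1 moves to
# state 1 (reward 0) and action 0 stays put; in state 1 the episode ends
# with reward +1 for action 1 and 0 for action 0.
def env_step(state, action):
    """Return (next_state, reward, done)."""
    if state == 0:
        return (1, 0.0, False) if action == 1 else (0, 0.0, False)
    return (None, 1.0 if action == 1 else 0.0, True)

def epsilon_greedy(Q, state, n_actions, eps):
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def mc_control(episodes=5000, gamma=0.9, eps=0.1, n_actions=2, max_steps=20):
    """On-policy first-visit Monte Carlo control with an eps-greedy policy."""
    Q = defaultdict(float)       # action-value estimates Q(s, a)
    counts = defaultdict(int)    # visit counts for incremental averaging
    for _ in range(episodes):
        # Generate one episode following the current eps-greedy policy.
        episode, state = [], 0
        for _ in range(max_steps):
            action = epsilon_greedy(Q, state, n_actions, eps)
            next_state, reward, done = env_step(state, action)
            episode.append((state, action, reward))
            if done:
                break
            state = next_state
        # Record the index of the FIRST visit to each (state, action) pair.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        # Walk backwards accumulating returns; keep only first-visit returns.
        G, returns = 0.0, {}
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if first_visit[(s, a)] == t:
                returns[(s, a)] = G
        # Incrementally average the observed returns into Q.
        for sa, G_sa in returns.items():
            counts[sa] += 1
            Q[sa] += (G_sa - Q[sa]) / counts[sa]
    return Q

random.seed(0)
Q = mc_control()
```

With enough episodes, action 1 is preferred in both states: immediately going to state 1 and then taking the rewarding action beats lingering in state 0, since gamma < 1 discounts delayed reward.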
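The Lecture 8 exercise list mentions Double Q-Learning. The exercise itself targets deep networks, but the core idea is easiest to see in tabular form; here is a minimal sketch on a hypothetical maximization-bias MDP (in the spirit of the example in Sutton and Barto; the environment and all names are invented for illustration). Action selection uses one table and evaluation the other, which removes the upward bias of the single-table max.

```python
import random
from collections import defaultdict

# Hypothetical MDP: from start state 'A', action 'right' ends the episode
# with reward 0; action 'left' moves to 'B' with reward 0. From 'B' every
# action ends the episode with a noisy reward of mean -0.1. A single
# max-based Q-learner overestimates 'left' because of the noisy max over
# B's actions; Double Q-learning decouples selection from evaluation.
N_B_ACTIONS = 10

def eps_greedy(Q1, Q2, state, actions, eps=0.1):
    if random.random() < eps:
        return random.choice(actions)
    # Act greedily with respect to the sum of both tables.
    return max(actions, key=lambda a: Q1[(state, a)] + Q2[(state, a)])

def double_q(episodes=10000, alpha=0.1, gamma=1.0, eps=0.1):
    Q1, Q2 = defaultdict(float), defaultdict(float)
    for _ in range(episodes):
        a = eps_greedy(Q1, Q2, 'A', ['left', 'right'], eps)
        if a == 'right':
            transitions = [('A', 'right', 0.0, None, [])]
        else:
            b_actions = list(range(N_B_ACTIONS))
            b = eps_greedy(Q1, Q2, 'B', b_actions, eps)
            transitions = [('A', 'left', 0.0, 'B', b_actions),
                           ('B', b, random.gauss(-0.1, 1.0), None, [])]
        for s, act, r, s_next, next_actions in transitions:
            # With probability 1/2, update Q1 using Q2 as evaluator
            # (and vice versa) -- the Double Q-learning rule.
            A, B = (Q1, Q2) if random.random() < 0.5 else (Q2, Q1)
            if s_next is None:
                target = r
            else:
                a_star = max(next_actions, key=lambda x: A[(s_next, x)])
                target = r + gamma * B[(s_next, a_star)]
            A[(s, act)] += alpha * (target - A[(s, act)])
    return Q1, Q2

random.seed(1)
Q1, Q2 = double_q()
```

After training, the estimated value of 'left' from A sits near its true value of -0.1 rather than being inflated toward the noisy maximum, so the greedy policy correctly prefers 'right'.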

Paper about mastering the game of Go

Paper about DeepStack playing poker

Possible Projects for the Extended Version of the Course

Two types of projects:

  • Programming projects, e.g.,
    • Participate in one of DeepMind's or OpenAI's web competitions
    • Implement RL on your own research topic
  • Advanced Topic projects
    • Study some advanced and/or not so well treated topic in RL and present it at a lecture, e.g.,
      • Connections between RL and control (B. Recht: "A Tour of Reinforcement Learning: The View from Continuous Control")
      • RL with continuous action and state spaces
      • Dual Control and how it connects to RL
      • RL for Robotics
      • Connections between adaptive control and RL
      • The Stanford Helicopter RL Case Study
      • Advanced Research Topics from the new version of the UCL course (RL8 Lecture + video gives a set of topics)
      • Your own project