CS 500 Light Seminar: Learning Theory for Sequential Decision Making
Rutgers University
Fall 2006
Organized by Alexander L. Strehl and Michael L. Littman
Time: Wednesday, 1:30 - 3:00 pm
Place: Rutgers, CoRE B (also known as CoRE 305)
Semester: Fall 2006
Description: The purpose of this seminar is to meet weekly and discuss research papers in learning theory, especially reinforcement learning. Two types of papers will be emphasized. The first consists of recent papers that present strong, well-grounded results that are potentially useful for solving real-world tasks. The second consists of important papers from disciplines closely connected to reinforcement learning, such as active learning, control theory, online learning, and statistical learning.
Anyone who is willing to read the papers and participate in the discussion is welcome to attend. However, participants are assumed to have been exposed to either reinforcement learning or supervised learning at a non-trivial level. This is not an introduction to reinforcement learning, but we do invite those with a strong background in learning theory, even without prior exposure to reinforcement learning, to attend. Please contact me at "strehl@cs.rutgers.edu" with any questions.
Announcements
- The first day of the seminar is Wednesday, September 13.
Schedule
- 9/13/06: PAC adaptive control of linear systems by Claude-Nicolas Fiechter: paper. (Alex will lead discussion)
- 9/20/06: The nonstochastic multiarmed bandit problem by Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire: ps. (Rohan will lead discussion)
- 9/27/06: Continue discussing The nonstochastic multiarmed bandit problem by Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire: ps. (Rohan will lead discussion)
- 10/04/06: Least-Squares Temporal Difference Learning by Justin Boyan: conference version, longer journal version (pdf). (Lihong will lead discussion)
- 10/11/06: Least-squares policy iteration by Michail G. Lagoudakis and Ronald Parr: pdf. (Lihong will lead discussion)
- 10/18/06: Learning factor graphs in polynomial time and sample complexity by Pieter Abbeel, Daphne Koller, Andrew Y. Ng: short version (pdf), long version (pdf). (Carlos will lead discussion)
- 10/25/06: PEGASUS: A policy search method for large MDPs and POMDPs by Andrew Y. Ng and Michael Jordan: pdf. (Tom will lead discussion)
- 11/01/06: Coarse sample complexity bounds for active learning by Sanjoy Dasgupta: ps. (Pavel will lead discussion)
- 11/08/06: Model-based Hierarchical RL by Carlos Diuk.
- 11/15/06: Approximate Distance Oracles by Mikkel Thorup and Uri Zwick: paper. (Mangesh will lead discussion)
- 12/13/06: Reinforcement learning with Gaussian processes by Yaakov Engel, Shie Mannor, and Ron Meir: pdf. (Alex will lead discussion)
Preliminary Paper List
- Least-squares policy iteration by Michail G. Lagoudakis and Ronald Parr: pdf.
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path by András Antos, Csaba Szepesvári, and Rémi Munos: pdf.
- Approximate planning in large POMDPs via reusable trajectories by Michael Kearns, Yishay Mansour, and Andrew Y. Ng: pdf.
- On the sample complexity of reinforcement learning. Chapters 4 - 7 of Sham Kakade's thesis: pdf.
- The weighted majority algorithm by Nick Littlestone and Manfred K. Warmuth: paper.
- Agnostic Active Learning by Nina Balcan, Alina Beygelzimer, and John Langford: pdf.
RL Links