Memory-Based Reinforcement Learning



AJCAI 2022 Tutorial: Memory-Based Reinforcement Learning

Hung Le, Deakin University


Time and Location

1.30pm – 4.30pm (UTC/GMT+8, AWST) on Monday, 05 December 2022
Location: Hyatt Regency Perth, Perth, WA, Australia.

Virtual: Microsoft Teams


Reinforcement learning (RL) is a branch of artificial intelligence wherein autonomous agents learn to maximise predefined rewards from the environment. Despite immense successes in breaking human records, the current training of RL agents is prohibitively expensive in terms of time, computing resources, and samples. For example, it requires trillions of playing sessions to reach human-level performance on simple video games. The problem of sample inefficiency is exacerbated in stochastic, partially observable, noisy or long-term real-world environments, whereas humans can show excellent performance under these circumstances without much training. That shortcoming of RL agents can be attributed to the lack of efficient human-like memory mechanisms that hasten learning by smartly utilising past observations.

This tutorial presents recent advances in memory-based reinforcement learning, where emerging memory systems enable sample-efficient, adaptive and human-like RL agents. The first part of the tutorial covers the basics of RL and raises the sample inefficiency issue. The second part presents a taxonomy of memory mechanisms that recent RL methods employ to reduce the number of training samples and to resemble human memory. The subsequent four parts study the benefits that memory can provide to RL agents, which can be categorised as (1) quick access to critical experiences; (2) a better representation of observation contexts; (3) intrinsic motivation to explore; and (4) optimisation. Finally, the tutorial concludes with a discussion of open challenges and promising future research on memory-based RL.
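As a rough illustration of benefit (1), quick access to critical experiences, the sketch below shows a toy tabular episodic memory that records the best discounted return observed for each state-action pair and lets the agent act greedily on it. It is not taken from the tutorial materials: the class `EpisodicMemory`, the helper `store_episode`, and the discount `gamma` are hypothetical names chosen for this example, and real methods typically use learned state embeddings and nearest-neighbour lookup rather than exact keys.

```python
class EpisodicMemory:
    """Toy episodic memory (hypothetical): best return seen per (state, action)."""

    def __init__(self):
        self.table = {}

    def write(self, state, action, episode_return):
        # Keep only the highest return ever observed for this pair.
        key = (state, action)
        previous = self.table.get(key, float("-inf"))
        self.table[key] = max(previous, episode_return)

    def read(self, state, action, default=0.0):
        # Unseen pairs fall back to a default value.
        return self.table.get((state, action), default)

    def best_action(self, state, actions, default=0.0):
        # Greedy choice over the stored returns.
        return max(actions, key=lambda a: self.read(state, a, default))


def store_episode(memory, trajectory, gamma=0.99):
    """Write discounted returns for a list of (state, action, reward) steps."""
    g = 0.0
    # Walk backward so each step's return accumulates future rewards.
    for state, action, reward in reversed(trajectory):
        g = reward + gamma * g
        memory.write(state, action, g)


memory = EpisodicMemory()
store_episode(memory, [("s0", "left", 0.0), ("s1", "right", 1.0)], gamma=0.5)
store_episode(memory, [("s0", "left", 0.0)], gamma=0.5)  # worse episode, ignored
choice = memory.best_action("s0", ["left", "right"])
```

Because a single good episode is written straight into the table, the agent can reuse it immediately instead of waiting for many gradient updates, which is the intuition behind using memory to reduce sample requirements.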

Tutorial Outline

13:30 – 13:40  Introduction and Background
13:40 – 13:50  Taxonomy of Memory in RL
13:50 – 14:10  Memory as Experiences
14:10 – 14:30  Memory for Better Context
14:30 – 14:50  Q&A and Break
14:50 – 15:10  Memory in Exploration
15:10 – 15:30  Memory for Optimisation
15:30 – 15:50  Demo
15:50 – 16:30  Conclusion and Q&A

About Presenter

Hung Le is a research lecturer at Deakin University, Australia. He is a member of the Applied Artificial Intelligence Institute (A2I2), where he works on various topics in machine learning, deep learning and artificial memory. In particular, Hung is keen to invent new deep models with access to artificial neural memory and has created a body of work advancing this area, including multi-modal and generative memory, theoretical foundations for memory operations, general-purpose neural computers and memory-based reinforcement learning agents. Applications include health, dialogue systems, reinforcement learning, machine reasoning and natural language processing. He publishes regularly in top ML/RL/AI venues such as ICLR, NeurIPS, ICML, AAAI, KDD, NAACL, ECCV, AAMAS, ICPR, ICONIP and PAKDD. He obtained a Bachelor of Engineering (Honors) from Hanoi University of Science and Technology in 2015 and a PhD in Computer Science from Deakin University in 2020.

Tutorial Materials and Recordings

Benchmark datasets

- Toy environments: Classic Control
- Discrete-action: Atari games
- Continuous-action: MuJoCo

Demo code

Tutorial code


Link to slides