Curious Agents Saga: Part 3, Beyond Surprise: Direct and Causal Exploration in Deep Reinforcement Learning

1 minute read

Table of Contents

  • Reflection on Intrinsic Motivation
  • Direct Exploration
    • Replay Memory Focused on Exploration
    • Performance-based Replay Memory
  • Causal Exploration
    • What is Causality?
    • Dependency Test
    • Potential Outcome
    • Structural Causal Model

Reflection on Intrinsic Motivation

In the previous article, we reviewed an essential exploration framework called intrinsic motivation, which is widely used in deep RL because it scales well. Within this framework, surprise and novelty serve as the vehicles of exploration. For surprise, memory is often hidden inside a dynamics model, which memorizes observed data to improve its predictive capability. This type of memory tends to be long-term, semantic, and slow to update, akin to a careful archivist meticulously preserving information.
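The article does not include code, but as a rough illustration, a surprise-style bonus can be sketched as the prediction error of a learned forward dynamics model over state features. The class and function names below are hypothetical, and details such as normalization and how the model is trained are left out.

```python
import torch
import torch.nn as nn


class ForwardDynamicsModel(nn.Module):
    """Predicts the next state features from the current state and action."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def surprise_bonus(model: ForwardDynamicsModel,
                   state: torch.Tensor,
                   action: torch.Tensor,
                   next_state: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward = prediction error of the dynamics model (the 'surprise')."""
    with torch.no_grad():
        predicted = model(state, action)
    return 0.5 * (predicted - next_state).pow(2).sum(dim=-1)
```

In practice such a bonus would be scaled, mixed with the extrinsic reward, and the dynamics model would be trained online on the same transitions, which is what makes this memory slow to update.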

On the other hand, novelty takes a more straightforward approach to memory. Here, memory is explicit, typically taking the form of a slot-based matrix, a nearest-neighbor estimator, or a simple counter. This memory is usually short-term, instance-based, and highly adaptive to environmental changes, acting more like a dynamic, responsive agent that swiftly adjusts to new inputs.
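As a concrete (and simplified) example of such instance-based memory, the sketch below scores a state by its average distance to the k nearest stored embeddings; the class name and the exact bonus formula are illustrative assumptions rather than a specific published method.

```python
import numpy as np


class EpisodicNoveltyMemory:
    """Short-term, instance-based memory: stores visited state embeddings and
    scores a new state by its distance to the k nearest stored embeddings."""

    def __init__(self, k: int = 10):
        self.k = k
        self.embeddings: list[np.ndarray] = []

    def bonus(self, embedding: np.ndarray) -> float:
        if not self.embeddings:
            self.embeddings.append(embedding)
            return 1.0  # first state is maximally novel by convention
        dists = np.linalg.norm(np.stack(self.embeddings) - embedding, axis=1)
        knn = np.sort(dists)[: self.k]
        novelty = float(knn.mean())        # far from everything seen => more novel
        self.embeddings.append(embedding)  # memory adapts immediately to new inputs
        return novelty
```

Because every new embedding is appended on the spot, the bonus for a revisited state shrinks right away, which is exactly the fast, adaptive behavior described above.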

Both approaches thus start from memory, using surprise or novelty mechanisms to produce intrinsic rewards that guide the RL agent's exploration. Convenient and easy to use as the framework is, two major issues hinder its ability to explore effectively:

Detachment: the agent loses track of interesting areas left to explore.

Derailment: exploratory behavior prevents the agent from reliably returning to previously visited states.

🧠 But what alternatives could help overcome these issues?

Read the full article