Long-Horizon Planning with Predictable Skills

Nico Gürtler, Georg Martius

University of Tübingen & Max Planck Institute for Intelligent Systems, Tübingen, Germany
Published at RLC 2025

Abstract

Model-based reinforcement learning (RL) leverages learned world models to plan ahead or train in imagination. Recently, this approach has significantly improved sample efficiency and performance across various challenging domains ranging from playing games to controlling robots. However, there are fundamental limits to how accurate the long-term predictions of a world model can be, for example due to unstable environment dynamics or partial observability. These issues are further exacerbated by the compounding error problem. Model-based RL is therefore generally limited to short rollouts with the world model, and consequently struggles with long-term credit assignment. We argue that this limitation can be addressed by modeling the outcome of temporally extended skills instead of the effect of primitive actions. To this end, we propose a mutual-information-based skill learning objective that ensures predictable, diverse, and task-related behavior. The resulting skills compensate for perturbations and drifts, enabling stable long-horizon planning. We thus introduce Stable Planning with Temporally Extended Skills (SPlaTES), a sample-efficient hierarchical agent consisting of model predictive control with an abstract skill world model on the higher level, and skill execution on the lower level.
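The two-level structure described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the 2D point environment, the additive `skill_model`, the random-shooting planner, the running distance cost, and the clipped proportional controller in `execute_skill` are all assumptions made for illustration, and nothing is learned here. The higher level plans over whole skills with an abstract model, and the lower level executes the chosen skill with noisy primitive actions while correcting for perturbations, so the outcome stays close to the model's prediction.

```python
import math
import random

def skill_model(state, skill):
    """Hypothetical abstract model: predicted state after one whole skill."""
    return (state[0] + skill[0], state[1] + skill[1])

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def plan_skills(state, goal, rng, horizon=3, n_candidates=512):
    """Random-shooting MPC over skill sequences; returns the best first skill."""
    best_cost, best_first = float("inf"), (0.0, 0.0)
    for _ in range(n_candidates):
        seq = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        pred, cost = state, 0.0
        for skill in seq:
            # Roll the abstract model forward one skill at a time.
            pred = skill_model(pred, skill)
            cost += dist(pred, goal)  # running cost: distance to the goal
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

def execute_skill(state, skill, rng, n_steps=10, noise=0.02):
    """Toy low-level policy: primitive actions steer toward the predicted outcome."""
    target = skill_model(state, skill)
    x, y = state
    for _ in range(n_steps):
        # Clipped proportional action; feedback cancels earlier perturbations.
        ax = max(-0.3, min(0.3, target[0] - x))
        ay = max(-0.3, min(0.3, target[1] - y))
        x += ax + rng.gauss(0.0, noise)
        y += ay + rng.gauss(0.0, noise)
    return (x, y)

def run_episode(goal, seed=0, n_skills=8):
    """Replan after every executed skill (model predictive control)."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    for _ in range(n_skills):
        skill = plan_skills(state, goal, rng)
        state = execute_skill(state, skill, rng)
    return state
```

In this sketch a planning horizon of 3 skills corresponds to 30 primitive steps, yet the model is only queried 3 times per candidate, which is the intuition behind planning with temporally extended skills rather than primitive actions.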

Rollouts

Rollout videos for each of the following environments compare TD-MPC2, TD-MPC2 with HER, DADS, and SPlaTES (ours):

Fetch Pick & Place
Fetch Pick & Place Barrier
Ant Maze Medium
Ant Maze Push

Distillation on Ant Maze Large

Rollout videos compare SPlaTES (ours) with SPlaTES (ours) distilled into a TD-MPC2 agent.

@inproceedings{gurtler2025longhorizon,
  title={Long-Horizon Planning with Predictable Skills},
  author={Nico G{\"u}rtler and Georg Martius},
  booktitle={Reinforcement Learning Conference},
  year={2025},
  url={https://openreview.net/forum?id=G8ybRSxO10}
}

