Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
doi: 10.1109/IROS55552.2023.10342018
- Oct 2023
Model-based reinforcement learning (MBRL) with real-time planning has shown great potential in locomotion and manipulation control tasks. However, existing planning methods, such as the Cross-Entropy Method (CEM), do not scale well to complex high-dimensional environments. One of the key reasons for this underperformance is the lack of exploration, as these planning methods aim only to maximize the cumulative extrinsic reward over the planning horizon. Furthermore, planning inside a compact latent space in the absence of observations makes it challenging to use curiosity-based intrinsic motivation. We propose Curiosity CEM (CCEM), an improved version of the CEM algorithm that encourages exploration via curiosity. Our proposed method maximizes the sum of state-action Q-values over the planning horizon, where these Q-values estimate future extrinsic and intrinsic reward, hence encouraging the agent to reach novel observations. In addition, our model uses contrastive representation learning to efficiently learn latent representations. Experiments on image-based continuous control tasks from the DeepMind Control suite show that CCEM is more sample-efficient than previous MBRL algorithms by a large margin and compares favorably with the best model-free RL methods.
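The CEM planning loop described in the abstract can be sketched in a few lines: sample action sequences from a Gaussian, score each sequence by the sum of Q-values over the horizon, and refit the Gaussian to the elite sequences. The following is a minimal NumPy sketch under that description; the `toy_q` function is a hypothetical stand-in for the paper's learned critic (which would fold in both extrinsic and curiosity-based intrinsic reward), not the actual CCEM implementation.

```python
import numpy as np

def cem_plan(q_value, horizon=5, action_dim=2, n_samples=200,
             n_elites=20, n_iters=5, seed=0):
    """Minimal Cross-Entropy Method planner (sketch).

    Scores each sampled action sequence by the sum of Q-values over
    the planning horizon, as in the CCEM objective, then refits the
    sampling distribution to the top-scoring (elite) sequences.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Candidate action sequences: (n_samples, horizon, action_dim)
        samples = rng.normal(mean, std, size=(n_samples, horizon, action_dim))
        # Score = summed Q-values over the horizon for each sequence
        scores = np.array([sum(q_value(t, a) for t, a in enumerate(seq))
                           for seq in samples])
        elites = samples[np.argsort(scores)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean[0]  # execute only the first action (MPC style)

# Hypothetical Q-function for illustration: prefers actions near a target.
target = np.array([0.5, -0.3])
def toy_q(t, action):
    return -np.sum((action - target) ** 2)

first_action = cem_plan(toy_q)
```

In real-time (model-predictive) use, only the first action of the optimized sequence is executed, and planning restarts from the next observed state.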
@InProceedings{KWW23a,
  author    = {Kotb, Mostafa and Weber, Cornelius and Wermter, Stefan},
  title     = {Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning},
  booktitle = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year      = {2023},
  month     = {Oct},
  doi       = {10.1109/IROS55552.2023.10342018},
}