Many real-world offline reinforcement learning (RL) problems, such as satellite control, involve continuous-time environments characterized by irregular observation intervals and unknown delays affecting state transitions. These environments are challenging because the current action only influences future states after an unknown delay. Existing offline RL algorithms perform well in environments with either irregularly timed observations or known delays, but they fall short when both conditions are present. To address this, we introduce Neural Laplace Control, a continuous-time, model-based offline RL method that combines a Neural Laplace dynamics model with a Model Predictive Control (MPC) planner, and can learn from an offline dataset sampled at irregular time intervals from an environment with an inherent, unknown constant delay. We show experimentally on continuous-time delayed environments that Neural Laplace Control achieves near-expert policy performance.
Samuel Holt, Alihan Hüyük, Zhaozhi Qian, Hao Sun, Mihaela van der Schaar
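
For intuition, here is a minimal, hypothetical sketch of how a learned continuous-time dynamics model can be paired with an MPC planner, as the abstract describes. This is not the paper's implementation: the `dynamics` and `reward` functions and the random-shooting scheme are illustrative placeholders, and in Neural Laplace Control the dynamics model would be a trained Neural Laplace network queried at arbitrary time offsets, with the unknown constant action delay also accounted for.

```python
import numpy as np

def dynamics(state, action, dt):
    """Stand-in for a learned continuous-time model f(s, a, dt) -> s'.

    A Neural Laplace-style model can be queried at arbitrary time
    offsets dt; here a toy linear system plays that role. (The full
    method additionally handles the environment's unknown constant
    action delay, which this sketch omits.)
    """
    A = np.array([[0.0, 1.0], [-1.0, -0.1]])
    B = np.array([[0.0], [1.0]])
    return state + dt * (A @ state + B @ action)

def reward(state, action):
    # Toy quadratic objective: drive the state to the origin.
    return -float(state @ state) - 0.01 * float(action @ action)

def mpc_plan(state, horizon=10, n_samples=256, action_dim=1, seed=0):
    # Random-shooting MPC: sample candidate action sequences, roll each
    # one out through the learned model over irregular time steps, and
    # return the first action of the highest-return sequence.
    rng = np.random.default_rng(seed)
    dts = rng.uniform(0.05, 0.2, size=horizon)  # irregular intervals
    candidates = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, action_dim))
    returns = np.zeros(n_samples)
    for i, seq in enumerate(candidates):
        s = state.copy()
        for a, dt in zip(seq, dts):
            returns[i] += reward(s, a)
            s = dynamics(s, a, dt)
    return candidates[np.argmax(returns), 0]

print("first planned action:", mpc_plan(np.array([1.0, 0.0])))
```

In this sketch, planning cost is dominated by the rollouts, and only the first planned action is executed before replanning at the next (irregularly spaced) observation, which is the standard receding-horizon pattern for model-based control.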