Symbolic regression is the problem of discovering concise, closed-form mathematical equations from data, a task made difficult by its nature as a high-dimensional combinatorial search. We propose a novel transformer architecture that encodes an entire dataset and can be trained end-to-end, using the root mean square error (RMSE) of the generated equation's fit as the learning signal, with model-free reinforcement learning, specifically Proximal Policy Optimization (PPO) augmented with genetic programming to increase sample diversity. Our method is pre-trained and then fine-tuned via gradient descent at inference time to adapt to the dataset of interest. We call this generative model the Deep Generative Symbolic Regression (DGSR) framework. Experimentally, we show that DGSR achieves a higher recovery rate of the true equations on problems with more input variables, and is more computationally efficient at inference time, than state-of-the-art reinforcement learning symbolic regression approaches.
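
The fit-based learning signal described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a candidate equation is available as a callable and that the RMSE is squashed into a bounded reward via 1 / (1 + RMSE), a common choice in RL-based symbolic regression, so that a policy-gradient method such as PPO receives a well-scaled scalar.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def reward(expr, X, y):
    """Bounded fitness reward for a candidate equation.

    `expr` maps an (n, d) input array to n predictions. The
    1 / (1 + RMSE) squashing (an assumption here, not necessarily the
    paper's exact choice) keeps the reward in (0, 1], with 1 attained
    only by an exact fit.
    """
    return 1.0 / (1.0 + rmse(y, expr(X)))


# Toy dataset generated by the ground-truth equation y = x0**2 + x1.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = X[:, 0] ** 2 + X[:, 1]

exact = reward(lambda X: X[:, 0] ** 2 + X[:, 1], X, y)  # true equation
approx = reward(lambda X: X[:, 0] + X[:, 1], X, y)      # wrong candidate
```

A candidate that recovers the true equation earns the maximal reward of 1, while any imperfect candidate earns strictly less, giving the policy a graded signal to improve on.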