Hi, I am currently an RS intern at Google DeepMind. I am also a fourth-year Ph.D. student in Machine Learning at the University of Cambridge, advised by Mihaela van der Schaar in the Machine Learning and Artificial Intelligence group. To date, I have published nine papers at top-tier ML conferences (NeurIPS [spotlight], ICML [long oral], ICLR [spotlight], and AISTATS).
PhD in Machine Learning, 2021 - 2025
University of Cambridge
MEng in Engineering Science, 2013 - 2017
University of Oxford
Hybrid Digital Twins (HDTwins) offer a novel approach to modeling dynamical systems by combining mechanistic and neural components, effectively leveraging domain knowledge while enhancing flexibility. However, existing hybrid models typically rely on manually defined architectures, limiting adaptability and generalization—particularly in data-scarce or unseen scenarios. To address this, we introduce HDTwinGen, an evolutionary algorithm that utilizes Large Language Models (LLMs) to autonomously generate, optimize, and refine hybrid digital twin architectures. Through iterative LLM-driven proposals and parameter optimization, HDTwinGen explores a vast design space, enabling the evolution of increasingly robust and generalizable HDTwins. Empirical results show that HDTwinGen surpasses conventional methods, yielding models that are not only sample-efficient but also adept at adapting to novel conditions, advancing the state of Digital Twin technology in dynamic real-world applications.
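To make the evolutionary loop concrete, below is a minimal Python sketch of an LLM-driven generate, fit, evaluate, and select cycle; the helpers `llm_propose`, `fit_params`, and `evaluate` are hypothetical stand-ins for the LLM proposal, parameter optimization, and validation steps, not the HDTwinGen implementation.

```python
# Illustrative sketch (not the paper's code) of an LLM-driven evolutionary loop
# for hybrid digital twin search. The three helpers below are hypothetical stand-ins.
import random

def llm_propose(parent_spec, feedback):
    """Stand-in for prompting an LLM with the parent model and its evaluation feedback."""
    return parent_spec + [f"term_{len(parent_spec)}"]  # toy mutation of the model spec

def fit_params(spec, train_data):
    """Stand-in for optimizing the numeric parameters of a candidate hybrid model."""
    return {"params": [random.random() for _ in spec]}

def evaluate(spec, fitted, val_data):
    """Stand-in for the validation loss of the fitted hybrid model."""
    return random.random()

def hdtwin_search(train_data, val_data, generations=10, population=4):
    pool = [(["mechanistic_core"], float("inf"))]      # start from a mechanistic baseline
    for _ in range(generations):
        parent, parent_loss = min(pool, key=lambda x: x[1])
        child = llm_propose(parent, feedback=parent_loss)   # LLM-driven proposal
        fitted = fit_params(child, train_data)               # parameter optimization
        loss = evaluate(child, fitted, val_data)
        pool.append((child, loss))
        pool = sorted(pool, key=lambda x: x[1])[:population]  # keep the fittest candidates
    return min(pool, key=lambda x: x[1])
```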
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. Typically, preference optimization is approached as an offline supervised learning task using manually crafted convex loss functions. While these methods are based on theoretical insights, they are inherently constrained by human creativity, so the large search space of possible loss functions remains underexplored. We address this by performing LLM-driven objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention. Specifically, we iteratively prompt an LLM to propose and implement new preference optimization loss functions based on previously evaluated performance metrics. This process leads to the discovery of previously unknown and performant preference optimization algorithms. We call the best-performing of these Discovered Preference Optimization (DiscoPOP), a novel algorithm that adaptively blends logistic and exponential losses. Experiments demonstrate the state-of-the-art performance of DiscoPOP and its successful transfer to held-out tasks.
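The general shape of such a blended objective can be sketched as below; the sigmoid gate and the temperature `tau` are illustrative assumptions rather than the exact discovered DiscoPOP loss.

```python
# Illustrative sketch of a loss that adaptively blends logistic and exponential
# preference losses, in the spirit of DiscoPOP. The gating rule and `tau` are
# assumptions for illustration, not the paper's verbatim objective.
import torch
import torch.nn.functional as F

def blended_preference_loss(chosen_logratios, rejected_logratios, beta=0.1, tau=0.05):
    # rho: scaled difference of policy/reference log-ratios for chosen vs. rejected responses
    rho = beta * (chosen_logratios - rejected_logratios)
    logistic_loss = -F.logsigmoid(rho)     # DPO-style logistic loss
    exponential_loss = torch.exp(-rho)     # exponential loss
    gate = torch.sigmoid(rho / tau)        # adaptive blend weight
    return (gate * logistic_loss + (1.0 - gate) * exponential_loss).mean()
```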
L2MAC is the first practical LLM-based stored-program automatic computer framework, suitable for generating unbounded, long, and consistent outputs. The framework, instantiated and demonstrated on large codebase coding tasks, leverages an external memory comprising a file store and an instruction registry, together with a control unit that manages the LLM's context. Consequently, it overcomes the fixed context window constraint inherent in transformer-based LLM architectures, outperforming other methods in generating large codebases for complex system design tasks.
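The stored-program idea can be sketched as follows: an instruction registry drives an LLM agent whose working context is reset per instruction, while a file store persists outputs across steps. `call_llm` below is a hypothetical stand-in, and the sketch illustrates the concept rather than the released L2MAC code.

```python
# Conceptual sketch of a stored-program LLM loop with an external file store
# and instruction registry. `call_llm` is a hypothetical stand-in for an LLM call.
def call_llm(prompt: str) -> str:
    return f"# code generated for: {prompt[:40]}..."

def run_stored_program(instructions, file_store=None):
    file_store = file_store if file_store is not None else {}
    for i, instruction in enumerate(instructions):        # instruction registry
        context = {
            "instruction": instruction,
            "files": list(file_store),                    # file summary, not full contents
        }
        output = call_llm(f"{context['instruction']} | existing files: {context['files']}")
        file_store[f"module_{i}.py"] = output             # persist beyond the context window
    return file_store

# Usage: run_stored_program(["implement user model", "implement REST API", "write tests"])
```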
This paper presents a new approach to inferring unbiased treatment effects, using human-readable ordinary differential equations (ODEs) instead of traditional neural networks. This method enhances interpretability and accommodates irregular sampling, while introducing new identification assumptions. The key innovation lies in transforming any ODE discovery method into a treatment effects methodology, potentially revolutionizing the field.
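As a toy illustration of the idea, the sketch below fits a one-term linear ODE with an explicit treatment input to irregularly sampled data and compares factual and counterfactual trajectories; the equation form and the data are assumptions made purely for illustration.

```python
# Toy sketch: a human-readable ODE with an explicit treatment term as a
# treatment-effect model under irregular sampling. The linear form
# dy/dt = a*y + b*u(t) and the data below are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

t_obs = np.array([0.0, 0.7, 1.9, 3.2, 5.0])      # irregular observation times
u = lambda t: 1.0 if t >= 2.0 else 0.0           # treatment switched on at t = 2
y_obs = np.array([1.0, 1.1, 1.3, 0.9, 0.5])      # observed outcomes (toy data)

def simulate(params, treat):
    a, b = params
    sol = solve_ivp(lambda t, y: a * y + b * treat(t), (0.0, 5.0), [y_obs[0]], t_eval=t_obs)
    return sol.y[0]

fit = least_squares(lambda p: simulate(p, u) - y_obs, x0=[0.1, -0.1])
factual = simulate(fit.x, u)
counterfactual = simulate(fit.x, lambda t: 0.0)  # treatment effect = factual - counterfactual
```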
For the first time, we introduce and formalize the problem of continuous-time control with costly observations, theoretically demonstrating that irregular observation policies outperform regular ones in certain environments. We empirically validate this finding with an initial method that applies a heuristic threshold to the variance of reward rollouts within an offline continuous-time model-based Model Predictive Control (MPC) planner, across various continuous-time environments including a cancer simulation. This work lays the foundation for future research on this critical problem.
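The heuristic can be sketched as a simple decision rule: take a (costly) observation only when the variance of reward across sampled model rollouts exceeds a threshold. In the sketch below, `rollout_rewards` is a hypothetical stand-in for rolling the learned continuous-time model forward from the current belief state.

```python
# Minimal sketch of a variance-threshold observation rule; the rollout function
# is a hypothetical stand-in for sampling trajectories from a learned model.
import numpy as np

def rollout_rewards(belief_state, action_plan, n_samples=32):
    """Stand-in: sample cumulative rewards from the learned dynamics model."""
    rng = np.random.default_rng(0)
    return rng.normal(loc=1.0, scale=belief_state["uncertainty"], size=n_samples)

def should_observe(belief_state, action_plan, threshold=0.25):
    rewards = rollout_rewards(belief_state, action_plan)
    return float(np.var(rewards)) > threshold      # observe only when rollouts disagree

# e.g. should_observe({"uncertainty": 0.8}, action_plan=None)  -> likely True
```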
The problem of symbolic regression involves discovering concise, closed-form mathematical equations from data, which is challenging because it is a high-dimensional combinatorial search problem. We propose a novel transformer architecture capable of encoding an entire dataset. This architecture is trained end-to-end with model-free reinforcement learning, specifically Proximal Policy Optimization (PPO) augmented with genetic programming to enhance sample diversity, using the Root Mean Square Error (RMSE) of the generated equation's fit as the reward. Our method is pre-trained and then fine-tuned with gradients at inference time to adapt to the dataset of interest. We call this generative model the Deep Generative Symbolic Regression (DGSR) framework. Through experiments, we demonstrate that DGSR not only achieves a higher recovery rate of true equations with a larger number of input variables, but also offers greater computational efficiency at inference time compared to state-of-the-art reinforcement learning symbolic regression solutions.
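The reward signal driving this training can be illustrated as below; squashing the fit error with `1 / (1 + RMSE)` is a common choice in symbolic-regression RL and is assumed here for illustration rather than quoted from the paper.

```python
# Illustrative sketch of an RMSE-based reward for a candidate equation.
import numpy as np

def equation_reward(candidate, X, y):
    """`candidate` is a callable implementing the generated expression."""
    preds = candidate(X)
    rmse = np.sqrt(np.mean((preds - y) ** 2))
    return 1.0 / (1.0 + rmse)          # higher reward for a better-fitting equation

# Usage with a toy dataset and the candidate expression f(x) = x0 * sin(x1):
X = np.random.randn(100, 2)
y = X[:, 0] * np.sin(X[:, 1])
print(equation_reward(lambda X: X[:, 0] * np.sin(X[:, 1]), X, y))  # close to 1.0
```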
Many offline reinforcement learning (RL) problems in the real world, such as satellite control, encounter continuous-time environments characterized by irregular observation intervals and unknown delays affecting state transitions. These environments present significant challenges since current actions influence future states after an unpredictable delay. While existing offline RL algorithms perform well in environments with either irregularly timed observations or known delays, they fall short when both conditions are present. To address this issue, we introduce Neural Laplace Control, a continuous-time, model-based offline RL technique. This innovative approach combines a Neural Laplace dynamics model and a Model Predictive Control (MPC) planner, efficiently learning from datasets with irregular time intervals and inherent, constant unknown delays. Through experimental application in continuous-time delayed environments, Neural Laplace Control has demonstrated its ability to achieve performance levels near those of expert policies.
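The planning side can be illustrated with a generic random-shooting MPC loop over a learned dynamics model; `dynamics` and `reward` below are hypothetical stand-ins for the learned Neural Laplace model and the task reward, and the sketch shows standard MPC rather than the paper's exact planner.

```python
# Generic random-shooting MPC sketch over a learned dynamics model.
# `dynamics(state, action)` and `reward(state, action)` are hypothetical callables.
import numpy as np

def plan_mpc(state, dynamics, reward, horizon=5, n_candidates=256, action_dim=1):
    rng = np.random.default_rng()
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:                  # roll the learned model forward
            s = dynamics(s, a)
            total += reward(s, a)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action               # execute only the first action, then replan
```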
Neural Ordinary Differential Equations (ODEs) struggle to model systems with long-range dependencies or discontinuities, which are common in engineering and biological contexts. Despite alternative approaches, numerical instability persists when handling stiff ODEs and those with piecewise forcing functions. This work introduces Neural Laplace, a framework adept at learning various classes of differential equations, efficiently representing history dependencies and discontinuities in the Laplace domain through complex exponentials. By leveraging the stereographic map of the Riemann sphere, Neural Laplace ensures smoother learning in this domain. Experimental results indicate its superior performance in modeling and extrapolating trajectories of diverse differential equations, even those with complex history dependencies and abrupt changes.
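The role of the stereographic map can be illustrated with the textbook projection between the unbounded complex s-plane and bounded sphere angles; the exact parameterization used in Neural Laplace may differ from this sketch.

```python
# Standard stereographic map between Riemann-sphere angles (theta, phi) and a
# point s in the complex Laplace domain, showing how the unbounded s-plane can
# be represented by bounded coordinates. Textbook form, not the paper's code.
import numpy as np

def sphere_to_s(theta, phi):
    """Inverse stereographic projection: bounded angles -> complex s."""
    return np.tan(theta / 2.0) * np.exp(1j * phi)

def s_to_sphere(s):
    """Stereographic projection: complex s -> bounded angles (theta, phi)."""
    return 2.0 * np.arctan(np.abs(s)), np.angle(s)

s = 3.0 + 4.0j
theta, phi = s_to_sphere(s)
assert np.isclose(sphere_to_s(theta, phi), s)   # round trip recovers s
```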
Sole author and teacher of a Machine Learning and Deep Learning course.
Python, JavaScript, TypeScript, Rust, MATLAB, Bash, SQL, C, C++
JAX, TensorFlow, PyTorch, Keras, NumPy, SciPy, pandas, asyncio, NLTK, Jupyter, pytest
Git, Linux, LaTeX, Google Cloud Platform, Amazon Web Services, Docker, GitLab CI