This work introduces Data-Driven Discovery (D3), a multi-agent framework that uses Large Language Models (LLMs) to iteratively propose, evaluate, and refine interpretable dynamical-system models. Its focus is pharmacology, though the approach also applies to fields such as epidemiology. A central feature is its Value of Information (VoI) mechanism, which selects the new features or measurements expected to improve the model most, even when data for those features are not yet available. D3 orchestrates three specialized LLM-driven agents (Modeling, Feature Acquisition, and Evaluation) in a closed loop that combines unstructured domain insights, selective data collection, and automated code generation. The resulting pipeline achieves robust modeling accuracy, often surpassing purely symbolic or purely black-box methods, while remaining interpretable and data-efficient. It offers a template for how LLM agents can collaborate on complex scientific workflows.
Samuel Holt, Zhaozhi Qian, Tennison Liu, Mihaela van der Schaar
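
To make the closed loop described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each LLM-driven agent can be replaced by a plain Python function: the Modeling stand-in fits a simple additive model, the Evaluation stand-in reports mean squared error, and the Feature Acquisition stand-in ranks unacquired features by an estimated VoI per unit cost. All function names, the toy data, the cost and budget values, and the VoI numbers are hypothetical. In D3 the VoI estimates would come from an LLM reasoning about features whose data has not yet been collected.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, float]


def modeling_agent(rows: List[Row], target: str, features: List[str]) -> Dict[str, float]:
    """Stand-in for the Modeling agent (assumption): fit y ~ sum_f w_f * x_f,
    fitting each coefficient to the residual left by the previous features."""
    weights: Dict[str, float] = {}
    residual = [r[target] for r in rows]
    for f in features:
        xs = [r[f] for r in rows]
        denom = sum(x * x for x in xs)
        w = sum(x * e for x, e in zip(xs, residual)) / denom if denom else 0.0
        weights[f] = w
        residual = [e - w * x for e, x in zip(residual, xs)]
    return weights


def evaluation_agent(weights: Dict[str, float], rows: List[Row], target: str) -> float:
    """Stand-in for the Evaluation agent (assumption): mean squared error.
    A real pipeline would score on held-out data rather than the fitting data."""
    errors = [(r[target] - sum(w * r[f] for f, w in weights.items())) ** 2 for r in rows]
    return sum(errors) / len(errors)


def feature_acquisition_agent(voi: Dict[str, float], costs: Dict[str, float],
                              tried: Set[str], budget: float) -> Optional[str]:
    """Stand-in for the Feature Acquisition agent (assumption): choose the
    affordable, untried feature with the highest estimated VoI per unit cost."""
    candidates = [f for f in voi if f not in tried and costs[f] <= budget]
    if not candidates:
        return None
    return max(candidates, key=lambda f: voi[f] / costs[f])


def closed_loop(rows: List[Row], target: str, start: List[str], voi: Dict[str, float],
                costs: Dict[str, float], budget: float,
                min_gain: float = 1e-6) -> Tuple[List[str], Dict[str, float], float]:
    """Propose, evaluate, and refine until no affordable feature remains."""
    acquired = list(start)
    tried = set(start)
    weights = modeling_agent(rows, target, acquired)
    best = evaluation_agent(weights, rows, target)
    while True:
        feature = feature_acquisition_agent(voi, costs, tried, budget)
        if feature is None:
            break
        tried.add(feature)
        budget -= costs[feature]  # the measurement is paid for even if it is not kept
        trial = acquired + [feature]
        trial_weights = modeling_agent(rows, target, trial)
        trial_loss = evaluation_agent(trial_weights, rows, target)
        if best - trial_loss > min_gain:  # keep the feature only if it helps
            acquired, weights, best = trial, trial_weights, trial_loss
    return acquired, weights, best


# Hypothetical toy data: response = 2 * dose + 3 * biomarker; "assay" is irrelevant.
doses, biomarkers, assays = [1, 2, 3, 4], [1, -1, 1, -1], [1, -1, -1, 1]
rows = [{"dose": float(d), "biomarker": float(b), "assay": float(a),
         "response": 2.0 * d + 3.0 * b}
        for d, b, a in zip(doses, biomarkers, assays)]

features, weights, loss = closed_loop(
    rows, target="response", start=["dose"],
    voi={"biomarker": 0.9, "assay": 0.1},    # hypothetical VoI estimates
    costs={"biomarker": 2.0, "assay": 1.0},  # hypothetical acquisition costs
    budget=3.0,
)
print(features, round(loss, 4))
```

In this toy run the loop first acquires the feature with the best estimated VoI per cost, refits, and keeps it because the error drops. It then spends the remaining budget on the second feature and discards it because the fit does not improve. Ranking by VoI per unit cost is one possible selection rule chosen for illustration; the abstract does not specify the exact criterion.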