DiSciPLE

Learning Interpretable Programs for Scientific
Visual Discovery

Utkarsh Mall1, Cheng Perng Phoo2, Mia Chiquier1, Bharath Hariharan2, Kavita Bala2, Carl Vondrick1

1Columbia University        2Cornell University

In CVPR 2025

Abstract

Visual data is used in numerous scientific workflows, ranging from remote sensing to ecology. As the amount of observation data increases, the challenge is not just to make accurate predictions but also to understand the underlying mechanisms for those predictions. Good interpretation is important in scientific workflows, as it allows for better decision-making by providing insights into the data. This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks. We propose DiSciPLE (Discovering Scientific Programs using LLMs and Evolution), an evolutionary algorithm that leverages the common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data. Additionally, we propose two improvements, a program critic and a program simplifier, that further help our method synthesize good programs. On three different real-world problems, DiSciPLE learns state-of-the-art programs on novel tasks with no prior literature. For example, we can learn programs with 35% lower error than the closest non-interpretable baseline for population density estimation.
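
To make the idea of evolutionary search over candidate programs concrete, here is a minimal sketch of a generic elitist evolutionary loop. It is not the DiSciPLE implementation: the `Program` type (a pair of coefficients for y = a*x + b), the `evolve`, `mse`, and `toy_propose` names, and all parameter values are hypothetical choices made for illustration. In particular, `toy_propose` uses random crossover and Gaussian noise as a placeholder for the step where an LLM would propose new Python programs, and the program critic and simplifier are not modeled.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-in for a program: coefficients (a, b) of y = a*x + b.
Program = Tuple[float, float]


def evolve(
    init: List[Program],
    fitness: Callable[[Program], float],
    propose: Callable[[Program, Program, random.Random], Program],
    generations: int = 50,
    population_size: int = 20,
    seed: int = 0,
) -> Program:
    """Generic elitist search: keep the best candidates, propose new ones from parent pairs."""
    rng = random.Random(seed)
    population = list(init)
    for _ in range(generations):
        # Rank by fitness (lower is better) and keep the top half as parents.
        population.sort(key=fitness)
        parents = population[: max(2, population_size // 2)]
        # Fill the rest of the population with children of randomly chosen parent pairs.
        children = [
            propose(*rng.sample(parents, 2), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return min(population, key=fitness)


# Toy regression data generated from y = 2x + 1.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [2.0 * x + 1.0 for x in XS]


def mse(program: Program) -> float:
    """Mean squared error of the candidate on the toy data."""
    a, b = program
    return sum((a * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def toy_propose(p1: Program, p2: Program, rng: random.Random) -> Program:
    """Placeholder proposal step (assumption): crossover of two parents plus small noise."""
    a = rng.choice([p1[0], p2[0]]) + rng.gauss(0.0, 0.1)
    b = rng.choice([p1[1], p2[1]]) + rng.gauss(0.0, 0.1)
    return (a, b)


init_rng = random.Random(1)
init = [(init_rng.uniform(-5, 5), init_rng.uniform(-5, 5)) for _ in range(20)]
best = evolve(init, mse, toy_propose)
print("best candidate:", best, "error:", mse(best))
```

Because the parents are carried over unchanged into each new generation, the best error in the population can never get worse from one generation to the next. Swapping `toy_propose` for a function that asks a language model to write a new program from two parent programs would keep the same loop structure.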

Paper

paper     arXiv     supplementary

Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier, Bharath Hariharan, Kavita Bala, Carl Vondrick. "DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery". In CVPR, 2025.
BibTeX

@inproceedings{disciple-25,
 title={DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery},
 author={Mall, Utkarsh and Phoo, Cheng Perng and Chiquier, Mia and Hariharan, Bharath and Bala, Kavita and Vondrick, Carl},
 booktitle={CVPR},
 year={2025}
}

Results

Example programs found for tasks


Qualitative Comparison on Population Density Estimation


Qualitative comparison of DiSciPLE with other baselines on the task of population density estimation. The maps display population density as the base-10 log of people per square mile.
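
As a reading aid for the map scale, the short sketch below converts between raw density and the base-10 log values shown on the maps. The function names are hypothetical, and how zero-density regions are handled in the actual figures is not stated here, so the sketch only accepts positive densities.

```python
import math


def to_log_density(people_per_sq_mile: float) -> float:
    """Convert a positive density (people per square mile) to its base-10 log."""
    if people_per_sq_mile <= 0:
        raise ValueError("density must be positive to take a logarithm")
    return math.log10(people_per_sq_mile)


def from_log_density(log_value: float) -> float:
    """Invert the map scale: recover people per square mile from a base-10 log value."""
    return 10.0 ** log_value


# A map value of 3 corresponds to 1,000 people per square mile,
# and each additional unit on the scale is a tenfold increase.
print(to_log_density(1000.0))
print(from_log_density(2.0))
```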

Code and Data

Coming Soon!

Acknowledgments

This research is based upon work supported in part by the Office of the Director of National Intelligence (Intelligence Advanced Research Projects Activity) via 2021-20111000006, the NSF STC for Learning the Earth with Artificial Intelligence and Physics, the U.S. DARPA ECOLE Program No. #HR00112390060, and NSF grants 2403016 and 2403015. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, DARPA, or the US Government. The US Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

We also thank Pierre Gentine, Juan Nathaniel, Rong-Yu Gu, and Aya Lahlou for their valuable assistance with the expert experiments and for providing the climate science data. This work would not have been possible without the help of these domain experts.

Special thanks to Purva Tendulkar for their help with the last-minute CVPR paper registration.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.