Autumn: Causal Discovery Through Program Synthesis

February 1st, 2023
We're introducing AutumnSynth, an algorithm that synthesizes the source code of simple 2D video games from a small amount of observed video data. This represents a step forward toward systems that can perform causal theory discovery in real-world environments.
Children playing with toys on a futuristic laboratory

Humans develop theories of how the world works through a process of observation and experimentation. This process is mediated by a number of cognitive tools, such as the ability to infer causal connections from observations, build theories to structure our knowledge, and seek out information to validate or disprove our theories. We use these tools in everyday life and, more recently in our evolutionary history, in the practice of science.

If we could build systems that automate scientific discovery, scientific progress itself could be transformed. Such systems could tirelessly explore novel theories, limited only by computational resources, and subject to fewer of the biases, silos, and constraints that exist in scientific practice today. In fields such as climate science, they could discover climate models that also incorporate economic and social systems, informing more effective decision-making. They could aid in the discovery of new laws in physics, leading to a deeper understanding of phenomena such as superconductivity, advancing the design of new materials. In neuroscience, they could extract general principles of neural computation from functional or connectivity data, bridging the gap between our ability to measure the brain and our theoretical understanding of it. They could even be applied to the problem of science itself, leading to better meta-scientific theories of the principles and good practices of science, accelerating progress even further.

Automating intuitive theory discovery has equally transformative potential. Intuitive theories, such as the fact that objects fall when dropped and that icy streets are slippery, are the informal characterizations of everyday reality that we maintain in our minds and learn from experience. Many of the common-sense failures in even the most advanced AI systems, such as those seen in language models, in robotics, and in self-driving vehicles, can be blamed in large part to an absence of intuitive models of how the world works. It’s untenable to anticipate in advance all the models an AI system might need; hence, the ability to discover intuitive models from observations will be a critical component of artificial intelligence systems with greater common sense.

Theory discovery as program synthesis

Theory discovery requires exploration through some space of possible theories, but what is the structure of this space, and what exactly is a theory? At Basis, we adopt the view that programs are good candidate representations of both formal and intuitive models and theories. In other words, abstract theories such as Einstein’s special relativity, concrete models such as those used to predict CO2 change, and the intuitive models of heat that chefs rely on for culinary success, can be represented as symbolic structures that describe computations, i.e., as programs.

The relationship between theories and programs is not merely an analogy—modern programming languages have several features that make them well-suited to represent theories. They are expressive; they allow us to simulate, and hence model, phenomena in areas as diverse as physics, biology, and economics. There is also mounting evidence that humans use mental simulation, particularly probabilistic mental simulation, to do common sense reasoning about our environments1 and each other2. Programs have inherent causal structure3, which allows us to represent causal relationships in the world and imagine counterfactual “what-if” scenarios. Programs are built compositionally—programming languages possess several features that allow us to build programs of staggering complexity incrementally, starting from simple parts. Even then, programs are often interpretable, both engendering and expressing human understanding.


In light of this, we're developing techniques of program synthesis—algorithms that generate programs—for theory discovery. Can we create an algorithm that takes in some observations and automatically generates the program (i.e. code) that produced them? Such a program would be a model or theory of the data in the sense that it describes the mechanisms or laws which govern how the data came about. The programming languages community, and more recently the ML community, has explored automated program synthesis and produced instances of human-level programming capabilities, e.g. in Sketch4, AlphaCode5, and AlphaTensor6. However, these systems are designed to be tools to improve programmer productivity, whereas our goal is to advance toward novel scientific discovery by producing algorithms that can perform theory discovery broadly.

In collaboration with Columbia, MIT, and Stanford, our latest paper7 presented at POPL 2023 develops a new algorithm, AutumnSynth, that can synthesize programs in a new programming language called Autumn. We show that AutumnSynth can generate the source code for a novel suite of interactive grid-world games. Autumn is a reactive language—it allows us to succinctly express the temporal dynamics of objects and specify changes that occur as consequences of events. Solving the problem of causal theory discovery in the context of Autumn games, which are visually simple yet contain complex causal dependencies, is an important step toward systems that can learn to discover causal theories in complex real-world environments.

Try me out! Click to interact with this Autumn environment. While you are interacting, think—or better still, say out loud—how you think the world works. After a short while, you may feel you have a pretty solid theory. Some of the behaviours may seem so obvious that they are barely worth mentioning, such as the fact that clicking causes a purple block to appear. Others are more complex, such as identifying the variables that are hidden but nonetheless effect the dynamics of the world. The challenge for automated theory discovery is to emulate what you have just done—to learn that there are these distinct states of the world, to find the events that cause a transition from one state to another, and describe the different dynamics under each state.

In contrast to Atari-style games, which are often used as benchmarks for Reinforcement Learning agents, Autumn games often lack any notion of winning or reward, and we are not aiming to develop systems that win the game. Instead, the goal is to figure out how everything in the world works, i.e., to build a model of the game. After all, humans often learn models without a clear external reward for doing so.

How AutumnSynth works

AutumnSynth is able to induce the source code of an Autumn program by observing an episode of gameplay by a human player. This gameplay data includes the visual input (what happens on the screen at each time step), and the player’s inputs (button presses and mouse clicks). It does so by uniting two previously disparate techniques in program synthesis: functional and automata synthesis. The first method, functional synthesis, attempts to attribute each observed change in an object across time steps to other on-screen objects or player actions. For example, in the Mario-style game, the “Mario” object moves around the grid-world and can collect coins by moving onto them. The method will notice that each time the player presses the left arrow, Mario moves one space to the left, and that each time Mario intersects with a coin object, that coin disappears. It will correctly attribute the causes of those state changes and define a functional mapping between them.

Example gameplay in "Mario." In contrast to Atari-style games, which are often used as benchmarks for Reinforcement Learning agents, Autumn games often lack any notion of winning, or reward. The goal is simply to figure out how everything in the world works. This property reflects the mission of our algorithm, and the observation that humans often learn models without a clear external reward for doing so.

However, Autumn games often possess causal dependencies that functional synthesis cannot induce. For example, possessing a coin allows Mario to shoot a bullet, which can kill an enemy. But the number of coins currently in Mario’s purse is not displayed anywhere on the screen. Functional synthesis is now faced with an unsolvable problem: there are some frames in which pressing to shoot a bullet will succeed, and others in which it will fail, with no sign to distinguish them. The second method, automata synthesis, endows the algorithm with the ability to invent and track invisible variables, called latent variables, to explain such observations. In this case, the algorithm will learn to track the number of coins Mario possesses as part of the overall game state. AutumnSynth implements these two synthesis techniques as independent modules. Combining state learning (automata synthesis) with nonstate learning (functional synthesis) is a novel approach, despite the two methods being well-developed in the literature.

The input to AutumnSynth is a sequence of observed grid frames and associated user actions (clicks and arrow keypresses), and the output is a program in the Autumn language that generates the observed grid frames given the user actions. Step 1: AutumnSynth first attempts to synthesize a fully functional program that generates the input sequence. This involves parsing a set of object variables in each frame (Perception); tracking how those objects change over time (Tracking); finding an Autumn expression called an update function describing how each object changes at each time (Update Function Synthesis); and synthesizing an event, or predicate over the observed state, that triggers each update function over time (Event Synthesis). Step 2: if event synthesis fails, AutumnSynth augments the observed state by inventing an unobserved or latent variable via automata synthesis, which then allows the desired event to be expressed.

From here to human-level causal discovery

We evaluated AutumnSynth on the 30 CISC programs, as well as the Exploration, Modeling, and Planning Agent (EMPA) benchmark suite of Atari-style grid-world games8. It was able to learn models for 27 of the 30 CISC programs within a 24-hour time limit, as well as 21 of the 27 EMPA games. Program synthesis is a notoriously hard problem since the number of programs grows astronomically with program-size, and hence, synthesizing relatively large programs from a small amount of data in a reasonable time represents a significant advance. Its ability to learn models for EMPA games is also notable, as they are not written in the Autumn language and often involve large numbers of latent variables, as well as non-deterministic events.

This work provides a framework for combining functional and automata synthesis for causal theory induction across a variety of domains. It also suggests many dimensions for expansion toward our ultimate goal of building algorithms that can perform human-level scientific theory discovery. One important direction is active learning. Instead of being given a complete set of training data to study at one time, humans learn through an incremental process in which they take an active role. They decide how to interact with the world and which questions to ask in order to best update their internal model. Creating an algorithm that can perform this type of process would be a significant step toward understanding human intelligence, as well as toward more capable AI systems. Another direction is to treat program synthesis as a problem of probabilistic inference, and build upon Basis' ongoing work developing general systems for probabilistic reasoning. Finally, we are looking at broader applications, moving from fully simulated worlds into the real world, and handling the increased complexity that this entails.


Research: Ria Das, Armando Solar-Lezama, Joshua Tenenbaum, Zenna Tavares

Article: Karen Schroeder, Zenna Tavares

Illustration: Doug John Miller

Acknowledgments: Janine Kwoh and Rafal Urbaniak for comments and feedback on this article.


  1. P. W. Battaglia, J. B. Hamrick, and J. B. Tenenbaum, “Simulation as an engine of physical scene understanding,” Proc. Natl. Acad. Sci., vol. 110, no. 45, pp. 18327–18332, Nov. 2013, doi: 10.1073/pnas.1306572110. ↩︎

  2. C. L. Baker, J. Jara-Ettinger, R. Saxe, and J. B. Tenenbaum, “Rational quantitative attribution of beliefs, desires and percepts in human mentalizing,” Nat. Hum. Behav., vol. 1, no. 4, Art. no. 4, Mar. 2017, doi: 10.1038/s41562-017-0064. ↩︎

  3. Z. Tavares, J. Koppel, X. Zhang, R. Das, and A. Solar-Lezama, “A Language for Counterfactual Generative Models,” in Proceedings of the 38th International Conference on Machine Learning, Jul. 2021, pp. 10173–10182. Accessed: Jan. 27, 2023. Available: ↩︎

  4. A. Solar-Lezama, L. Tancau, R. Bodik, S. Seshia, and V. Saraswat, “Combinatorial sketching for finite programs,” in Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, New York, NY, USA, Oct. 2006, pp. 404–415. doi: 10.1145/1168857.1168907. ↩︎

  5. Y. Li et al., “Competition-level code generation with AlphaCode,” Science, vol. 378, no. 6624, pp. 1092–1097, Dec. 2022, doi: 10.1126/science.abq1158. ↩︎

  6. A. Fawzi et al., “Discovering faster matrix multiplication algorithms with reinforcement learning,” Nature, vol. 610, no. 7930, Art. no. 7930, Oct. 2022, doi: 10.1038/s41586-022-05172-4. ↩︎

  7. R. Das, J. B. Tenenbaum, A. Solar-Lezama, and Z. Tavares, “Combining Functional and Automata Synthesis to Discover Causal Reactive Programs,” Proc. ACM Program. Lang., vol. 7, no. POPL, p. 56:1628-56:1658, Jan. 2023, doi: 10.1145/3571249. ↩︎

  8. P. A. Tsividis et al., “Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning.” arXiv, Jul. 26, 2021. doi: 10.48550/arXiv.2107.12544. ↩︎