Linking Algorithms to Neural Mechanisms in Predictive Memory Models

March 22nd, 2023
In a new paper, we demonstrate biologically plausible neural network models that can compute important features of predictive learning and memory systems. Our results suggest that these features are more accessible in neural circuits than previously thought, and can support a broad range of cognitive functions. The work achieves something that has proved difficult in AI research: bridging a well-defined computational function with its neural mechanism.

A key challenge in understanding intelligent systems is bridging levels of abstraction, from high-level cognitive goals to concrete neural mechanisms. Research at the algorithmic level, in neuroscience and AI, has explored how to make predictions about the future, how to make good decisions, and other related questions. Meanwhile, research at the mechanistic level has asked how the brain learns via synaptic plasticity, what learning rules it uses, and how individual neurons may represent specific quantities or concepts. The connection between mechanism and function, despite being widely appreciated as important, has proven difficult to explore, with few exceptions.

One example of this disconnect is the case of predictive memory systems for decision making. There are multiple systems in the brain for decision making, but one that is particularly powerful is a predictive system that is mediated by a brain region called the hippocampus [1]. This system builds a representation of the world that allows us to predict the possible future outcomes of our actions, and thereby make better decisions. This kind of system and its neural mechanisms are of great interest to us at Basis, as we work to develop algorithms that can learn and reason about the world.

How does the brain build this kind of world model? One important component, identified and explored by neuroscience researchers, is the Successor Representation (SR) [2]. The SR is a representation of the state transition dynamics of an environment. That is, it’s a mathematical object that tells you, based on your current state and past experiences, where you’re likely to end up in the future. The SR itself can be thought of as a predictive map of the world, and it can also be combined with other representations (e.g. value estimates) to facilitate learning and decision making in many situations. It can tell you where in physical space you’re likely to end up while navigating the hallways of your office building, but also how the late stages of your chess game are likely to unfold given your current board state. Because of this flexibility, the SR has become a common component of machine learning algorithms attempting to endow artificial agents with the ability to learn [3, 4].

Despite the SR’s clear value in adaptive behavior, and recent evidence of its appearance in hippocampal brain data [5], it has remained unclear how it could be computed by biological brains. On a mechanistic level, how do the many synapses in networks of neurons organize themselves to store and use these kinds of representations? If we could understand how synapses change to form predictive memories, we could discover what kinds of biological processes in the brain are important for supporting memory, prediction and other important cognitive functions.

We’re excited to share a new paper, by Ching Fang et al. [6], out of the Aronov and Abbott labs at Columbia’s Zuckerman Institute, with Basis co-founder Emily Mackevicius as the senior author. In this paper, we demonstrate a way that neurons can learn these predictive signals under biological constraints. We designed and trained recurrent neural network models (RNNs) with biologically plausible learning rules that can exactly compute the SR, and showed that activity in the networks matches key features of hippocampal activity recorded from foraging birds. In doing so, we’ve connected memory at two levels of explanation: from synaptic mechanisms to a cognitive description of an animal thinking forward in time. Such models allow us to test theories about memory function in biological brains, and also inspire AI algorithms for learning in artificial agents. This paper was published alongside two complementary papers [7, 8] that demonstrate learning of the SR in networks with different architectures.

A neural network that calculates the Successor Representation

It’s known that a predictive world model is an important aspect of hippocampal memory, but how do we formally define what it means to be predictive? In this work, our goal was to implement a precisely defined predictive model in RNNs, to ask how a real brain could achieve this exact function. We followed the definition in Stachenfeld et al. [5], which considers the SR itself as a predictive map. Given an agent in an environment that is discretized into many possible states (for example, a mouse in a maze of states s1…sN, Figure 1), the SR is an infinite-horizon prediction of where the agent will end up. It can be broken down into two components: a state transition matrix, which defines how states are connected to each other, and a temporal discount factor that defines how important it is to consider states farther into the future.

Figure 1. The Successor Representation predicts the distribution over future possible states, s, of an agent in an environment, given the current state. It’s learned from the agent’s previous experience with the environment. In this example, the states are discrete locations in a maze. Reproduced from [6] (cc-by 4.0).
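To make the two components concrete, here is a minimal sketch of the standard closed form of the SR, M = (I - gamma T)^-1, which equals the discounted sum of all powers of the transition matrix T. The function name, the three-state ring environment, and the value of gamma are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def successor_representation(T, gamma):
    """Closed-form SR: M = sum_t gamma^t T^t = (I - gamma T)^-1, for 0 <= gamma < 1."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

# Hypothetical 3-state ring: from each state the agent steps to the next one
# with probability 0.8 and stays put with probability 0.2.
T_ring = np.array([
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.8, 0.0, 0.2],
])
M_ring = successor_representation(T_ring, gamma=0.9)
# Row i of M_ring is the discounted expected future occupancy of every state,
# starting from state i (the current time step is included in the sum).
```

Under this convention each row of the SR sums to 1 / (1 - gamma), so a larger discount factor spreads more total weight over states further in the future.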

We trained networks that compute and output the SR, given the current location of the agent as an input. Critically, we constrained ourselves by only working with networks that have biological components, and deriving a learning rule that requires the neurons in the network to learn this representation using only information that the equivalent biological neurons would have available to them. We also show that by using an adaptive learning rate, these networks can learn the SR for a given environment quickly and accurately, as animals are known to do. As the required biological mechanisms are quite simple, this process appears completely accessible to biological hippocampal neurons. Once the network computes the SR, it can then combine it with information about how rewarding individual states are, in order to compute the value of moving to any particular state.

In addition to performing the desired computation, our model neurons resemble biological neurons in their response patterns during behavior. When given behavioral data from tufted titmice foraging in an open field as training material, the model neurons develop place fields (characteristic firing patterns in response to particular spatial locations) matching those of the birds’ hippocampal neurons (Figure 2).

Figure 2. Neurons in the hippocampus of foraging birds (left) exhibit place fields, particular patterns of responses to certain locations in the environment (center, in this case, a square open field). When trained to compute the SR based on real birds’ behavioral trajectories, the neurons in our models developed similar place fields (right). Reproduced from [6] (cc-by 4.0).

One particularly useful feature of our model implementation is its flexibility with respect to the SR temporal discount factor. The discount factor becomes important when using the SR to decide between alternative actions: imagine a scenario where you are trying to choose between two actions A and B. You know that A and B will each lead you through a sequence of ten states, with one of them being highly valuable (you receive a piece of gold there). You also know that choosing A leads you to the gold after two time steps, while B leads you there after ten time steps. Without a discount factor, these would be equivalent under the SR. The discount factor allows you to attribute higher value to states that are closer to you in time, and correctly decide that you prefer action A over B.
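The gold example can be worked through numerically. This is a minimal sketch assuming a reward of 1 for the gold, a discount factor of 0.9, and the convention that a reward received n steps ahead is weighted by gamma to the power n; all three are illustrative choices rather than values from the paper.

```python
def discounted_reward(reward, steps, gamma):
    """Value today of a reward received `steps` time steps in the future."""
    return reward * gamma ** steps

gamma = 0.9  # hypothetical discount factor
gold = 1.0   # hypothetical reward for reaching the gold

value_a = discounted_reward(gold, steps=2, gamma=gamma)    # action A: gold after 2 steps
value_b = discounted_reward(gold, steps=10, gamma=gamma)   # action B: gold after 10 steps

# With gamma = 1 (no discounting) the two actions become indistinguishable.
undiscounted_a = discounted_reward(gold, steps=2, gamma=1.0)
undiscounted_b = discounted_reward(gold, steps=10, gamma=1.0)
```

Here `value_a` is 0.81 and `value_b` is roughly 0.35, so action A is preferred, while the undiscounted values are both 1.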

In our model, this discount factor is not encoded in the weights of the recurrent network, but instead as a gain parameter applied across the whole network. This confers the ability to quickly modulate the factor in order to consider values of the SR for different time horizons. It may be advantageous to use a shorter or longer time horizon during the encoding versus retrieval phases of memory access, for example. During the retrieval phase, after learning has completed, one could query the network over multiple parameter values to determine the best course of action. One could also infer from the behavior of an individual what time horizon it considered when executing a behavioral trajectory.
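One way to picture the gain idea is a linear recurrent update whose fixed point is a row of the SR. The sketch below is a simplified illustration under that assumption, not the network from the paper: the transition matrix stands in for the learned recurrent weights, and `run_network`, the three-state environment, and the gain values are hypothetical.

```python
import numpy as np

def run_network(T, gamma, start_state, n_steps=2000):
    """Iterate x <- input + gamma * (x @ T); gamma acts as a gain on the recurrent drive."""
    n = T.shape[0]
    inp = np.zeros(n)
    inp[start_state] = 1.0         # one-hot input for the current state
    x = np.zeros(n)
    for _ in range(n_steps):
        x = inp + gamma * (x @ T)  # weights T stay fixed; only the gain changes
    return x

# Hypothetical fixed "weights": a 3-state ring with a 0.2 chance of staying put.
T = np.array([
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.8, 0.0, 0.2],
])
short_horizon = run_network(T, gamma=0.5, start_state=0)
long_horizon = run_network(T, gamma=0.95, start_state=0)
```

Both calls use the same weights, yet the activity converges to the SR row for a short or a long time horizon depending only on the gain, which is why the horizon can be changed at query time without relearning.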

Another point of interest is that our network computes the state transition matrix on the way to outputting the SR. Unlike the SR, the transition matrix is defined at each time point, which means that it can provide information on a much finer timescale. It could be used, by researchers or by the network itself, to reason about potential trajectories and counterfactuals. Where will the agent be in one or a few time points from now, and where would it have been had it taken a different action? This could provide the basis for a much richer and more flexible model of how to interact with the external world.
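As a rough illustration of this finer-grained use, powers of the transition matrix give the distribution over states a fixed number of steps ahead. The helper name and the three-state environment below are assumptions made for the example, and the counterfactual is reduced to its simplest form: the same query from a different starting state.

```python
import numpy as np

def n_step_distribution(T, start_state, n):
    """Distribution over states exactly n steps ahead: row start_state of T^n."""
    return np.linalg.matrix_power(T, n)[start_state]

# Hypothetical 3-state world. Rows are current states, columns are next states.
T = np.array([
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.8, 0.0, 0.2],
])
one_step = n_step_distribution(T, start_state=0, n=1)
three_steps = n_step_distribution(T, start_state=0, n=3)

# A simple counterfactual: where would the agent be had it started from state 1?
three_steps_alt = n_step_distribution(T, start_state=1, n=3)
```

Each result is a probability distribution over states at one specific future time, whereas the SR blends all future times into a single discounted summary.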

Toward more powerful and flexible learning systems

In this paper, we’ve derived biologically plausible learning rules that allow RNNs to learn the SR, a useful feature for building predictive maps. This work provides a foundation for the continued study of complex cognitive processes and how we might implement them in model systems, with links to Basis’ goal of developing systems that learn and reason by building models of their environments.

We are currently building upon this work to understand how predictive features are integrated with information from other brain systems, such as those for perception and motor planning, to drive behavior. Our models have recapitulated important aspects of neural activity during foraging behavior in the lab, but how is information collected and used in more complex, real-world environments? Typically, animals forage in environments that are new or rapidly changing, and interact with other individuals of various species. An animal’s behavioral policy will depend on sensed and inferred information about the environment, other agents, and its own state. How is all of this information integrated into an individual’s predictive model and its policy? We would like to generate a more complete picture of this process.

This line of work also links to our work [9] building Autumn and AutumnSynth, a language and algorithm for causal theory discovery. Autumn allows for a very general expression of the structure of the world and the time-varying causal relationships within it. AutumnSynth can also discover latent explanatory variables – e.g. agent proximity to food – underlying observed data, making it ideal for studying decision-making policies by biological or simulated agents. Based on Autumn, we are developing systems to infer information about animals’ internal models of the world, just by observing their behavior. This is a challenging task, and highly relevant for understanding how members of collaborative groups exchange information.

Finally, we’re interested in how the predictive map-based system discussed in our paper interacts with other types of learning systems. In neuroscience, there is evidence that humans and other animals possess both model-based and model-free systems, which complement each other. Model-based algorithms, while powerful, are computationally intensive and don’t scale well to all problems. Model-free algorithms are computationally efficient and can rapidly associate values with actions, but are limited in their ability to support inference. How do these learning systems come together to do better than the sum of their parts? In AI systems, adding predictive elements into value-based (model-free) ML algorithms can significantly improve performance [10, 11]. But more work is needed to understand how to best unify these systems to produce sought-after features of intelligence such as online learning, transfer learning and generalization. Continued exploration of learning systems in biological brains should lead us toward more powerful and efficient algorithms of learning and decision making with broad applicability in AI.


Research: Ching Fang, Dmitriy Aronov, Larry Abbott, Emily Mackevicius

Article: Karen Schroeder


  1. O’Keefe, J., & Nadel, L. (1978) The hippocampus as a cognitive map. Clarendon Press.

  2. Dayan, P. (1993) Improving generalization for temporal difference learning: the successor representation. Neural Computation 5:613–624.

  3. Barreto, A., Dabney, W., Munos, R., Hunt, J. J., Schaul, T., van Hasselt, H. P., & Silver, D. (2017) Successor features for transfer in reinforcement learning. Advances in Neural Information Processing Systems 30.

  4. Kulkarni, T. D., Saeedi, A., Gautam, S., & Gershman, S. J. (2016) Deep successor reinforcement learning. arXiv preprint arXiv:1606.02396.

  5. Stachenfeld, K. L., Botvinick, M. M., & Gershman, S. J. (2017) The hippocampus as a predictive map. Nature Neuroscience 20:1643–1653.

  6. Fang, C., Aronov, D., Abbott, L. F., & Mackevicius, E. L. (2023) Neural learning rules for generating flexible predictions and computing the successor representation. eLife 12:e80680.

  7. George, T. M., de Cothi, W., Stachenfeld, K. L., & Barry, C. (2023) Rapid learning of predictive maps with STDP and theta phase precession. eLife 12:e80663.

  8. Bono, J., Zannone, S., Pedrosa, V., & Clopath, C. (2023) Learning predictive cognitive maps with spiking neurons during behavior and replays. eLife 12:e80671.

  9. Das, R., Tenenbaum, J. B., Solar-Lezama, A., & Tavares, Z. (2023) Combining Functional and Automata Synthesis to Discover Causal Reactive Programs. Proceedings of the ACM on Programming Languages 7(POPL):1628–1658.

  10. Schrittwieser, J., Antonoglou, I., Hubert, T. et al. (2020) Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588:604–609.

  11. Barreto, A., Hou, S., Borsa, D., Silver, D., & Precup, D. (2020) Fast reinforcement learning with generalized policy updates. Proceedings of the National Academy of Sciences 117(48):30079–30087.