
# From synapse to behavior - Recent outcomes (2004-2008) -

#### Themes and contents of research

In various fields of neuroscience, we have observed phenomena relevant to brain function at the levels of molecules, of synapses and neurons, of local circuits, of brain areas, and of animal and human behavior. We have examined relations among phenomena observed at these different levels. For example, we have examined the effects of molecular manipulation on synaptic plasticity, on the spiking properties of single neurons, and on animals' learning behavior. In electrophysiology, we have examined correlations between the spiking activities of neurons and an animal's behavior. In imaging studies, we have examined correlations between the collective activities of brain areas and behavior.

However, these experimental observations provide only fragmentary information about the mechanisms by which neural systems achieve behavioral functions. It is practically impossible to observe directly the mechanisms and causal relations linking phenomena at different levels, because a large number of units are involved in each phenomenon. A computational approach is required to mediate between the different levels. In the computational theory group at Tamagawa University, we attempt to provide simple views of the mechanisms relating phenomena observed at different levels, using computational theory and model-based analyses.

1. Mechanisms of synaptic plasticity

Synaptic plasticity is considered to be a fundamental mechanism of behavioral learning and memory. Studies of synaptic plasticity have accumulated since its first discovery by Bliss and Lomo (1973). In these studies, synaptic plasticity was induced by presynaptic stimulation at a certain frequency. The accumulated studies provided a simple view of the mechanism: "Calcium density in the synaptic spine controls synaptic depression and potentiation." More recently, by controlling the timing of postsynaptic spikes as well as presynaptic stimulation, it was found that synaptic depression and potentiation depend on the order of presynaptic and postsynaptic spike timings (Markram et al. 1997; Bi and Poo 1998). This phenomenon is called spike-timing-dependent plasticity (STDP). We expected that a learning rule based on STDP might be a basic mechanism of synaptic plasticity. However, it has been pointed out that the experimental results on STDP are inconsistent with the naive calcium-density principle.

To resolve this inconsistency, we proposed an alternative principle, still based on calcium density, that reproduces the STDP experiments (Kurashige and Sakai 2006).

2. Effects of synaptic plasticity on neurons and local circuits

STDP (Spike-Timing-Dependent synaptic Plasticity) naturally leads to a simple learning rule that depends on the relative timing of presynaptic and postsynaptic spikes. The computational significance of the STDP learning rule has been examined worldwide (Song et al. 2000; Song and Abbott 2001; van Rossum et al. 2000; Rubin et al. 2001; Gutig et al. 2003; Toyoizumi et al. 2005).

Our group has also examined this issue, reporting various effects of STDP rules on neuronal selectivity (Sakai et al. 2004) and on topological map representations (Sakai 2005; Sakai and Wada 2009).

3. Irrational behavior and synaptic learning rules

Animals, including humans, often encounter situations in which they must choose a behavioral response to be made in the near or distant future, even when the information available for the decision or the outcomes of the responses are ambiguous. To make a decision, the brain must analyze externally given pieces of information, past experiences in similar situations, possible behavioral responses, and the predicted outcomes of the individual responses. Animals attempt to change their choice policy so as to maximize the obtainable reward. However, it is known that animals cannot always maximize reward. Such irrational behavior sometimes follows an empirical law, and empirical laws of irrational behavior can restrict the possible learning rules adopted by the decision system in the animal's brain.

For example, when a subject must choose among options that are rewarded according to a probabilistic rule or schedule, the subject's choices may follow an empirical rule such as the matching behavior (Herrnstein 1961, 1997). Our group showed that several reinforcement learning algorithms exhibit the matching behavior in several tasks (Sakai and Fukai, 2008a), specified the conditions common to the learning algorithms that exhibit the matching behavior, and generalized them as a class of synaptic learning rules (Sakai and Fukai, 2008b). We call this common learning strategy the "matching strategy". We proved that the matching strategy maximizes reward when the decision system uses an appropriate information source to predict the reward expectation. The matching strategy is irrational only when the decision system fails to select an appropriate information source. Hence, the strategy itself is rational, but it leads to irrational behavior when the selection of the information source fails.

#### Main Results I

##### Mechanisms of synaptic plasticity
Fig. 1 Calcium principle and STDP

STDP (Spike-Timing-Dependent synaptic Plasticity) refers to the dependence of synaptic change on the relative timing of presynaptic and postsynaptic spikes (solid line in Fig. 1B). A synapse is potentiated when a presynaptic spike precedes a postsynaptic spike (pre-before-post), and depressed when a postsynaptic spike precedes a presynaptic spike (post-before-pre). In most cases, the size of the change decays monotonically to zero as the interval between presynaptic and postsynaptic spikes increases up to tens of milliseconds.
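The timing window described above can be sketched as a simple function of the spike-timing difference. The exponential form is a common modeling convention, and the amplitudes and time constants below are illustrative assumptions, not values fitted to any of the cited experiments.

```python
import numpy as np

# Illustrative STDP window; parameter values are assumptions.
A_PLUS, A_MINUS = 0.01, 0.012      # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # decay time constants (ms)

def stdp_dw(dt):
    """Weight change for spike-timing difference dt = t_post - t_pre (ms).

    dt > 0: pre-before-post -> potentiation.
    dt < 0: post-before-pre -> depression.
    The magnitude decays to zero as |dt| grows to tens of milliseconds.
    """
    dt = np.asarray(dt, dtype=float)
    return np.where(dt >= 0,
                    A_PLUS * np.exp(-dt / TAU_PLUS),
                    -A_MINUS * np.exp(dt / TAU_MINUS))
```

For example, `stdp_dw(5.0)` is positive (pre-before-post potentiation) and `stdp_dw(-5.0)` is negative (post-before-pre depression).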

It has been considered that STDP might be explained by the "calcium density principle": large calcium influx (regime P in Fig. 1A) causes synaptic potentiation, and small calcium influx (regime D in Fig. 1A) causes synaptic depression. When a postsynaptic action potential occurs after a presynaptic glutamate release, the back-propagating action potential causes a calcium influx through N-methyl-D-aspartate (NMDA) receptors large enough to induce synaptic potentiation. When a postsynaptic action potential occurs before a presynaptic glutamate release, the hyperpolarized membrane potential limits the calcium influx through NMDA receptors, so that synaptic depression is induced.

However, it has been pointed out that such a simple application of the calcium density principle leads to several predictions inconsistent with experimental results. As the interval in the pre-before-post order increases, the amount of calcium influx should decrease monotonically, because glutamate binding to NMDA receptors decays. Therefore, before the calcium elevation falls to the non-plasticity-inducing regime (regime N in Fig. 1A), there must be a range of timings within which the calcium increase is moderate, i.e., depression-inducing (regime D) according to the principle (dashed line in Fig. 1B). However, such an additional depression window has not been observed when inhibitory inputs are blocked (Bi and Poo 1998; Zhang et al. 1998; Feldman 2000; Froemke and Dan 2002), whereas it has been found in hippocampal slices that preserve inhibitory networks (Nishiyama et al. 2000; Tsukada et al. 2005).

Fig 2 Results

Given that the size of spike-timing-dependent potentiation decays monotonically to zero as the pre-post spike interval increases, the calcium density should settle at the threshold θ, rather than in the non-plasticity regime (regime N), for sufficiently long intervals. Since calcium influx is also induced by a single presynaptic spike or a single postsynaptic spike alone, the threshold θ might be determined as the linear summation of the calcium elevations induced by single presynaptic and postsynaptic spikes. If the threshold θ slides dynamically with this linear summation, then potentiation should be induced in the pre-before-post order and depression in the post-before-pre order, because calcium elevation is known to be supra-linear in the pre-before-post order and sub-linear in the post-before-pre order.
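The sliding-threshold argument can be made concrete with a toy numerical sketch. All functional forms and constants below are assumptions chosen only to exhibit the supra-/sub-linear calcium summation and the resulting sign of plasticity; they are not the published model of Kurashige and Sakai.

```python
import numpy as np

def ca_pre(dt):
    """Calcium elevation from a lone presynaptic spike (illustrative form)."""
    return 1.0 * np.exp(-abs(dt) / 30.0)

def ca_post(dt):
    """Calcium elevation from a lone postsynaptic spike (illustrative form)."""
    return 0.8 * np.exp(-abs(dt) / 30.0)

def ca_paired(dt):
    """Calcium when both spikes occur; dt = t_post - t_pre (ms).

    Assumed supra-linear for pre-before-post (dt > 0) and sub-linear
    for post-before-pre (dt < 0), as described in the text.
    """
    linear_sum = ca_pre(dt) + ca_post(dt)
    return linear_sum * (1.5 if dt >= 0 else 0.7)

def plasticity_sign(dt):
    """+1 for potentiation, -1 for depression.

    The threshold theta slides with the linear sum of the single-spike
    responses, so only the supra-/sub-linearity of the paired response
    determines the sign of the change.
    """
    theta = ca_pre(dt) + ca_post(dt)
    return 1 if ca_paired(dt) - theta > 0 else -1
```

With these assumptions, `plasticity_sign(10.0)` gives potentiation and `plasticity_sign(-10.0)` gives depression, and the magnitude of the calcium excursion above or below θ decays with the interval, as required by the timing window.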

We constructed a synaptic plasticity model based on this principle of a threshold that slides dynamically with the linear summation, and demonstrated that the model reproduces the timing dependence of STDP (Fig. 2A). Interestingly, the model also reproduces the initial-strength dependence observed in STDP (Fig. 2B), even though no explicit initial-strength dependence is incorporated (Kurashige and Sakai 2006, 2007). The model also qualitatively reproduces the frequency dependence of synaptic plasticity induced by presynaptic-only stimulation (Fig. 2C), and the triplet timing dependence (Fig. 2D) of synaptic plasticity induced by triplet (pre × 2 and post × 1, or pre × 1 and post × 2) stimulation (Froemke and Dan 2002).

#### Main Results II

##### Effects of synaptic plasticity on neurons and local circuits

STDP naturally leads to a simple learning rule in which the synaptic change is determined by the relative timing of presynaptic and postsynaptic spikes. As noted above, this simple STDP learning rule has various computational implications. Cateau and Fukai (2003) provided a theoretical framework for examining the neuronal selectivity acquired under an STDP rule, using the Fokker-Planck method. The framework requires the following assumptions.

• A postsynaptic neuron receives synaptic inputs from a large number of presynaptic neurons.
• Each presynaptic neuron fires randomly at a variable rate.
• The variable rates are correlated across neurons at each instant, but temporally uncorrelated.
• Synaptic change is sufficiently slow that synaptic strength can be assumed constant within an inter-spike interval.

In this case, we can numerically calculate the distribution of synaptic strength in the steady state reached after sufficient learning with the STDP rule. If we additionally adopt the following assumptions, we can obtain an analytical solution.

• The presynaptic rates are constant.
• The postsynaptic neuron obeys the leaky integrate-and-fire model.
• The STDP timing dependences in the pre-before-post and post-before-pre orders are each approximated by exponential functions.

Fig.3 Synaptic distribution acquired by various STDP rules

Using this framework, we find that the initial-strength dependence of the STDP rule has a significant effect on the steady-state distribution of synaptic strength. It is known that the initial-strength dependences of synaptic potentiation and depression are asymmetric (Fig. 1B). We demonstrated that different types of initial-strength dependence lead to different types of steady-state distributions of synaptic strength (Fig. 3). A postsynaptic neuron receives synaptic inputs from 1,000 excitatory neurons and 200 inhibitory neurons firing randomly at a constant rate (Fig. 3A). The strength of each excitatory synapse changes according to an STDP rule with a certain strength dependence (left-top in A and B). After sufficient learning, the distribution of synaptic strength is bimodal for one combination of initial-strength dependences (left-middle, left-bottom in B) and unimodal for another (left-middle, left-bottom in C). In both cases, the STDP rule regulates the postsynaptic firing rate against changes in the presynaptic firing rate (right in B and C). At the higher presynaptic firing rate (left-bottom in B and C), the overall synaptic strength is automatically reduced, relative to the lower-rate case (left-middle in B and C), so that the postsynaptic firing rate is regulated.
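The qualitative effect of strength dependence on the steady-state distribution can be caricatured with a bounded random walk over synaptic strengths. This is an illustrative simplification with made-up drift terms, not the Fokker-Planck calculation itself: a rule whose drift destabilizes intermediate strengths piles weights up at the bounds (bimodal), while a strength-dependent drift toward an interior value yields a unimodal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def steady_weights(weight_dependent, n_syn=2000, n_steps=5000, eta=0.01):
    """Toy caricature of STDP weight dynamics (illustrative assumptions only).

    Each synapse follows a noisy walk on [0, 1]. The drift term stands in
    for the net effect of the potentiation/depression strength dependence.
    """
    w = rng.random(n_syn)
    for _ in range(n_steps):
        noise = eta * rng.standard_normal(n_syn)
        if weight_dependent:
            # drift pulls weights toward an interior value -> unimodal
            drift = eta * (0.5 - w)
        else:
            # drift destabilizes intermediate strengths -> bimodal
            drift = 0.5 * eta * np.sign(w - 0.5)
        w = np.clip(w + drift + noise, 0.0, 1.0)
    return w

w_bimodal = steady_weights(weight_dependent=False)
w_unimodal = steady_weights(weight_dependent=True)
# w_bimodal concentrates near the bounds 0 and 1;
# w_unimodal clusters around the interior value 0.5.
```

The design point is only that the sign of the strength dependence of the drift, not the noise, decides whether the steady distribution is bimodal or unimodal.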

Fig.4 Multiplex topological representation

We also applied the STDP rule to a locally connected recurrent network receiving feed-forward connections from input neurons. This type of network has often been used to reproduce the topological maps observed in sensory systems. It was previously shown that an STDP-based model can acquire a topological map representation (Song and Abbott 2000). Our group (Sakai 2005; Sakai and Wada 2009) demonstrated that the STDP-based topological map model can naturally acquire a multiplex representation in which a continuous feature is represented topologically while a discontinuous feature is represented by detailed patterns of neuronal activity (Fig. 4). This type of representation offers a possible account of the neuronal selectivities observed in the inferotemporal (IT) cortex. Population activity in the IT cortex observed by optical recording exhibits a topological correspondence between the position of the active cortical area and the rotation of a three-dimensional object. However, individual IT neurons, observed by unit recording, exhibit multiple selectivities to entirely different objects, and neighboring IT neurons often exhibit different preference patterns. These properties are inconsistent with a simple topological map, whereas the multiplex representation demonstrated by our STDP-based topological map model is consistent with them.

#### Main Results III

##### Irrational behavior and synaptic learning rules
Fig.5 Matching and Maximizing strategies in all possible choice behaviors

Various types of behavioral experiments have been conducted to clarify how subjects make decisions according to the outcomes of their actions. Typical examples are simple alternative-choice tasks, in which subjects are required to choose one of several behavioral responses to obtain a reward. Subjects' choice behavior is known to obey the matching law, which states that the frequency of choosing each alternative is proportional to the amount of past reward obtained from that alternative (Herrnstein 1961), $\frac{N_a}{\sum_{a'} N_{a'}}=\frac{R_a}{\sum_{a'} R_{a'}}$, where $N_a$ is the number of times alternative $a$ is chosen and $R_a$ is the amount of past reward obtained by choosing alternative $a$. For simplicity of task design, we consider an alternative-choice task consisting of a trial sequence with discrete time steps, although many previous studies of matching behavior used free-response tasks in continuous time. Subjects' choice behavior in a discrete-time task is also known to obey the matching law (Sugrue et al. 2003).
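As a minimal numerical illustration of the matching law (the choice counts and reward totals below are made up for the example):

```python
import numpy as np

# N[a]: number of times alternative a was chosen
# R[a]: total past reward obtained from alternative a
N = np.array([300.0, 100.0])
R = np.array([75.0, 25.0])

choice_fraction = N / N.sum()   # fraction of choices per alternative
income_fraction = R / R.sum()   # fraction of total reward per alternative

# The matching law holds when the two fractions coincide:
# here both are [0.75, 0.25].
matching_holds = np.allclose(choice_fraction, income_fraction)
```

Note that the law constrains ratios only: doubling every reward amount leaves `income_fraction`, and hence the predicted choice fractions, unchanged.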

Several decision mechanisms have previously been proposed to reproduce the matching behavior (Vaughan 1981; Sugrue et al. 2003; Seung 2003). In addition, we proved that actor-critic learning, one of the standard reinforcement learning algorithms, exhibits the matching behavior in the steady state of learning (Sakai and Fukai 2008a). Interestingly, actor-critic learning was designed from an engineering viewpoint, independently of the matching behavior of animals. We showed that the learning algorithms exhibiting the matching behavior can be derived from a common principle, called "stochastic gradient ascent", in which the choice probabilities are optimized by a gradient rule derived under the assumption that the reward expectation for a choice is independent of the history of past choices (a Markov decision process). When this assumption holds, i.e., the actual reward expectations for the individual alternatives are independent of the choice history, stochastic gradient ascent leads to the optimal solution that maximizes the obtained reward. However, when the actual reward expectations depend on the choice history (a non-Markov decision process), the choice probabilities updated by stochastic gradient ascent are proved to stay around a solution satisfying the matching law.
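The Markov case can be sketched with a minimal stochastic-gradient-ascent (REINFORCE-style) learner on a two-armed bandit with history-independent reward probabilities. The softmax parameterization, learning rate, and reward probabilities are illustrative assumptions, not the specific algorithm analyzed in the cited papers; the sketch only shows that, when reward expectations are independent of the choice history, the gradient rule concentrates choice on the higher-reward arm (maximization).

```python
import numpy as np

rng = np.random.default_rng(1)

p_reward = np.array([0.8, 0.3])   # fixed, history-independent reward probs
theta = np.zeros(2)               # action preferences
alpha = 0.1                       # learning rate (assumed)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(5000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)                  # sample a choice
    r = float(rng.random() < p_reward[a])    # Bernoulli reward
    grad = -pi
    grad[a] += 1.0                           # d log pi(a) / d theta
    theta += alpha * r * grad                # stochastic gradient ascent

pi = softmax(theta)
# In this Markov setting, pi concentrates on arm 0 (the better arm).
```

In a non-Markov task, where the reward expectation of an arm depends on how often it has been chosen, the same update instead settles near the matching solution, which is the distinction drawn in the text.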

These results suggest that the matching behavior may arise from an optimization method that is feasible in Markov environments and is implemented in a neural system.

#### Future research directions

Our group has attempted to provide simple views of the mechanisms relating phenomena observed at different levels, using computational theory, model-based analyses, and engineering techniques. We have provided various links relating phenomena at different levels. These links were developed independently, but they are all oriented toward one final goal: to understand the mechanism of "information creation" in the brain.

Computational theories of sensory systems provide a framework for examining how neurons acquire effective representations of the external world. Within this framework, a neural system can acquire a kind of categorization, a kind of source selection, and representations of independent components extracted from sensory inputs. Our group has provided demonstrations of these types (Kitano and Fukai 2004; Sakai 2005). However, such categorization and source selection merely reflect clustered or biased distributions in the sensory inputs. The framework lets us understand the mechanisms of passive learning of representations that reflect statistical features of sensory inputs, but it can never provide an understanding of "information creation."

In contrast, reinforcement learning theory provides a framework for examining how a subject makes decisions when the relevant information sources are given. The knowledge accumulated from applying reinforcement learning to real problems suggests that the selection of relevant sources and a coarse division of the input space are crucial to successful learning. However, it remains unknown how to find the optimal source selection and input division. Judging from animal and human behavior, the brain must explore for the optimal source selection and input division. The source selection and input division in the decision-making mechanism must, in turn, affect feature extraction and categorization in the sensory system. Such processes can be considered primitive features of "information creation."

We attempt to understand the mechanisms of the "goal-oriented source selection."

We have provided a simple view for understanding the matching behavior (Sakai and Fukai, in press). If the proposed mechanism is correct, the matching behavior can be extended to incorporate state transitions and state-dependent choices. This state-dependent framework provides a method for identifying which of the available input sources the subject uses. Using this method, we attempt to examine how a subject explores for the optimal source selection. Simultaneously, we attempt to construct a framework that jointly optimizes the source selection, the division of the input space, and the choices. We expect such a framework to offer a new view of self-organization in sensory systems and of learning rules at synapses.