Study sheds light on how the brain learns to seek reward

Rewards play a powerful role in shaping our behaviour, and a new study reveals more about how. Much as a dog being trained to play fetch learns which actions earn a treat, our brains are constantly working out which actions lead to positive outcomes. Linking an outcome to the actions that produced it is known as the “credit assignment problem”, and it has long puzzled scientists.

Dopamine, a chemical messenger in the brain, plays a vital role in this learning process. However, the exact mechanism through which specific actions are connected to dopamine release has remained elusive. Until now.

A study published in Nature by researchers from the Allen Institute, Columbia University, the Champalimaud Centre for the Unknown, and Seattle Children’s Research Institute has shed new light on this mystery. It shows that dopamine does more than signal a reward: through trial and error, it also helps animals pinpoint the behaviours that lead to those rewards.

One of the study’s most striking findings is that the brain’s reward system can dynamically reshape an animal’s entire repertoire of movements and behaviours. In other words, behaviours are not simply reinforced; they are actively shaped and refined through experience.

The research team worked with engineers and neuroscientists to develop a “closed-loop” system that linked specific actions performed by mice to dopamine release in real time. The mice were fitted with wireless sensors, and machine learning algorithms sorted their movements into distinct actions. Whenever a mouse performed a predefined “target action”, the system stimulated its dopamine neurons.

They found that the mice changed their behaviour rapidly in response to dopamine release. Not only did the target action become more frequent, but so did actions similar to it and actions that occurred shortly before dopamine release. Conversely, actions unlike the target quickly became rarer. Over time, the mice became more precise, narrowing their behaviour down to the exact action that triggered dopamine release.

The study also explored how mice learn a sequence of actions, uncovering a process akin to rewinding time. When the actions in a dopamine-triggering sequence were separated by longer intervals, the mice learned more slowly, suggesting that shorter gaps between actions make it easier for the mice to connect the sequence with the reward. By “rewinding”, the mice reinforce their behaviour and progressively identify the precise actions and sequences that yield the reward.

These findings may matter beyond the study of the brain’s reward system, in fields such as education and artificial intelligence (AI). In classrooms, applying these insights could mean leaving room for exploration, mistakes, and gradual refinement, in keeping with how the brain naturally learns. In AI, replicating these biological learning processes could lead to more sophisticated and efficient systems that adapt to new data and situations.

Lead author Jonathan Tang stresses the value of delving into these complexities: “We take a lot of stuff for granted about how things work, including credit assignment. But it’s when you really start diving in that you realize the complexity. This is why people do science: to home in on the truth of the matter.”