reinforcement learning × neuroscience · contested

Does the brain spread reward credit the way a reinforcement-learning agent does?

successor representation ⇄ basal ganglia reward routing · via reward prediction error

In reinforcement learning, a successor representation (a kind of source trace) sends a reward-prediction-error update back to the states that led to a reward, weighted by how strongly each state predicts future visits to the rewarded state. In the brain, dopamine reward-prediction errors have been reported to scale the vigor of movement, and the basal ganglia route cortical activity through the thalamus to shape behavior. The system kept recognizing these as the same shape: a prediction-error signal handed out in proportion to a predictive source weighting. What it could not settle is whether that shared shape reflects one mechanism or three separate true findings that only rhyme.
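A minimal sketch of the reinforcement-learning side of this pattern, assuming a tabular successor representation M on a small deterministic chain and assuming the agent learns its reward vector r with a delta rule on the observed reward (one common way SR agents are set up). Because values decompose as V = M r, a prediction error δ at the rewarded state s* changes every state's value by α · δ · M(s, s*), so the error is handed out in proportion to each state's successor weight. The chain, the function names, and the parameters γ and α are hypothetical illustrations, not taken from the sources.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def successor_representation(P, gamma, tol=1e-12, max_iter=10_000):
    """Tabular SR: solve M = I + gamma * P M by fixed-point iteration.

    M[s][j] is the expected discounted number of visits to state j starting
    from s (counting the current visit). Converges for 0 <= gamma < 1 when
    every row of P sums to at most 1.
    """
    n = len(P)
    I = identity(n)
    M = identity(n)
    for _ in range(max_iter):
        new_M = [
            [I[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        change = max(abs(new_M[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = new_M
        if change < tol:
            break
    return M


def values(M, r):
    """SR value decomposition: V(s) = sum_j M[s][j] * r[j]."""
    return [sum(M[s][j] * r[j] for j in range(len(r))) for s in range(len(M))]


def update_reward(r, state, observed_reward, alpha):
    """Delta rule on the reward vector (an assumption of this sketch).

    Returns the updated vector and the reward-prediction error delta.
    """
    delta = observed_reward - r[state]
    new_r = list(r)
    new_r[state] += alpha * delta
    return new_r, delta


# Hypothetical 4-state chain 0 -> 1 -> 2 -> 3; state 3 is terminal (all-zero row).
P = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
gamma, alpha = 0.9, 0.1
M = successor_representation(P, gamma)

r = [0.0, 0.0, 0.0, 0.0]  # the agent does not yet expect reward at state 3
V_before = values(M, r)
r, delta = update_reward(r, state=3, observed_reward=1.0, alpha=alpha)
V_after = values(M, r)

# Each state's value moves by alpha * delta * M[s][3], i.e. in proportion to
# its SR weight for the rewarded state: roughly [0.0729, 0.081, 0.09, 0.1].
credit = [after - before for after, before in zip(V_after, V_before)]
```

On this chain the SR weight falls off as γ raised to the distance from the reward, so states further back receive geometrically less credit; a chain with branching or loops would spread the same error according to the corresponding columns of M.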

The open question

Is there a single, testable mechanism here, or three grounded findings that merely resemble one another? A concrete test: in a reward-guided reaching task with a known state structure, does a trial's reward-prediction error produce larger basal-ganglia-to-cortex effects, and larger changes in movement vigor, specifically for states with a higher successor-representation weight? A measured proportionality would tie reinforcement-learning credit assignment to the brain's reward routing; its absence would mean the resemblance is only on the surface.
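One way the proposed test could be scored, sketched under stated assumptions: each observation t pairs a trial's reward-prediction error δ_t with the SR weight of the state being measured and with a measured effect (a change in movement vigor, or a basal-ganglia-to-cortex effect). The sketch fits effect ≈ b · δ · weight with no intercept, then runs a permutation test that shuffles SR weights across observations, keeping the δ values but breaking the pairing between each state's weight and its effect. The function names, the no-intercept model, and the permutation null are hypothetical choices for illustration, not a protocol from the sources.

```python
import random


def _fit_through_origin(x, y):
    """Least-squares slope and uncentered R^2 for the no-intercept model y ~ b * x."""
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("need some nonzero regressor and response values")
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    return slope, 1.0 - ss_res / syy


def proportionality_test(deltas, sr_weights, effects, n_perm=2000, seed=0):
    """Do effects scale with delta * SR weight more than chance pairings would?

    deltas[t]:     reward-prediction error on observation t
    sr_weights[t]: SR weight of the state measured on observation t
    effects[t]:    measured change on observation t (hypothetical measure)
    Returns (slope, r2, p_value), where p_value is a one-sided permutation
    p-value from shuffling SR weights across observations.
    """
    if not (len(deltas) == len(sr_weights) == len(effects)):
        raise ValueError("inputs must have the same length")
    x = [d * w for d, w in zip(deltas, sr_weights)]
    slope, r2 = _fit_through_origin(x, effects)

    rng = random.Random(seed)  # seeded so the p-value is reproducible
    shuffled = list(sr_weights)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        x_perm = [d * w for d, w in zip(deltas, shuffled)]
        _, r2_perm = _fit_through_origin(x_perm, effects)
        if r2_perm >= r2:
            hits += 1
    return slope, r2, (hits + 1) / (n_perm + 1)
```

Usage would be `slope, r2, p = proportionality_test(deltas, sr_weights, effects)` on per-trial arrays from the reaching task. A small p-value would favor the proportionality the test looks for; it would not by itself rule out a simpler model in which effects track δ alone, which would need its own comparison.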

What the system already tried

It read full-text sources from both sides: reinforcement-learning work on successor representations and successor features, and neuroscience work on dopamine prediction errors, reach vigor, the hippocampus, and the basal ganglia. It then added the grounded facts to its memory. With both fields in hand it proposed a single fused mechanism, but the cross-model jury would not confirm it: each finding is grounded on its own, yet no source links the predictive source weighting to the brain's reward routing, so the connecting step remained unproven synthesis. It sits here as contested: surfaced, but not settled.

The sources it read

Open review

Is this a real connection or a coincidence of shared words? The facts above are grounded in the sources; the leap between them is what is unproven. Make the case, or settle it with a reference.