paper
arXiv cs.LG
November 18th, 2025 at 5:00 AM

Reward Redistribution via Gaussian Process Likelihood Estimation

Abstract: In many practical reinforcement learning tasks, feedback is provided only at the end of a long horizon, leading to sparse and delayed rewards. Existing reward redistribution methods typically assume that per-step rewards are independent, thus overlooking interdependencies among state-action pairs. In this paper, we propose a Gaussian-process-based Likelihood Reward Redistribution (GP-LRR) framework that addresses this issue by modeling the reward function as a sample from a Gaussian process, which explicitly captures dependencies between state-action pairs through the kernel function. By maximizing the likelihood of the observed episodic return via a leave-one-out strategy that leverages the entire trajectory, our framework inherently introduces uncertainty regularization. Moreover, we show that conventional mean-squared-error (MSE)-based reward redistribution arises as a special case of our GP-LRR framework when using a degenerate kernel without observation noise. When integrated with an off-policy algorithm such as Soft Actor-Critic, GP-LRR yields dense and informative reward signals, resulting in superior sample efficiency and policy performance on several MuJoCo benchmarks.
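To make the setup concrete, below is a minimal, hypothetical sketch of GP-based return redistribution under simplifying assumptions that are not taken from the paper: per-step rewards are drawn from a zero-mean GP over state-action features with an RBF kernel, the episodic return is the only observation, and Gaussian conditioning spreads that return across the trajectory; the standard GP leave-one-out identities then give one plausible form of the per-step likelihood and uncertainty terms the abstract alludes to. The kernel choice, noise level, feature construction, and function names are all illustrative, not the authors' implementation.

```python
# Minimal sketch (not the paper's exact algorithm): per-step rewards are
# modeled as a zero-mean GP over state-action features, and the observed
# episodic return R = sum_t r_t is the only measurement. Gaussian
# conditioning redistributes R across steps according to the kernel, and
# the standard GP leave-one-out identities give a per-step predictive
# likelihood whose variance term acts as an uncertainty regularizer.
# The RBF kernel, noise_var, and feature choice are illustrative assumptions.
import numpy as np

def rbf_kernel(X, length_scale=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 * length_scale^2))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def redistribute_return(sa_features, episodic_return, noise_var=1e-2):
    """Posterior mean of per-step rewards given only R = 1^T r + eps."""
    K = rbf_kernel(sa_features)              # (T, T) prior covariance
    ones = np.ones(K.shape[0])
    K1 = K @ ones                            # Cov(r, R) under the GP prior
    return K1 * episodic_return / (ones @ K1 + noise_var)

def loo_log_likelihood(sa_features, r_hat, noise_var=1e-2):
    """Leave-one-out predictive log-likelihood of reward estimates r_hat
    under the GP prior (standard GP LOO identities)."""
    T = len(r_hat)
    K = rbf_kernel(sa_features) + noise_var * np.eye(T)
    K_inv = np.linalg.inv(K)
    loo_var = 1.0 / np.diag(K_inv)           # per-step LOO predictive variance
    loo_mean = r_hat - K_inv @ r_hat * loo_var
    return np.sum(-0.5 * np.log(2 * np.pi * loo_var)
                  - 0.5 * (r_hat - loo_mean) ** 2 / loo_var)

# Example: a 6-step trajectory with 4-dim state-action features, return 1.0.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
r_hat = redistribute_return(feats, episodic_return=1.0)
print(r_hat, r_hat.sum())                # sums to ~1.0 for small noise_var
print(loo_log_likelihood(feats, r_hat))  # candidate training signal
```

As a sanity check on this sketch (not a claim about the paper's result), replacing the RBF kernel with the identity and setting the noise to zero makes the conditioning step split the return uniformly across steps, since no step is correlated with any other; the kernel structure is what allows a non-uniform, trajectory-aware redistribution.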

#ai
#research

Score: 2.80

Engagement proxy: 0

Canonical link: https://arxiv.org/abs/2503.17409