paper
arXiv cs.AI
November 18th, 2025 at 5:00 AM

The Temporal Trap: Entanglement in Pre-Trained Visual Representations for Visuomotor Policy Learning

arXiv:2502.03270v3 Announce Type: replace-cross

Abstract: The integration of pre-trained visual representations (PVRs) has significantly advanced visuomotor policy learning. However, effectively leveraging these models remains a challenge. We identify temporal entanglement as a critical, inherent issue when using these time-invariant models in sequential decision-making tasks. This entanglement arises because PVRs, optimised for static image understanding, struggle to represent the temporal dependencies crucial for visuomotor control. In this work, we quantify the impact of temporal entanglement, demonstrating a strong correlation between a policy's success rate and the ability of its latent space to capture task-progression cues. Based on these insights, we propose a simple yet effective disentanglement baseline designed to mitigate temporal entanglement. Our empirical results show that traditional methods aimed at enriching features with temporal components are insufficient on their own, highlighting the necessity of explicitly addressing temporal disentanglement for robust visuomotor policy learning.
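The abstract does not say how "the ability of its latent space to capture task-progression cues" is measured. One plausible way to probe it is to fit a linear regressor from per-frame embeddings to normalized progress t/(T-1), report its R², and then correlate that score with policy success rate across encoders. This is offered only as an illustrative sketch, not as the paper's procedure. The linear-probe choice, the function names (progress_targets, progress_probe_r2, pearson), and the toy data are all assumptions.

```python
import numpy as np


def progress_targets(episode_lengths: list[int]) -> np.ndarray:
    """Normalized progress t/(T-1) in [0, 1] for every frame of every episode.

    Hypothetical helper: one target value per frame, episodes concatenated.
    """
    return np.concatenate([np.linspace(0.0, 1.0, n) for n in episode_lengths])


def progress_probe_r2(features: np.ndarray, episode_lengths: list[int]) -> float:
    """In-sample R^2 of a linear probe predicting progress from frame features.

    Assumption: a linear probe is used as a proxy for how well a latent space
    encodes task progression; this is not claimed to be the paper's metric.
    """
    targets = progress_targets(episode_lengths)
    if features.shape[0] != targets.shape[0]:
        raise ValueError("expected one feature row per frame")
    # Append a bias column and solve ordinary least squares.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    residuals = targets - X @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((targets - targets.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def pearson(x, y) -> float:
    """Pearson correlation, e.g. between probe scores and success rates."""
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lengths = [50, 60, 40]  # toy episode lengths
    progress = progress_targets(lengths)
    # Toy "progress-aware" features: one column carries progress directly.
    aware = np.column_stack([progress, rng.normal(size=progress.size)])
    # Toy "entangled" features: pure noise with no temporal signal.
    entangled = rng.normal(size=(progress.size, 2))
    print(progress_probe_r2(aware, lengths))      # ≈ 1.0
    print(progress_probe_r2(entangled, lengths))  # close to 0
```

In practice the probe would be scored on held-out episodes rather than in-sample, since high-dimensional PVR embeddings can fit the training frames well even when they carry little genuine progress information.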

#ai

Canonical link: https://arxiv.org/abs/2502.03270