Learning efficient representations for decision-making policies is a challenge in imitation learning (IL). Self-supervised learning (SSL) offers an alternative by allowing models to learn from diverse, unlabeled data, including failures. ACT-JEPA is a novel architecture that integrates IL and SSL to enhance policy representations. ACT-JEPA improves the quality of representations by learning temporal environment dynamics and effectively generalizes to action sequence prediction.