This paper presents a novel pipeline for generating human-aligned reward labels for offline reinforcement learning, applied to autonomous emergency braking in occluded pedestrian crossing scenarios.
The proposed pipeline addresses the challenge of absent reward signals in real-world datasets by generating labels that reflect human judgment and safety considerations.
An adaptive safety component, which analyzes semantic segmentation maps to identify potential collision scenarios, is incorporated so that the autonomous vehicle prioritizes safety over efficiency in such situations.
The results demonstrate the effectiveness of the method in producing reliable and human-aligned reward signals, facilitating the training of autonomous driving systems in alignment with human values.
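To illustrate the kind of adaptive safety weighting described above, the sketch below shows one plausible way a component could map a semantic segmentation map to a weight that shifts the reward label from efficiency toward safety. The class ids, region of interest, thresholds, and blending rule are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an adaptive safety weighting step; class ids,
# thresholds, and the reward mixing rule are illustrative assumptions.
import numpy as np

# Assumed semantic class ids for this illustration only.
PEDESTRIAN_ID = 11
OCCLUDER_IDS = {13, 14}  # e.g. parked vehicles or walls that could hide a pedestrian


def adaptive_safety_weight(seg_map: np.ndarray,
                           roi: tuple,
                           occlusion_thresh: float = 0.15) -> float:
    """Return a weight in [0, 1] shifting the reward from efficiency to safety.

    seg_map: H x W array of per-pixel semantic class ids.
    roi:     (row_slice, col_slice) region of interest ahead of the ego vehicle.
    """
    region = seg_map[roi]
    ped_frac = np.mean(region == PEDESTRIAN_ID)
    occ_frac = np.mean(np.isin(region, list(OCCLUDER_IDS)))

    # Visible pedestrian in the driving corridor: fully prioritize safety.
    if ped_frac > 0:
        return 1.0
    # Heavily occluded corridor: scale the safety weight with occlusion level.
    if occ_frac > occlusion_thresh:
        return min(1.0, occ_frac / (2 * occlusion_thresh))
    return 0.0


def shaped_reward(r_efficiency: float, r_safety: float, w: float) -> float:
    """Blend efficiency and safety terms according to the adaptive weight."""
    return (1.0 - w) * r_efficiency + w * r_safety
```

A downstream labeling step could, under these assumptions, compute `shaped_reward(r_eff, r_safe, adaptive_safety_weight(seg, roi))` for each logged transition to obtain a reward label that favors braking whenever the scene is occluded or a pedestrian is visible.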