Large Language Models (LLMs) are used for behavior planning based on natural language instructions, but they struggle when those instructions are ambiguous in real-world scenarios.
Various methods have been proposed for detecting task ambiguity, but the lack of a universal benchmark makes comparison between them difficult.
To address this, the AmbiK dataset, which focuses on ambiguous tasks in a kitchen environment, has been introduced.
The dataset includes 1000 pairs of ambiguous tasks and their unambiguous counterparts, categorized by ambiguity type, and was created with the help of LLMs and validated by humans.
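To make the paired structure concrete, the following is a minimal sketch of how one such ambiguous/unambiguous task pair could be represented in code; the field names, example texts, and category label are illustrative assumptions for exposition, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Hypothetical representation of a single AmbiK-style entry.
# Field names and values below are assumptions, not the real dataset schema.
@dataclass
class TaskPair:
    ambiguous_task: str    # instruction as a user might naturally phrase it
    unambiguous_task: str  # disambiguated version of the same instruction
    ambiguity_type: str    # category label for the kind of ambiguity

# Illustrative example (invented for this sketch):
example = TaskPair(
    ambiguous_task="Put the cup on the shelf.",           # which cup? which shelf?
    unambiguous_task="Put the red cup on the top shelf.",
    ambiguity_type="underspecified reference",
)

print(example.ambiguous_task, "->", example.unambiguous_task)
```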