CS-Sum is introduced to evaluate how well Large Language Models (LLMs) comprehend code-switching, using dialogue summarization across multiple language pairs as the probe task. It is the first benchmark for code-switched dialogue summarization spanning Mandarin-English, Tamil-English, and Malay-English, with human-annotated dialogues for each language pair.
Evaluation of ten LLMs shows that, despite high scores on automated metrics, the models make subtle mistakes that can change the meaning of a dialogue.
The study identifies the most common error types LLMs make when processing code-switched input, notes that error rates vary across language pairs, and underscores the need for specialized training on code-switched data.
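A minimal sketch of why high automated scores can coexist with meaning-changing errors: n-gram overlap metrics such as ROUGE reward lexical similarity, not faithfulness. The dialogue summaries below are hypothetical (not from CS-Sum), and the `rouge-score` package is just one common way to compute such metrics.

```python
# Sketch: a summary that reverses who did what still scores highly on ROUGE.
from rouge_score import rouge_scorer

reference = "Mei asked Raj to postpone the meeting; Raj agreed to move it to Friday."
# Hypothetical LLM output that swaps the speakers -- a subtle, meaning-changing error.
hypothesis = "Raj asked Mei to postpone the meeting; Mei agreed to move it to Friday."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, hypothesis)
for name, s in scores.items():
    print(f"{name}: F1 = {s.fmeasure:.2f}")
# ROUGE-1 F1 is near 1.0 because the two summaries share almost all unigrams,
# even though the roles are reversed -- illustrating the gap between
# surface-overlap metrics and actual comprehension of the dialogue.
```

This is why human annotation and error-type analysis, as in CS-Sum, are needed on top of automated metrics.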