Source: arXiv

CaLMQA: Exploring culturally specific long-form question answering across 23 languages

  • Researchers introduce CaLMQA, a dataset of 51.7K culturally specific questions spanning 23 languages.
  • A question is culturally specific if it refers to uniquely cultural concepts or if its answer depends on cultural context.
  • Example: the Kirundi question 'Why was the first king of Burundi called Ntare (Lion)?'
  • Questions were collected from web forums and from native speakers, covering both high-resource and under-resourced languages such as Fijian and Kirundi.
  • Data collection was translation-free, so that culturally unique questions could be included.
  • CaLMQA addresses a gap: culturally specific questions have received little attention in LLM research.
  • Evaluation of LLM-generated long-form answers revealed critical surface-level errors in many languages.
  • Even the best models struggled with low-resource languages, for example answering in the wrong language or producing repetitive text.
  • Answers to culturally specific questions contained more factual errors than answers to culturally agnostic questions.
  • The study highlights how difficult it is to generate accurate long-form answers across diverse languages and cultures.
  • CaLMQA is intended to support future research on cultural and multilingual long-form question answering.
