A new framework, TREQA, is introduced for evaluating translation quality at the paragraph level.
TREQA assesses how accurately candidate translations can be used to answer reading comprehension questions that target key information in the original source or reference texts; a minimal sketch of this loop appears below.
In challenging domains, TREQA is shown to be competitive with, and sometimes outperforms, state-of-the-art neural and LLM-based metrics in ranking alternative paragraph-level translations.
The generated questions and answers make the evaluation interpretable by effectively targeting translation errors identified in the evaluated datasets.
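
To make the described evaluation loop concrete, here is a minimal sketch of a TREQA-style scoring function. It is an illustration under assumptions, not the framework's actual implementation: the prompts, the "Q:/A:" output format, the exact-match answer comparison, and the generic `llm` callable (any function mapping a prompt string to a completion string) are all placeholders for whatever question generation, answering, and answer-matching components are actually used.

```python
# Sketch of a TREQA-style evaluation loop (illustrative assumptions only):
# 1) generate QA pairs targeting key information in the source/reference,
# 2) answer each question using only the candidate translation,
# 3) score the candidate by how many answers match the gold answers.

from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any prompt -> completion wrapper (assumption)


def generate_qa_pairs(llm: LLM, passage: str, n: int = 5) -> List[Tuple[str, str]]:
    """Ask the LLM for questions (with short gold answers) covering key information."""
    prompt = (
        f"Write {n} reading comprehension questions about the key information "
        f"in the paragraph below. Format each as 'Q: ...' and 'A: ...'.\n\n{passage}"
    )
    lines = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    questions = [line[2:].strip() for line in lines if line.startswith("Q:")]
    answers = [line[2:].strip() for line in lines if line.startswith("A:")]
    return list(zip(questions, answers))


def answer_from_candidate(llm: LLM, question: str, candidate: str) -> str:
    """Answer the question using only the candidate translation as context."""
    prompt = (
        "Answer the question using only the paragraph below.\n\n"
        f"{candidate}\n\nQ: {question}\nA:"
    )
    return llm(prompt).strip()


def treqa_score(llm: LLM, source_or_reference: str, candidate: str) -> float:
    """Fraction of questions answered correctly from the candidate translation.

    Exact match after lowercasing is a stand-in for a stronger
    answer-overlap or LLM-based comparison.
    """
    qa_pairs = generate_qa_pairs(llm, source_or_reference)
    if not qa_pairs:
        return 0.0
    correct = sum(
        answer_from_candidate(llm, question, candidate).lower() == gold.lower()
        for question, gold in qa_pairs
    )
    return correct / len(qa_pairs)
```

Ranking alternative paragraph-level translations of the same source then amounts to comparing their scores, e.g. `{name: treqa_score(llm, source, cand) for name, cand in candidates.items()}`; the unanswerable questions point directly at the passages a given candidate mistranslates or omits.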