AWS AI Labs has introduced SWE-PolyBench, a multilingual, repository-level benchmark for evaluating AI coding agents. SWE-PolyBench consists of 2,110 tasks across four programming languages: Java, JavaScript, TypeScript, and Python. The benchmark is built from real pull requests (PRs) and introduces Concrete Syntax Tree (CST)-based metrics for assessment. Evaluations of agents on SWE-PolyBench show varying performance across languages and task types.
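To give a sense of what a syntax-tree-based retrieval metric can look like, the sketch below scores how well an agent's patch localizes the code nodes (functions and classes) touched by the reference patch. It is only an illustration: SWE-PolyBench operates on Concrete Syntax Trees across four languages, whereas this example uses Python's built-in `ast` module as a stand-in, and the function names (`changed_definition_names`, `node_retrieval_f1`) and scoring details are assumptions, not the benchmark's actual implementation.

```python
import ast


def changed_definition_names(source: str) -> set[str]:
    """Collect names of function and class definitions in a source file.

    SWE-PolyBench's metrics are CST-based; Python's AST is used here only
    to illustrate the idea of identifying syntax-tree nodes a patch touches.
    """
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
    return names


def node_retrieval_f1(ground_truth: set[str], retrieved: set[str]) -> float:
    """F1 overlap between nodes the reference patch modifies (ground truth)
    and nodes the agent's patch modifies (retrieved)."""
    if not ground_truth or not retrieved:
        return 0.0
    tp = len(ground_truth & retrieved)
    precision = tp / len(retrieved)
    recall = tp / len(ground_truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the reference patch edits `parse_config`;
# the agent edited `parse_config` but also an unrelated `load_defaults`.
print(node_retrieval_f1({"parse_config"}, {"parse_config", "load_defaults"}))  # ~0.67
```

The appeal of node-level scoring over plain pass/fail resolution is that it gives partial credit for correct localization, which helps explain why agent performance diverges across languages and task types.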