OpenAI introduces SWE-Lancer, a benchmark for evaluating model performance on real-world freelance software engineering work. SWE-Lancer is built from over 1,400 freelance tasks with a combined payout of $1 million USD. The benchmark uses end-to-end tests to evaluate both individual code patches and managerial decisions. Results on SWE-Lancer indicate both the current capabilities of language models in software engineering and the room that remains for improvement.