Generating code from natural-language specifications with Large Language Models (LLMs) can produce incorrect programs, a failure mode commonly referred to as hallucination.
A measure of incorrectness called incoherence has been introduced to estimate how likely an LLM-generated program is to be wrong, without requiring an existing correct implementation (an oracle).
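The paper's precise definition of incoherence is not reproduced here; as a rough intuition, oracle-free error estimation of this kind can be illustrated by sampling several candidate implementations for the same specification and checking whether they disagree on shared inputs. The sketch below is a minimal, hypothetical illustration of that idea under this assumption; the function names, inputs, and the disagreement metric are all illustrative and are not the authors' actual measure.

```python
# Hedged sketch: a generic oracle-free consistency check for LLM-generated code.
# Several candidates for the same spec are executed on shared inputs; frequent
# disagreement is treated as a signal that the task is likely solved incorrectly.
from itertools import combinations
from typing import Any, Callable, List


def load_candidate(source: str, name: str) -> Callable:
    """Compile one candidate implementation and return the named function."""
    namespace: dict = {}
    exec(source, namespace)  # candidates are assumed to be trusted snippets
    return namespace[name]


def disagreement_rate(candidates: List[Callable], inputs: List[Any]) -> float:
    """Fraction of inputs on which at least two candidates behave differently."""
    disagreements = 0
    for x in inputs:
        outputs = []
        for f in candidates:
            try:
                outputs.append(f(x))
            except Exception:
                outputs.append("<error>")  # a crash counts as distinct behaviour
        if any(a != b for a, b in combinations(outputs, 2)):
            disagreements += 1
    return disagreements / len(inputs) if inputs else 0.0


# Usage: three candidates for "return the largest element of a list".
sources = [
    "def solve(xs): return max(xs)",
    "def solve(xs): return sorted(xs)[-1]",
    "def solve(xs): return xs[0]",  # an incorrect candidate
]
candidates = [load_candidate(src, "solve") for src in sources]
print(disagreement_rate(candidates, [[1, 2, 3], [4, 5], [7]]))  # ~0.67 -> suspicious
```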
Experiments show that the incoherence measure identifies roughly two-thirds of incorrect programs with no false positives, offering a practical alternative to oracle-based evaluation.
The study also reports a strong correlation between LLM rankings derived from oracle-based correctness evaluation and rankings derived from the incoherence measure, further supporting the effectiveness of the new approach.