Generating code from natural-language specifications with Large Language Models (LLMs) can produce incorrect programs, a failure mode commonly referred to as hallucination.
A measure of incorrectness called incoherence has been introduced to estimate how likely an LLM-generated program is to be wrong, without requiring an existing correct implementation (an oracle).
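The paper's precise definition of incoherence is not reproduced here; as a rough intuition, oracle-free error estimation of this kind can be illustrated by sampling several candidate implementations for the same specification and checking whether they disagree on shared inputs. The sketch below is a minimal, hypothetical illustration of that idea under this assumption; the function names, inputs, and the disagreement metric are all illustrative and are not the authors' actual measure.

```python
# Hedged sketch: a generic oracle-free consistency check for LLM-generated code.
# Several candidates for the same spec are executed on shared inputs; frequent
# disagreement is treated as a signal that the task is likely solved incorrectly.
from itertools import combinations
from typing import Any, Callable, List


def load_candidate(source: str, name: str) -> Callable:
    """Compile one candidate implementation and return the named function."""
    namespace: dict = {}
    exec(source, namespace)  # candidates are assumed to be trusted snippets
    return namespace[name]


def disagreement_rate(candidates: List[Callable], inputs: List[Any]) -> float:
    """Fraction of inputs on which at least two candidates behave differently."""
    disagreements = 0
    for x in inputs:
        outputs = []
        for f in candidates:
            try:
                outputs.append(f(x))
            except Exception:
                outputs.append("<error>")  # a crash counts as distinct behaviour
        if any(a != b for a, b in combinations(outputs, 2)):
            disagreements += 1
    return disagreements / len(inputs) if inputs else 0.0


# Usage: three candidates for "return the largest element of a list".
sources = [
    "def solve(xs): return max(xs)",
    "def solve(xs): return sorted(xs)[-1]",
    "def solve(xs): return xs[0]",  # an incorrect candidate
]
candidates = [load_candidate(src, "solve") for src in sources]
print(disagreement_rate(candidates, [[1, 2, 3], [4, 5], [7]]))  # ~0.67 -> suspicious
```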
Experiments show that the incoherence measure identifies roughly two-thirds of incorrect programs with no false positives, offering a practical alternative to oracle-based evaluation.
The study also reports a strong correlation between LLM rankings derived from oracle-based correctness evaluation and rankings derived from the incoherence measure, further supporting the effectiveness of the new approach.