Large language models (LLMs) and code generation models have shown human-level performance on tasks such as mathematical reasoning and software development.
Hallucination in LLMs has drawn growing attention in the AI community, as it raises concerns about their applicability to systems with high safety standards.
Hallucination in code generation models remains less explored, largely because the non-natural-language form of code makes its functional correctness difficult to verify.
To address this, a selective code generator is proposed: it automatically generates unit tests with dynamic code analysis tools to check the functional correctness of generated code, abstaining when correctness cannot be established, thereby controlling the rate of code hallucination for trustworthy use.
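To make the idea concrete, the sketch below shows one way such a selective generation loop could be organized. It is an illustrative assumption, not the proposed system's actual implementation: the placeholder functions generate_code and generate_unit_tests, the test pass rate as a correctness proxy, and the abstention threshold tau are all hypothetical.

```python
# Minimal sketch of selective code generation, assuming an LLM backend and an
# automatic unit-test generator exist; both are stubbed out here for illustration.

def generate_code(prompt: str) -> str:
    """Placeholder for an LLM code-generation call (hypothetical)."""
    return "def add(a, b):\n    return a + b\n"


def generate_unit_tests(code: str) -> str:
    """Placeholder for unit tests produced by a dynamic code analysis tool (hypothetical)."""
    return (
        "def test_add_ints():\n    assert add(1, 2) == 3\n"
        "def test_add_strs():\n    assert add('a', 'b') == 'ab'\n"
    )


def run_unit_tests(code: str, test_code: str) -> float:
    """Execute the generated tests against the candidate code and return the
    fraction that pass, used here as a proxy for functional correctness."""
    namespace: dict = {}
    try:
        exec(code, namespace)       # load the candidate implementation
        exec(test_code, namespace)  # define the generated test functions
    except Exception:
        return 0.0                  # code that fails to load counts as incorrect
    tests = [f for name, f in namespace.items()
             if name.startswith("test_") and callable(f)]
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        try:
            test()
            passed += 1
        except Exception:
            pass                    # a failing or crashing test counts as not passed
    return passed / len(tests)


def selective_generate(prompt: str, tau: float = 0.9):
    """Return candidate code only if its test pass rate reaches the threshold tau;
    otherwise abstain (return None), which is what controls the hallucination rate."""
    code = generate_code(prompt)
    score = run_unit_tests(code, generate_unit_tests(code))
    return code if score >= tau else None


if __name__ == "__main__":
    print(selective_generate("Write a function that adds two numbers."))
```

In this sketch, tuning tau trades coverage (how often code is returned) against the rate of functionally incorrect, i.e. hallucinated, outputs that slip through.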