Documenting code is a necessity, and large language models (LLMs) simplify the task of commenting while coding.
GenAIScript is an open-source scripting environment that streamlines tasks for AI analysis by automating complex processes within a cohesive scripting framework. It facilitates ingestion of diverse document formats and supports the generation of structured outputs.
A batch-processing approach is implemented as an AI-powered command-line interface (CLI) that runs over the codebase and relies primarily on local LLMs.
The author combined a RAG approach with an Abstract Syntax Tree (AST) to divide each source file into meaningful pieces. The AST reduces the context the LLM has to reason over, provides a hierarchical view of the code's syntax, and abstracts away irrelevant details.
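To make the chunking idea concrete, here is a minimal sketch that splits a source file into top-level functions and classes. It is not the author's implementation: the original tool relies on Tree-sitter, whereas this sketch assumes Python input and uses Python's built-in `ast` module as a stand-in parser. The function name `extract_chunks` is hypothetical.

```python
import ast


def extract_chunks(source: str) -> list[tuple[str, str]]:
    """Return (name, source_text) for each top-level function or class.

    Assumption: Python's standard `ast` module stands in for Tree-sitter,
    so this only works on Python source.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # imports, constants, etc. are not treated as chunks here
        # lineno/end_lineno are 1-based and inclusive; start at the first
        # decorator, if any, so the chunk stays syntactically complete.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        text = "\n".join(lines[start - 1 : node.end_lineno])
        chunks.append((node.name, text))
    return chunks


SAMPLE = (
    "import os\n"
    "\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "class Greeter:\n"
    "    def hello(self):\n"
    "        return 'hi'\n"
)

for name, text in extract_chunks(SAMPLE):
    print(f"--- {name} ---")
    print(text)
```

Each chunk is small enough to send to the model on its own, which is the point of the approach: the LLM reasons over one function or class at a time rather than the whole file.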
Tree-sitter, a parser generator tool and incremental parsing library that can build a concrete syntax tree for a source file, was used by the author to implement the Commenter CLI, which generates high-quality comments for existing code within a codebase.
The Commenter CLI works in five steps: it loads each source file; submits its content to the AST parser, which extracts code chunks; invokes the LLM, which reviews the implementation and produces a comment for each chunk; passes the comment and chunk back to the AST parser to verify that what the LLM produced is valid code for the given language; and finally joins all chunks with their comments.
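The steps above can be sketched as a single function. This is an illustration under stated assumptions, not the author's code: Python's `ast` module again stands in for Tree-sitter, the LLM call is replaced by a caller-supplied function, and the names `comment_source` and `generate_comment` are hypothetical.

```python
import ast
from typing import Callable


def comment_source(source: str, generate_comment: Callable[[str], str]) -> str:
    """Extract chunks, ask for a comment, validate, and join the results.

    Assumptions: `generate_comment` is a placeholder for the real LLM call,
    and Python's `ast` module stands in for the Tree-sitter parser.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    out: list[str] = []
    cursor = 0  # index of the next source line not yet copied to the output
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        start = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
        end = node.end_lineno
        chunk = "\n".join(lines[start:end])
        out.extend(lines[cursor:start])  # keep untouched code between chunks
        candidate = generate_comment(chunk).rstrip("\n") + "\n" + chunk
        try:
            ast.parse(candidate)  # verification step: must still be valid code
        except SyntaxError:
            candidate = chunk  # reject the model output, keep the original
        out.append(candidate)
        cursor = end
    out.extend(lines[cursor:])  # trailing code after the last chunk
    return "\n".join(out) + "\n"


def fake_llm(chunk: str) -> str:
    """Stub used instead of a real model so the sketch runs offline."""
    return "# Adds two numbers."


SOURCE = "import os\n\ndef add(a, b):\n    return a + b\n"
print(comment_source(SOURCE, fake_llm))
```

The validation here only checks that the commented chunk still parses. It does not confirm that the model left the code itself unchanged, so a stricter check may be worth adding in practice.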
The Commenter CLI method saves time and improves the overall quality of documentation, making the codebase easier for developers to understand and work with.
The author used the qwen2.5-coder:7b model, which worked well for code commenting tasks, for the first release of the tool, available in the GitHub project.
Overall, the Commenter tool makes it easier for developers to improve the readability and understandability of their code.
Article originally published on https://bsorrentino.github.io on December 20, 2024.