Large Language Models (LLMs) have shown impressive capabilities, but ensuring their outputs adhere to strict structural or grammatical constraints remains a challenge.
Constrained decoding with context-free grammars ensures that LLMs produce outputs in a specified format by dynamically computing, at each decoding step, a mask over the token logits that permits only grammar-valid continuations.
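The core masking mechanism can be sketched as follows. This is a minimal illustration in plain Python, not Formatron's actual API: `mask_logits` is a hypothetical helper, and the set of allowed token ids stands in for whatever the grammar engine computes at each step.

```python
import math

def mask_logits(logits, allowed_token_ids):
    # Hypothetical helper: tokens the grammar disallows get -inf logits,
    # so they receive zero probability after softmax.
    return [x if i in allowed_token_ids else float("-inf")
            for i, x in enumerate(logits)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary of 5 tokens; suppose the grammar permits only ids 1 and 3 next.
logits = [2.0, 0.5, 1.0, 1.5, -0.3]
probs = softmax(mask_logits(logits, {1, 3}))
```

After masking, sampling (or greedy argmax) can only ever select a token the grammar allows, which is what guarantees structural compliance regardless of the model's raw preferences.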
A novel dynamic pruning strategy called ZapFormat, based on the Earley algorithm, has been proposed to eliminate invalid or redundant Earley states in real time, reducing memory usage and improving decoding speed.
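For context, the states in question are the items of an Earley chart. The sketch below is a minimal textbook Earley recognizer, not Formatron's implementation: using a set per chart position already deduplicates identical Earley states, which is the baseline that more aggressive pruning strategies like ZapFormat build on by also discarding states that can no longer contribute to a valid parse. The grammar here is assumed to have no epsilon productions.

```python
def earley_recognize(grammar, start, tokens):
    # grammar: dict mapping nonterminal -> list of productions (tuples of symbols).
    # An Earley item is (lhs, rhs, dot, origin). Storing items in per-position
    # sets deduplicates identical states as they are generated.
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    for prod in grammar[start]:
        chart[0].add((start, prod, 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:  # predict: expand a nonterminal
                    for prod in grammar[sym]:
                        item = (sym, prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                elif i < n and tokens[i] == sym:  # scan: consume a terminal
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: advance items waiting on this nonterminal
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

# Toy grammar: S -> "a" S | "a"  (one or more "a" tokens).
g = {"S": [("a", "S"), ("a",)]}
```

Because constrained decoding must advance such a chart once per generated token, keeping the live state sets small is exactly where real-time pruning pays off in both memory and speed.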
Experiments show that Formatron, a new constrained decoding engine incorporating ZapFormat, maintains high-precision grammar-compliant outputs while achieving significant speedups over existing implementations.