The inference-time resource costs of large language and vision models pose challenges in production deployments.
One proposed solution is foundation model programs: programs whose modules can invoke foundation models with varying resource costs and performance.
The presented method translates a task into such a program and learns a resource-allocation policy that selects a foundation model 'backend' for each program module.
Compared to monolithic multi-modal models, the implementation achieves up to 98% resource savings with minimal accuracy loss.
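The per-module backend selection described above can be sketched as follows. This is a minimal illustration, not the paper's actual system: the backend names, costs, and the difficulty-threshold policy are all hypothetical stand-ins for a learned allocation policy.

```python
# Hypothetical sketch of a foundation model "program" whose modules can each
# be served by backends with different cost/quality trade-offs. All names,
# costs, and the threshold policy are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Backend:
    name: str
    cost: float  # relative inference cost per call
    run: Callable[[str], str]

def small_model(x: str) -> str:   # stand-in for a cheap model
    return f"small({x})"

def large_model(x: str) -> str:   # stand-in for an expensive model
    return f"large({x})"

BACKENDS = [Backend("small", cost=1.0, run=small_model),
            Backend("large", cost=50.0, run=large_model)]

def policy(module: str, difficulty: float) -> Backend:
    """Toy allocation policy: route only hard inputs to the costly backend."""
    return BACKENDS[1] if difficulty > 0.8 else BACKENDS[0]

def run_program(steps: List[Tuple[str, str, float]]):
    """Execute each (module, input, difficulty) step with its selected backend."""
    total_cost, outputs = 0.0, []
    for module, x, difficulty in steps:
        backend = policy(module, difficulty)
        outputs.append((module, backend.name, backend.run(x)))
        total_cost += backend.cost
    return outputs, total_cost

outputs, cost = run_program([("caption", "img1", 0.3),
                             ("reason", "q1", 0.9)])
```

In this toy run the captioning module is routed to the cheap backend and only the reasoning module pays for the large model, which is the source of the resource savings relative to running a monolithic model on every step.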