Cloud Run now offers fully managed NVIDIA GPUs, removing the complexity of driver installation and library configuration.
This guide walks through deploying the Meta Llama 3.2 1B Instruct model on Cloud Run, using best practices to streamline your development process.
Cloud Run with GPUs offers developers four key benefits: it is fully managed, it scales on demand, it is cost-effective, and its NVIDIA GPUs are well suited to serving models like Meta Llama 3.2.
For local debugging, developers can use the Text Generation Inference (TGI) Docker image to test and iterate on the model before deploying it to Cloud Run, as sketched below.
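A minimal sketch of running TGI locally, assuming a machine with an NVIDIA GPU and the NVIDIA Container Toolkit installed. The image tag, port mapping, and data directory are illustrative choices, and `HF_TOKEN` is your own Hugging Face access token (the Llama 3.2 weights are gated):

```bash
# Run the TGI server locally with GPU access, caching weights under ./data.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  -e HF_TOKEN=$HF_TOKEN \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-3.2-1B-Instruct

# In another terminal, send a quick test request to the local endpoint.
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Cloud Run?", "parameters": {"max_new_tokens": 100}}'
```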
Deploy the model to Cloud Run with an NVIDIA L4 GPU using a deployment command like the one shown below.
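A sketch of the deployment command, assuming a TGI-based container image already pushed to Artifact Registry. The service name, image path, and resource sizes are placeholders; adjust them for your project, and pick a region where L4 GPUs are available:

```bash
# Deploy to Cloud Run with one NVIDIA L4 GPU. GPU services require
# CPU always allocated (--no-cpu-throttling).
gcloud beta run deploy llama-tgi \
  --image=us-docker.pkg.dev/PROJECT_ID/REPO/llama-tgi:latest \
  --region=us-central1 \
  --cpu=8 \
  --memory=32Gi \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --no-cpu-throttling \
  --max-instances=1 \
  --no-allow-unauthenticated
```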
To reduce cold start times with Cloud Storage FUSE, download the model files, upload them to a Cloud Storage bucket, and mount that bucket into the container as a file system, as in the sketch below.
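A sketch of the upload-and-mount flow, assuming the bucket name, mount path, and service details are placeholders for your own values:

```bash
# Download the model weights locally, then copy them to a bucket.
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir ./llama-3.2-1b
gcloud storage cp -r ./llama-3.2-1b gs://BUCKET_NAME/llama-3.2-1b

# Redeploy with the bucket mounted as a volume, so the container reads
# weights from the file system instead of downloading them at startup.
gcloud beta run deploy llama-tgi \
  --image=us-docker.pkg.dev/PROJECT_ID/REPO/llama-tgi:latest \
  --region=us-central1 \
  --gpu=1 --gpu-type=nvidia-l4 --no-cpu-throttling \
  --add-volume=name=model,type=cloud-storage,bucket=BUCKET_NAME \
  --add-volume-mount=volume=model,mount-path=/mnt/model
```

With this setup, point the serving container at the mounted weights (for TGI, `--model-id /mnt/model/llama-3.2-1b`) so it skips the download step entirely.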
The guide also includes instructions for testing your deployed model with curl; a sample request follows.
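A sketch of a test request against the deployed service, assuming the service name and region from the earlier deployment and that the service requires authentication:

```bash
# Look up the service URL, then call the TGI /generate endpoint with
# an identity token for the authenticated service.
SERVICE_URL=$(gcloud run services describe llama-tgi \
  --region=us-central1 --format='value(status.url)')

curl "$SERVICE_URL/generate" \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Tell me about Cloud Run GPUs.", "parameters": {"max_new_tokens": 128}}'
```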
The guide provides next steps to learn more about Cloud Run with NVIDIA GPUs and to deploy your own open-source model from Hugging Face.