Google Cloud Run aims to ease the pressure on companies to recoup their investments in artificial intelligence training through inference by offering on-demand access to GPUs.
Cloud Run pairs a serverless model with containers on Google Cloud's scalable infrastructure, making compute power cost-efficient.
Yunong Xiao and Steren Giannini from Google Cloud discuss how Cloud Run provides serverless access to GPUs, offering a solution to the capacity crunch and the high cost of inference.
With Cloud Run, which runs on Google's highly scalable Borg infrastructure, Google Cloud is debunking the myth that serverless doesn't scale, while keeping containers portable across platforms.
Cloud Run's portable API, akin to the Kubernetes API, supports agentic AI workloads and allows autoscaling on demand.
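As a rough illustration of what on-demand GPU access with autoscaling looks like in practice, a Cloud Run deployment with a GPU attached can be sketched as below. The service name, image, project, and region are placeholders, and the GPU flags sit on the gcloud beta surface at the time of writing, so exact names and availability may differ:

```shell
# Hypothetical sketch: deploy a container with one NVIDIA L4 GPU on Cloud Run.
# Service name, image path, and region below are placeholders.
# --min-instances=0 lets the service scale to zero when idle (on-demand billing);
# --max-instances caps autoscaling during peak traffic.
gcloud beta run deploy my-inference-service \
  --image=us-docker.pkg.dev/my-project/my-repo/inference:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --no-cpu-throttling \
  --min-instances=0 \
  --max-instances=5
```

The scale-to-zero setting is what makes the pay-off model described above work: during off-peak hours no instances run and no GPU is billed, while peak traffic triggers autoscaling up to the configured cap.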
Companies like L'Oréal S.A. have used Cloud Run for their chatbots, benefiting from on-demand scaling during peak times and cost savings during off-peak hours.
Google Cloud plans to make Cloud Run faster, more efficient, and more sustainable, with next steps that include reducing latency, improving startup times, and offering larger GPU types.
The platform also aims to be more environmentally friendly through its on-demand model and to reduce carbon footprints with initiatives like Google Cloud Carbon Footprint.
Customers can expect more Cloud Run updates at Google Cloud Next in April, as work continues to enhance the platform's capabilities and accessibility.