Source: Cloudblog · 1d read
Image Credit: Cloudblog

Google, Bytedance, and Red Hat make Kubernetes generative AI inference aware

  • Google, Bytedance, and Red Hat have partnered to introduce new capabilities for generative AI inference on Kubernetes.
  • The Gateway API Inference Extension now supports LLM-aware routing, making it more cost-effective to operationalize popular serving techniques for large language model (LLM) inference.
  • A new inference performance project provides a benchmarking standard, giving detailed insight into model performance on accelerators and informing HPA scaling metrics and thresholds.
  • Dynamic Resource Allocation simplifies and automates how Kubernetes allocates and schedules GPUs, TPUs, and other devices to pods and workloads, ensuring scheduling efficiency and portability across accelerators.
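The Dynamic Resource Allocation bullet above can be sketched as a pair of Kubernetes manifests. This is an illustrative example, not taken from the source: the device class `gpu.example.com`, the container image, and all object names are hypothetical, and the DRA API group version (`resource.k8s.io/v1beta1` here) depends on the cluster's Kubernetes release.

```yaml
# Illustrative sketch of Dynamic Resource Allocation (DRA).
# The DeviceClass "gpu.example.com" is hypothetical; in a real
# cluster it is provided by the accelerator vendor's DRA driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  containers:
  - name: server
    image: example.com/llm-server:latest  # hypothetical image
    resources:
      claims:
      - name: gpu            # consume the device granted to the claim
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

Instead of requesting an opaque count like `nvidia.com/gpu: 1`, the pod references a claim resolved against a device class, which is what lets the scheduler place workloads portably across GPU, TPU, and other accelerator types.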
