Introducing JobSet

Kubernetes

  • JobSet, introduced by Daniel Vega-Myhre, Abdullah Gharaibeh, and Kevin Hannon, is an open source Kubernetes API for representing distributed jobs, with a focus on distributed ML training and HPC workloads.
  • JobSet aims to close gaps in Kubernetes' built-in primitives for distributed ML training by offering a unified API for large-scale distributed HPC and ML use cases.
  • Key features of JobSet include Replicated Jobs, automatic creation, configuration, and lifecycle management of a headless Service (so pods can reach each other by stable hostnames), configurable success and failure policies, and exclusive placement per topology domain.
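  • JobSet models a distributed batch workload as a group of Kubernetes Jobs. Its ReplicatedJobs field lists Job templates, each instantiated a configurable number of times as child Jobs, so different groups of pods (for example, a driver and its workers) can use different pod templates.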
  • JobSet integrates with Kueue, which adds workload queuing, oversubscription, multi-tenancy, and related capabilities that improve cluster resource utilization.
  • An example use case is distributed ML training with JAX across multiple TPU slices, showcasing JobSet's capabilities for TPU-based workloads.
  • Configurable success and failure policies are already among JobSet's key features; going forward, the project aims for seamless integration with the rest of Kubernetes and a rich API for distributed computing tasks, with planned work tracked in the JobSet roadmap.
  • Developers and contributors are encouraged to engage with the JobSet project by offering feedback, reporting bugs, suggesting features, and participating in its development.
  • For more information on JobSet, its roadmap, and how to get involved, interested parties can visit the project repository, mailing list, or reach out on Slack.
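
The ReplicatedJobs and headless Service features can be illustrated with a small sketch that expands a JobSet-like spec into child Jobs and pod hostnames. This is not JobSet's controller code. It assumes field names modeled on the jobset.x-k8s.io manifest shape, a child Job naming scheme of <jobset>-<replicatedJob>-<jobIndex>, pod hostnames of the form <jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>, and a Service subdomain that defaults to the JobSet name. The names "trainer", "driver", and "workers" are hypothetical.

```python
# Sketch only: a JobSet-like spec as a plain dict, expanded the way the
# JobSet controller is ASSUMED to expand it (names are illustrative).
from typing import Dict, List, Tuple

# Hypothetical spec: one "driver" Job with 1 pod and two "workers" Jobs with
# 4 pods each. Pod templates are omitted; in a real manifest each
# ReplicatedJob carries its own pod template.
jobset_spec: Dict = {
    "metadata": {"name": "trainer"},
    "spec": {
        "replicatedJobs": [
            {"name": "driver", "replicas": 1,
             "template": {"spec": {"parallelism": 1, "completions": 1}}},
            {"name": "workers", "replicas": 2,
             "template": {"spec": {"parallelism": 4, "completions": 4}}},
        ],
    },
}


def child_jobs(js: Dict) -> List[Tuple[str, int]]:
    """Return (child Job name, pod count) for every replica of every ReplicatedJob."""
    name = js["metadata"]["name"]
    jobs = []
    for rjob in js["spec"]["replicatedJobs"]:
        for index in range(rjob["replicas"]):
            # Assumed naming scheme: <jobset>-<replicatedJob>-<jobIndex>
            jobs.append((f"{name}-{rjob['name']}-{index}",
                         rjob["template"]["spec"]["parallelism"]))
    return jobs


def pod_hostnames(js: Dict) -> List[str]:
    """Stable DNS names pods would get via the headless Service (assumed pattern)."""
    name = js["metadata"]["name"]
    # Assumption: the Service (pod subdomain) defaults to the JobSet name.
    subdomain = js["spec"].get("network", {}).get("subdomain", name)
    return [f"{job}-{pod}.{subdomain}"
            for job, pods in child_jobs(js)
            for pod in range(pods)]


if __name__ == "__main__":
    for job, pods in child_jobs(jobset_spec):
        print(job, pods)
    print(pod_hostnames(jobset_spec)[:2])
```

With this spec the sketch yields three child Jobs and nine pod hostnames, such as trainer-workers-1-3.trainer, which is the kind of stable address a training framework can use to locate its peers.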
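
Configurable success and failure policies decide when a JobSet as a whole is finished or should be retried. The following is a simplified, hypothetical model, not JobSet's implementation: it assumes a success policy with an "All" or "Any" operator over a list of targeted ReplicatedJobs (empty meaning all of them), and a failure policy that recreates every child Job after a failure until a maximum restart count is reached. The Policies class and next_action function are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Policies:
    # Hypothetical stand-ins for JobSet's success and failure policies.
    success_operator: str = "All"  # "All" or "Any"
    target_replicated_jobs: List[str] = field(default_factory=list)  # empty = all
    max_restarts: int = 0


def next_action(jobs: List[Tuple[str, str]], policies: Policies, restarts: int) -> str:
    """jobs holds (replicatedJob name, state) per child Job, where state is
    'Running', 'Succeeded' or 'Failed'. Returns the JobSet's next step."""
    if any(state == "Failed" for _, state in jobs):
        # Failure policy: recreate every child Job while the restart budget lasts.
        return "Restart" if restarts < policies.max_restarts else "Failed"
    targets = [state for rjob, state in jobs
               if not policies.target_replicated_jobs
               or rjob in policies.target_replicated_jobs]
    # Success policy: All (or Any) of the targeted child Jobs must have succeeded.
    check = all if policies.success_operator == "All" else any
    if targets and check(state == "Succeeded" for state in targets):
        return "Completed"
    return "Running"


if __name__ == "__main__":
    # Finish as soon as the driver succeeds; allow up to 2 whole-set restarts.
    p = Policies(success_operator="Any", target_replicated_jobs=["driver"], max_restarts=2)
    print(next_action([("driver", "Succeeded"), ("workers", "Running")], p, restarts=0))
    print(next_action([("driver", "Running"), ("workers", "Failed")], p, restarts=2))
```

Restarting the whole set, rather than a single failed Job, reflects the fact that many distributed training jobs cannot continue when one group of workers disappears.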
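
Exclusive placement per topology domain means each child Job gets a domain, such as a node pool backing one TPU slice in the JAX example, entirely to itself. In JobSet manifests this is requested through an annotation; at the time of writing this was assumed to be alpha.jobset.sigs.k8s.io/exclusive-topology keyed on a node label such as cloud.google.com/gke-nodepool, so check the current documentation. The sketch below only models the outcome with a greedy one-Job-per-domain assignment. Real placement is carried out by the Kubernetes scheduler, and the Job and pool names are hypothetical.

```python
from typing import Dict, List, Optional


def assign_exclusive(jobs: List[str], domains: List[str]) -> Dict[str, Optional[str]]:
    """Give each child Job its own topology domain and never share one.

    Jobs left over once every domain is taken map to None, standing in for
    pods that would stay pending until a domain frees up."""
    free = list(domains)
    placement: Dict[str, Optional[str]] = {}
    for job in jobs:
        placement[job] = free.pop(0) if free else None
    return placement


if __name__ == "__main__":
    # Hypothetical: 3 TPU slices requested (one child Job each), 2 node pools free.
    slices = [f"multislice-slice-{i}" for i in range(3)]
    print(assign_exclusive(slices, ["pool-a", "pool-b"]))
```

Keeping one Job per slice matters for multi-slice TPU training because the hosts of a slice communicate over the slice's own interconnect, so mixing two Jobs in one slice would defeat that layout.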
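
Kueue queues a JobSet as a single workload and admits it only when its full resource request fits the available quota; a JobSet is typically pointed at a Kueue queue with the kueue.x-k8s.io/queue-name label. The sketch below illustrates that all-or-nothing admission against a single quota counted in hypothetical "chips". It is not Kueue's algorithm: real Kueue supports multiple resources, queueing strategies, borrowing, and preemption, and a strict FIFO queue would block behind the head workload instead of skipping it as this loop does.

```python
from typing import List, Tuple


def admit(queue: List[Tuple[str, int]], quota: int) -> Tuple[List[str], List[str]]:
    """Admit whole workloads in queue order while their full request fits in
    the remaining quota. A workload is never partially admitted."""
    admitted: List[str] = []
    pending: List[str] = []
    used = 0
    for name, request in queue:
        if used + request <= quota:
            admitted.append(name)
            used += request
        else:
            # Too big for what is left: wait rather than start with fewer pods.
            pending.append(name)
    return admitted, pending


if __name__ == "__main__":
    # Hypothetical JobSets requesting 8, 16 and 4 chips against a 16-chip quota.
    print(admit([("jobset-a", 8), ("jobset-b", 16), ("jobset-c", 4)], quota=16))
```

Admitting a distributed job only as a whole avoids the case where half of its pods start, hold accelerators, and wait indefinitely for peers that cannot be scheduled.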
