Deployed an LLM inference solution using NVIDIA GPUs on Amazon EKS during an AWS hands-on workshop.
Used Ray Serve and vLLM to deploy the Mistral 7B Instruct v0.3 model on Amazon EKS.
Deployed Ray, Ray Serve, and vLLM as the components for building and managing generative AI applications on Amazon EKS.
Used the KubeRay operator to handle the complexity of running Ray, the Ray dashboard for cluster visibility, and the NVIDIA DCGM Exporter for GPU monitoring.
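
As an illustration of the GPU monitoring step, the sketch below reads per-GPU utilization from the Prometheus text output that the DCGM exporter serves. It is a minimal sketch, not the workshop's own code: the helper name `gpu_utilization` is hypothetical, the sample text is made up for the example, and it assumes the exporter publishes the `DCGM_FI_DEV_GPU_UTIL` gauge with a `gpu` label.

```python
import re

# One Prometheus text-format sample line: metric_name{labels} value
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value"
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def gpu_utilization(metrics_text, metric="DCGM_FI_DEV_GPU_UTIL"):
    """Return {gpu_label: value} for one metric in exporter text output.

    Hypothetical helper; the default metric name and the "gpu" label
    are assumptions about how the exporter is configured.
    """
    result = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        result[labels.get("gpu", "unknown")] = float(match.group("value"))
    return result


# Made-up sample for illustration only, not captured from a real cluster.
SAMPLE_METRICS = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 3
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a"} 20480
"""

if __name__ == "__main__":
    print(gpu_utilization(SAMPLE_METRICS))
```

In practice the text would come from the exporter's metrics endpoint or from a Prometheus query rather than a hard-coded string; the parsing step stays the same.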