Existing LLM serving frameworks rely on siloed infrastructure, resulting in operational inefficiencies and over-provisioning. Niyama is a QoS-driven inference serving system that enables efficient co-scheduling of diverse workloads on shared infrastructure. It introduces fine-grained QoS classification and a dynamic chunking mechanism, improving serving capacity by 32% over current siloed deployments. Under extreme load, Niyama reduces SLO violations by an order of magnitude compared to current strategies.
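The abstract names a dynamic chunking mechanism driven by QoS classes but does not specify it. As a rough illustration only, the sketch below picks a prefill chunk size from a request's remaining slack to a per-class latency deadline; the class names, deadlines, and the `chunk_budget` heuristic are all assumptions for illustration, not Niyama's actual policy.

```python
from dataclasses import dataclass

# Hypothetical QoS classes with time-to-first-token deadlines (seconds);
# the real Niyama classes and thresholds are not given in this abstract.
QOS_TTFT_DEADLINE_S = {"interactive": 0.5, "standard": 2.0, "batch": 30.0}

@dataclass
class Request:
    rid: str
    qos: str
    arrival_s: float
    remaining_prefill_tokens: int

def chunk_budget(req: Request, now_s: float,
                 min_chunk: int = 64, max_chunk: int = 512) -> int:
    """Choose a prefill chunk size from the slack to the QoS deadline.

    Little slack -> a large chunk, to finish prefill sooner; ample slack ->
    a small chunk, leaving batch room for co-scheduled work. A sketch of the
    general idea, not the paper's algorithm.
    """
    deadline_s = req.arrival_s + QOS_TTFT_DEADLINE_S[req.qos]
    slack_s = max(deadline_s - now_s, 1e-3)          # avoid division by zero
    urgency = min(1.0, 1.0 / slack_s)                # 0 (relaxed) .. 1 (urgent)
    size = int(min_chunk + urgency * (max_chunk - min_chunk))
    # Never schedule more tokens than the request still has to prefill.
    return min(size, req.remaining_prefill_tokens)
```

Under this sketch, an interactive request nearing its deadline is granted the full chunk, while a batch request far from its deadline gets a chunk near the minimum, so latency-sensitive and throughput-oriented work can share one batch.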