Large Language Models (LLMs) face memory inefficiencies during long-context inference. A new integration of PagedAttention with PyTorch's FlexAttention is introduced to improve efficiency. The fused attention kernel in IBM's Foundation Model Stack (FMS) significantly reduces inference latency. Benchmarks on an NVIDIA L4 GPU show reduced latency with a global KV cache while maintaining linear growth with sequence length.
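
To make the idea concrete, here is a minimal sketch of how a paged KV cache can feed FlexAttention: the KV cache lives in a global pool of fixed-size physical blocks, a per-sequence page table maps logical blocks to physical pages, and the gathered keys/values are passed to `flex_attention`. The pool layout, `page_table`, and `PAGE_SIZE` below are illustrative assumptions rather than FMS identifiers, and a fused kernel like the one described here would perform this lookup inside the attention computation instead of materializing the gather.

```python
# Sketch only: illustrates the paged-KV-cache idea behind PagedAttention,
# not IBM FMS's actual fused kernel. Requires PyTorch >= 2.5 for FlexAttention.
import torch
from torch.nn.attention.flex_attention import flex_attention

torch.manual_seed(0)
B, H, D = 1, 8, 64          # batch, heads, head dimension
PAGE_SIZE = 16              # tokens per physical KV block (assumed)
NUM_PAGES = 32              # size of the global physical KV pool (assumed)

# Global, pre-allocated physical KV cache shared across sequences.
k_pool = torch.randn(NUM_PAGES, PAGE_SIZE, H, D)
v_pool = torch.randn(NUM_PAGES, PAGE_SIZE, H, D)

# Page table for one sequence: logical block i lives in physical page page_table[i].
seq_len = 48                                  # 3 logical blocks of 16 tokens
page_table = torch.tensor([5, 17, 2])         # arbitrary, non-contiguous pages

# Gather this sequence's KV blocks from the pool into (B, H, S, D) tensors.
k = k_pool[page_table].reshape(seq_len, H, D).permute(1, 0, 2).unsqueeze(0)
v = v_pool[page_table].reshape(seq_len, H, D).permute(1, 0, 2).unsqueeze(0)

q = torch.randn(B, H, seq_len, D)

# FlexAttention consumes the gathered KV just like a contiguous cache;
# a fused kernel avoids the explicit gather, but the math is the same.
# (In practice flex_attention is wrapped in torch.compile for performance.)
out = flex_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 48, 64])
```

This separation of logical blocks from physical storage is what lets a single pre-allocated KV pool serve many sequences without per-request over-allocation, which is the memory inefficiency the integration targets.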