Meta has launched new products built on generative AI (GenAI) and optimized its infrastructure accordingly for better performance. By splitting GenAI inference traffic into a dedicated WWW tenant, the company achieved a 30% latency improvement.

The Web Foundation team at Meta keeps the monolithic web tier infrastructure efficient. It caps request runtimes at 30 seconds to balance resource usage and prevent unavailability caused by long-running requests. Traditional webservers at Meta are optimized for low-latency front-end requests, but GenAI products such as LLMs require longer processing times and have different infrastructure needs.

To support these workloads, Web Foundation tuned runtime limits, thread-pool sizing, the JIT cache, request warm-up, and shadow traffic. Raising runtime limits for GenAI requests and customizing configurations improved efficiency, while the thread-pool sizing and JIT caching optimizations further enhanced performance for GenAI workloads. Meta's focus on real-time configuration and infrastructure adjustments showcases its commitment to optimizing GenAI capabilities.
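The tenant split described above can be sketched as a simple routing rule: requests for GenAI products go to a dedicated tenant with a relaxed runtime limit, while everything else keeps the tight 30-second cap. This is a minimal illustration; the tenant names, path prefix, and the 300-second GenAI limit are assumptions, not Meta's actual configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantConfig:
    name: str
    runtime_limit_s: int  # max wall-clock time before the request is killed

# Traditional front-end traffic keeps the 30-second cap mentioned above;
# GenAI inference gets a dedicated tenant with a longer (assumed) limit.
TENANTS = {
    "www": TenantConfig(name="www", runtime_limit_s=30),
    "genai": TenantConfig(name="genai", runtime_limit_s=300),  # assumed value
}

def route_request(path: str) -> TenantConfig:
    """Pick the serving tenant from the request path (illustrative rule)."""
    if path.startswith("/genai/"):
        return TENANTS["genai"]
    return TENANTS["www"]
```

Isolating the long-running traffic in its own tenant means the relaxed limit never applies to ordinary front-end requests, so the web tier's availability guarantees are preserved.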
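Shadow traffic, one of the techniques listed above, can be sketched as mirroring a sample of production requests to new capacity so its caches and JIT warm up before it takes real traffic. The sampling rule, handler names, and error handling below are illustrative assumptions; the user's response always comes from the primary handler.

```python
class ShadowSampler:
    """Mirror every Nth request to a shadow handler (hypothetical sketch)."""

    def __init__(self, every_n: int):
        self.every_n = every_n
        self.count = 0

    def handle(self, request, primary, shadow):
        self.count += 1
        if self.count % self.every_n == 0:
            try:
                shadow(request)  # response discarded; only warms the new capacity
            except Exception:
                pass  # a shadow failure must never affect the user's request
        return primary(request)  # the user always gets the primary response
```

A deterministic every-Nth sampler is used here for clarity; a production system would more likely use probabilistic sampling and dispatch the shadow call asynchronously so it adds no latency to the primary path.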