Meta has launched new products built on generative AI (GenAI) and optimized its infrastructure accordingly for better performance. By splitting GenAI inference traffic into a dedicated WWW tenant, the company achieved a 30% latency improvement.

The Web Foundation team at Meta keeps the monolithic web tier infrastructure efficient. It caps request runtimes at 30 seconds to balance resource usage and prevent unavailability caused by long-running requests. Traditional webservers at Meta are optimized for low-latency front-end requests, but GenAI products such as LLMs require longer processing times and have different infrastructure needs.

To support these workloads, Web Foundation tuned runtime limits, thread-pool sizing, the JIT cache, request warm-up, and shadow traffic. Raising runtime limits for GenAI requests and customizing configurations improved efficiency, while the thread-pool sizing and JIT caching optimizations further enhanced performance for GenAI workloads. Meta's focus on real-time configuration and infrastructure adjustments showcases its commitment to optimizing GenAI capabilities.
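The tenant split described above can be sketched as a simple routing rule: requests for GenAI products go to a dedicated tenant with a relaxed runtime limit, while everything else keeps the tight 30-second cap. This is a minimal illustration; the tenant names, path prefix, and the 300-second GenAI limit are assumptions, not Meta's actual configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantConfig:
    name: str
    runtime_limit_s: int  # max wall-clock time before the request is killed

# Traditional front-end traffic keeps the 30-second cap mentioned above;
# GenAI inference gets a dedicated tenant with a longer (assumed) limit.
TENANTS = {
    "www": TenantConfig(name="www", runtime_limit_s=30),
    "genai": TenantConfig(name="genai", runtime_limit_s=300),  # assumed value
}

def route_request(path: str) -> TenantConfig:
    """Pick the serving tenant from the request path (illustrative rule)."""
    if path.startswith("/genai/"):
        return TENANTS["genai"]
    return TENANTS["www"]
```

Isolating the long-running traffic in its own tenant means the relaxed limit never applies to ordinary front-end requests, so the web tier's availability guarantees are preserved.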
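Shadow traffic, one of the techniques listed above, can be sketched as mirroring a sample of production requests to new capacity so its caches and JIT warm up before it takes real traffic. The sampling rule, handler names, and error handling below are illustrative assumptions; the user's response always comes from the primary handler.

```python
class ShadowSampler:
    """Mirror every Nth request to a shadow handler (hypothetical sketch)."""

    def __init__(self, every_n: int):
        self.every_n = every_n
        self.count = 0

    def handle(self, request, primary, shadow):
        self.count += 1
        if self.count % self.every_n == 0:
            try:
                shadow(request)  # response discarded; only warms the new capacity
            except Exception:
                pass  # a shadow failure must never affect the user's request
        return primary(request)  # the user always gets the primary response
```

A deterministic every-Nth sampler is used here for clarity; a production system would more likely use probabilistic sampling and dispatch the shadow call asynchronously so it adds no latency to the primary path.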