Habby, a game studio, faced challenges with connection stability in their chat functionality using Amazon ElastiCache for Redis OSS publish/subscribe (Pub/Sub) during infrastructure changes.
To address this, they adopted Valkey GLIDE, an open-source client library for Valkey and Redis OSS that works with Amazon ElastiCache, which improved system reliability; in failover testing the system handled 500,000 concurrent players and 100,000 queries per second (QPS).
Habby's messaging system architecture includes WebSocket servers, IM services, REST API servers, and an Amazon ElastiCache cluster for message delivery and interaction management.
The system uses Valkey GLIDE for client communication with Amazon ElastiCache and delivers messages through three distribution types: unicast, broadcast, and multicast.
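The three distribution types can be pictured as a mapping from a delivery kind to a Pub/Sub channel name. The following is a minimal sketch only: the `Distribution` enum, the `channel_for` function, and the `chat:...` naming scheme are assumptions made for illustration and do not come from the article.

```python
from enum import Enum


class Distribution(Enum):
    UNICAST = "unicast"      # one sender to one player
    MULTICAST = "multicast"  # one sender to a group, such as a room or guild
    BROADCAST = "broadcast"  # one sender to every connected player


def channel_for(kind: Distribution, target: str = "") -> str:
    """Return the channel a message is published to (hypothetical naming scheme)."""
    if kind is Distribution.BROADCAST:
        # Every server subscribes to one shared channel.
        return "chat:broadcast"
    if not target:
        raise ValueError(f"{kind.value} delivery needs a target id")
    if kind is Distribution.UNICAST:
        return f"chat:player:{target}"
    return f"chat:group:{target}"
```

With this scheme, a server only has to subscribe to the broadcast channel, the channels of its connected players, and the channels of the groups those players belong to.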
Valkey GLIDE provides features such as a robust failover system, direct subscription to primary nodes, customizable retry configuration, and an independent subscription client, all of which support reliability and scalability.
The article details the Player class implementation, connection management, and message sending and receiving, with emphasis on how the system is structured and configured.
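To make the idea of a customizable retry configuration concrete, here is a generic exponential backoff schedule with an upper bound and optional jitter. This is a sketch of the general technique, not Valkey GLIDE's actual retry implementation; the function name `backoff_delays` and all parameter names and defaults are assumptions.

```python
import random
from typing import Callable, List


def backoff_delays(retries: int, base_ms: int = 100, factor: int = 2,
                   cap_ms: int = 5000, jitter: float = 0.0,
                   rng: Callable[[], float] = random.random) -> List[int]:
    """Return the wait time in milliseconds before each reconnect attempt.

    Attempt i waits min(cap_ms, base_ms * factor**i), reduced by up to
    `jitter` (a fraction between 0 and 1) so clients do not retry in lockstep.
    """
    delays = []
    for attempt in range(retries):
        delay = min(cap_ms, base_ms * factor ** attempt)
        delays.append(int(delay * (1 - jitter * rng())))
    return delays
```

Capping the delay keeps reconnect latency bounded after a long outage, while jitter spreads reconnects out when many clients lose the same node at once.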
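The shape of such a Player class can be sketched without a real cluster. In the example below, `InMemoryBroker` is a hypothetical stand-in for the Pub/Sub server, and the `Player` methods and the `chat:player:<id>` channel naming are assumptions for illustration; this is not the code from the article and it does not use the Valkey GLIDE API.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class InMemoryBroker:
    """Stand-in for a Pub/Sub server so the sketch runs locally."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, channel: str, callback: Callback) -> None:
        self._subscribers.setdefault(channel, []).append(callback)

    def unsubscribe(self, channel: str, callback: Callback) -> None:
        callbacks = self._subscribers.get(channel, [])
        if callback in callbacks:
            callbacks.remove(callback)

    def publish(self, channel: str, message: str) -> int:
        """Deliver to current subscribers and return how many received it."""
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(channel, message)
        return len(callbacks)


class Player:
    def __init__(self, player_id: str, broker: InMemoryBroker) -> None:
        self.player_id = player_id
        self.broker = broker
        self.inbox: List[str] = []
        self.connected = False

    @property
    def channel(self) -> str:
        return f"chat:player:{self.player_id}"  # hypothetical naming scheme

    def connect(self) -> None:
        self.broker.subscribe(self.channel, self._on_message)
        self.connected = True

    def disconnect(self) -> None:
        self.broker.unsubscribe(self.channel, self._on_message)
        self.connected = False

    def send(self, to_player_id: str, text: str) -> int:
        return self.broker.publish(f"chat:player:{to_player_id}",
                                   f"{self.player_id}: {text}")

    def _on_message(self, channel: str, message: str) -> None:
        self.inbox.append(message)
```

Like Pub/Sub in general, the stand-in is fire-and-forget: a message published while the recipient is disconnected is simply not delivered, which is why connection stability matters so much in this design.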
After the migration to Valkey GLIDE, the system handled failover efficiently and supported 500,000 concurrent players at a load of 100,000 QPS, building on the 500-node capacity and sharded Pub/Sub architecture of Amazon ElastiCache.
The implementation was completed in two weeks and improved system reliability and performance, laying a foundation for scaling Habby's message delivery system.
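Sharded Pub/Sub spreads channels across shards using the same hash slot rule that cluster mode applies to keys: CRC16 of the name modulo 16384, with an optional `{hash tag}` selecting the part that is hashed. The sketch below computes that slot in plain Python; the function name `hash_slot` and the example channel names are illustrative assumptions.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in a cluster


def hash_slot(channel: str) -> int:
    """Return the cluster hash slot for a key or sharded Pub/Sub channel name."""
    data = channel.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end > start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is CRC16 with polynomial 0x1021 (XMODEM).
    return binascii.crc_hqx(data, 0) % SLOT_COUNT
```

Because only the tag is hashed, channels such as `{room:7}:chat` and `{room:7}:events` land in the same slot, and therefore on the same shard, while channels without a shared tag spread across the cluster.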
The authors are Shuxiang Zhao and Haoyang Yu from Habby and Lili Ma, Xin Zhang, and Siva Karuturi from AWS, with specialties spanning game backend, software development, database solutions, and solutions architecture.
Overall, the integration of Valkey GLIDE with Amazon ElastiCache significantly improved system reliability and performance for Habby, giving the studio a fault-tolerant architecture for message delivery.