Airbnb's User Signals Platform (USP) tackles the challenge of delivering meaningful personalization across Airbnb's wide range of services by ingesting and processing user actions in near real time.
USP supports both synchronous and asynchronous computation, stores real-time and historical user engagement data in a queryable and durable manner, and allows product teams to define signals and behaviors without coding.
USP's architecture combines a Lambda-style pipeline for data processing with an online serving layer for fast data retrieval, and it handles over 1 million events per second.
Key engineering decisions include choosing Flink over Spark for lower latency, maintaining an append-only data model for operational simplicity, and providing a config-driven developer workflow for signal logic definition.
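To make the config-driven workflow and the append-only model concrete, here is a minimal sketch in Python. It is not USP's actual schema or API: the config keys, the `Signal` type, `to_signal`, and `AppendOnlyStore` are all hypothetical names chosen for illustration. The idea shown is that a signal is declared as data (source topic, filter, field mapping) rather than as custom stream code, and that derived signals are only ever appended, never updated in place.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical signal definition; keys and values are illustrative only.
SIGNAL_CONFIG = {
    "name": "listing_view",
    "source_topic": "page_views",
    "filter": {"page_type": "listing"},
    "fields": {"user_id": "uid", "listing_id": "item_id", "ts": "event_time"},
}


@dataclass(frozen=True)
class Signal:
    name: str
    user_id: str
    ts: int
    attributes: tuple  # sorted (key, value) pairs, kept immutable


def to_signal(event: dict, config: dict) -> Optional[Signal]:
    """Map a raw event to a signal, or return None if the filter rejects it."""
    if any(event.get(k) != v for k, v in config["filter"].items()):
        return None
    mapped = {out: event[src] for out, src in config["fields"].items()}
    attrs = tuple(sorted(
        (k, v) for k, v in mapped.items() if k not in ("user_id", "ts")
    ))
    return Signal(config["name"], mapped["user_id"], mapped["ts"], attrs)


class AppendOnlyStore:
    """In-memory stand-in for a store that only supports append and read."""

    def __init__(self) -> None:
        self._rows = []

    def append(self, signal: Signal) -> None:
        self._rows.append(signal)  # rows are never updated or deleted

    def query(self, user_id: str, since: int = 0) -> list:
        return [s for s in self._rows if s.user_id == user_id and s.ts >= since]
```

Because rows are immutable and only appended, a retry or replay never has to reconcile with an in-place update; any aggregation happens at read time over the stored rows.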
User Signals in USP include Simple User Signals, which map Kafka events to signals, and Join Signals, enabling real-time stateful joins between two Kafka sources for richer context.
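A stateful join between two streams can be sketched as follows. This is a plain-Python simulation of the general technique, not Flink code and not USP's implementation: the `StreamJoiner` class, the `window_ms` parameter, and the example payload fields are assumptions made for illustration. Each side buffers its events per key, and an arriving event is matched against buffered events from the other side that fall within the time window.

```python
from collections import defaultdict


class StreamJoiner:
    """Keyed, stateful inner join of two event streams within a time window."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        # Per-side state: join key -> list of (timestamp, payload).
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}

    def process(self, side: str, key: str, ts: int, payload: dict) -> list:
        """Buffer one event and return any joined records it completes."""
        other = "right" if side == "left" else "left"
        joined = []
        for other_ts, other_payload in self.state[other].get(key, []):
            if abs(ts - other_ts) <= self.window_ms:
                if side == "left":
                    left, right = payload, other_payload
                else:
                    left, right = other_payload, payload
                joined.append(
                    {"key": key, "ts": max(ts, other_ts), **left, **right}
                )
        self.state[side][key].append((ts, payload))
        return joined

    def expire(self, watermark: int) -> None:
        """Drop buffered events too old to match anything at this watermark."""
        cutoff = watermark - self.window_ms
        for side in self.state.values():
            for key in list(side):
                side[key] = [(t, p) for t, p in side[key] if t >= cutoff]
                if not side[key]:
                    del side[key]
```

For example, calling `process("left", "s1", 100, {"query": "paris"})` followed by `process("right", "s1", 400, {"listing_id": "L9"})` with a 1000 ms window yields one joined record carrying both `query` and `listing_id`. The `expire` step matters in practice: without it, the buffered state grows without bound.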
User Segments in USP allow dynamic cohort creation based on live user actions, while Session Engagements focus on short, meaningful slices of time to capture user intent and behavior.
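The two ideas can be illustrated with a small Python sketch. The source does not say how USP delimits a session or evaluates segment membership, so both rules here are assumptions: sessions are split on an inactivity gap, and a segment is a simple "at least N events in the last T time units" predicate. The function names `sessionize` and `in_segment` are hypothetical.

```python
def sessionize(timestamps: list, gap: int) -> list:
    """Group event timestamps into sessions separated by more than `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # inactivity gap exceeded: new session
    return sessions


def in_segment(timestamps: list, now: int, window: int, min_events: int) -> bool:
    """True if at least `min_events` events fall within [now - window, now]."""
    recent = sum(1 for ts in timestamps if now - window <= ts <= now)
    return recent >= min_events
```

With events at times 0, 10, 25, 100, and 110 and a gap of 30, `sessionize` produces two sessions, `[0, 10, 25]` and `[100, 110]`. Evaluating `in_segment` at time 120 with a window of 30 and a threshold of 2 returns `True`, since two events fall in the window. Membership changes as new events arrive or old ones age out, which is what makes the cohort dynamic.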
To keep its Flink jobs stable, USP relies on a hot standby, which provides operational resilience by minimizing downtime and the risk of event backlog when failures occur.
Airbnb's USP powers personalization across the platform, with 100+ Flink jobs in production, 70K queries per second, and plans to enhance pipeline-level orchestration.
The success of USP lies not just in its streaming tech but also in embracing design choices that prioritize resilience, developer-friendly abstractions, and operational stability.
The platform is regarded as core infrastructure at Airbnb, enabling personalized experiences without requiring every team member to be a stream-processing engineer.