Airbnb has developed an Embedding-Based Retrieval (EBR) system to improve search accuracy and scalability for finding relevant homes for users.
The system aims to narrow down the initial set of homes before using more compute-intensive models for ranking.
Challenges in building the EBR system included constructing training data, designing the model architecture, and implementing an online serving strategy.
Training data was constructed for contrastive learning, in which homes and search queries are mapped to numerical vectors (embeddings) so that a query sits close to relevant homes and far from irrelevant ones.
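As a rough illustration of a contrastive objective, the sketch below scores one query against a positive home and a negative home and computes a triplet-style margin loss. The loss form, the margin value, and the toy vectors are assumptions made for illustration; the summary does not state which loss Airbnb used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(query, positive, negative, margin=1.0):
    """Hinge loss that reaches zero once the positive home is closer to the
    query than the negative home by at least `margin` (an assumed value)."""
    return max(0.0, euclidean(query, positive) - euclidean(query, negative) + margin)

# Toy 2-d embeddings, made up for illustration.
query = [0.0, 0.0]
positive_home = [0.0, 1.0]   # distance 1 from the query
negative_home = [3.0, 4.0]   # distance 5 from the query

# Positive is already much closer than the negative, so nothing to correct.
well_separated = triplet_loss(query, positive_home, negative_home)
# Swapping the roles violates the margin and yields a positive loss.
violated = triplet_loss(query, negative_home, positive_home)
```

Minimizing a loss of this kind pulls positive pairs together and pushes negative pairs apart in the embedding space.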
Users' historical searches were grouped into trips to identify positive and negative home-query pairs for training the machine learning model.
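A minimal sketch of how trip-level grouping could yield labeled pairs follows. The event schema (`trip_id`, `query`, `listing_id`, `action`) and the labeling rule (the booked home is the positive, other homes seen in the same trip are negatives) are hypothetical and chosen only to make the idea concrete.

```python
from collections import defaultdict

def build_pairs(events):
    """Group search events by trip and emit (query, listing_id, label) rows.

    Assumed rule: the listing booked in a trip gets label 1; other listings
    seen in the same trip get label 0. Trips without a booking are skipped.
    """
    by_trip = defaultdict(list)
    for event in events:
        by_trip[event["trip_id"]].append(event)

    pairs = []
    for trip_events in by_trip.values():
        booked = {e["listing_id"] for e in trip_events if e["action"] == "booked"}
        if not booked:
            continue
        seen = set()
        for e in trip_events:
            key = (e["query"], e["listing_id"])
            if key in seen:  # keep one row per (query, listing)
                continue
            seen.add(key)
            pairs.append((e["query"], e["listing_id"], int(e["listing_id"] in booked)))
    return pairs

# Hypothetical event log.
events = [
    {"trip_id": "t1", "query": "paris, 2 guests", "listing_id": 101, "action": "impression"},
    {"trip_id": "t1", "query": "paris, 2 guests", "listing_id": 102, "action": "impression"},
    {"trip_id": "t1", "query": "paris, 2 guests", "listing_id": 102, "action": "booked"},
    {"trip_id": "t2", "query": "tokyo, 4 guests", "listing_id": 201, "action": "impression"},
]
pairs = build_pairs(events)
```

Here trip `t1` contributes one negative (listing 101) and one positive (listing 102), while trip `t2` is dropped because it has no booking.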
The model architecture consisted of a two-tower network design processing features of home listings and search queries separately.
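The two-tower idea can be sketched in a few lines. Each tower below is a single random linear layer followed by tanh, which is far simpler than any production model; the feature values, layer sizes, and helper names (`make_tower`, `embed`) are all assumptions for illustration.

```python
import math
import random

def make_tower(in_dim, out_dim, rng):
    """Random weights for one linear layer, shaped out_dim x in_dim."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

def embed(weights, features):
    """Apply the linear layer followed by tanh to produce an embedding."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

rng = random.Random(0)
EMBED_DIM = 4  # both towers must emit vectors of the same size

# The towers take different inputs and share no weights.
listing_tower = make_tower(in_dim=3, out_dim=EMBED_DIM, rng=rng)
query_tower = make_tower(in_dim=2, out_dim=EMBED_DIM, rng=rng)

listing_features = [0.8, 0.1, 0.5]   # made-up listing features
query_features = [0.3, 0.9]          # made-up query features

listing_vec = embed(listing_tower, listing_features)
query_vec = embed(query_tower, query_features)

# The two embeddings only meet at the very end, in the similarity function.
distance = math.dist(query_vec, listing_vec)
```

A common benefit of keeping the towers separate is that listing embeddings can be precomputed, leaving only the query tower to run at request time.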
For online serving, an approximate nearest neighbor (ANN) approach based on an inverted file index (IVF) was chosen for its scalability and performance.
Using Euclidean distance as the similarity function produced more uniform cluster sizes in the IVF index, which improved retrieval performance.
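One possible intuition for the effect on cluster sizes is sketched below: when points are assigned to clusters by dot product, a centroid with a large norm can attract almost everything, whereas Euclidean assignment depends on actual proximity. The data is contrived to make this visible and says nothing about Airbnb's real embeddings.

```python
import math
from collections import Counter

def assign_euclidean(centroids, vec):
    """Index of the centroid with the smallest Euclidean distance to vec."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], vec))

def assign_dot(centroids, vec):
    """Index of the centroid with the largest dot product with vec."""
    return max(range(len(centroids)),
               key=lambda i: sum(c * x for c, x in zip(centroids[i], vec)))

# Two centroids pointing the same way but with different norms.
centroids = [[1.0, 0.0], [4.0, 0.0]]
# Two points near each centroid.
points = [[0.9, 0.1], [1.1, -0.1], [3.9, 0.2], [4.2, 0.0]]

euclid_sizes = Counter(assign_euclidean(centroids, p) for p in points)
dot_sizes = Counter(assign_dot(centroids, p) for p in points)
```

In this toy case Euclidean assignment splits the points evenly between the two clusters, while dot-product assignment sends all four to the larger-norm centroid. Unbalanced clusters matter for IVF because a query that lands in an oversized cluster has to scan far more candidates.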
The EBR system led to a significant increase in bookings and displayed more relevant results to users, particularly for queries with many options.
The system was successfully launched to production in both Search and Email Marketing.