This paper reviews Contextual Multi-Armed Bandit (CMAB) methods and introduces an experimental framework for scalable and interpretable offer selection in retail.
The framework models context at the product-category level and allows offers to span multiple categories, which improves learning efficiency in dynamic environments.
It extends CMAB methodology to support multi-category contexts and achieves scalability through efficient feature engineering and modular design.
The prototype provides interpretability at scale through logistic regression models and a large language model interface for real-time tracking and explanation of evolving preferences.