SensorLM is introduced as a family of sensor-language foundation models for understanding wearable sensor data with natural language.
Aligning and interpreting sensor data with language is challenging because real-world wearable data lack paired, richly annotated sensor-text descriptions.
SensorLM uses a hierarchical caption generation pipeline that captures statistical, structural, and semantic information from sensor data, which was used to curate the largest sensor-language dataset to date: over 59.7 million hours of data from 103,000 individuals.
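A minimal sketch of how a hierarchical captioning pipeline of this kind could be structured, moving from statistics to structure to semantics; the function names, channels, and thresholds below are illustrative assumptions, not the paper's actual rules:

```python
import numpy as np

def statistical_caption(signal: np.ndarray, name: str) -> str:
    """Summarize basic statistics of one sensor channel (illustrative)."""
    return (f"{name}: mean {signal.mean():.1f}, min {signal.min():.1f}, "
            f"max {signal.max():.1f} over {len(signal)} samples")

def structural_caption(signal: np.ndarray, name: str) -> str:
    """Describe the coarse trend within the window (illustrative heuristic)."""
    first, second = np.array_split(signal, 2)
    trend = "rising" if second.mean() > first.mean() else "falling or flat"
    return f"{name} is {trend} across the window"

def semantic_caption(heart_rate: np.ndarray, steps: np.ndarray) -> str:
    """Map signal patterns to a plausible activity description (illustrative rule)."""
    if steps.mean() > 80 and heart_rate.mean() > 120:
        return "the wearer is likely running"
    if steps.mean() > 20:
        return "the wearer is likely walking"
    return "the wearer is likely at rest"

def hierarchical_caption(heart_rate: np.ndarray, steps: np.ndarray) -> str:
    """Combine the three levels into one paired text description."""
    return "; ".join([
        statistical_caption(heart_rate, "heart rate"),
        statistical_caption(steps, "step count"),
        structural_caption(heart_rate, "heart rate"),
        semantic_caption(heart_rate, steps),
    ])

# Example: one 10-minute window of per-minute heart rate and step counts.
hr = np.array([72, 75, 90, 110, 128, 135, 132, 125, 100, 85], dtype=float)
st = np.array([0, 10, 40, 90, 110, 120, 115, 100, 30, 5], dtype=float)
print(hierarchical_caption(hr, st))
```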
It extends multimodal pretraining architectures such as CLIP and CoCa, outperforming state-of-the-art methods in zero-shot recognition, few-shot learning, and cross-modal retrieval across human activity analysis and healthcare tasks.
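For intuition, here is a toy sketch of the CLIP-style symmetric contrastive objective that such sensor-text pretraining builds on; the encoder architectures, dimensions, and tokenization below are placeholder assumptions rather than SensorLM's actual design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorTextCLIP(nn.Module):
    """Toy contrastive aligner between sensor windows and captions (illustrative)."""

    def __init__(self, sensor_dim=64, text_vocab=1000, embed_dim=128):
        super().__init__()
        # Placeholder encoders; a real model would use far larger backbones.
        self.sensor_encoder = nn.Sequential(nn.Linear(sensor_dim, 256), nn.ReLU(),
                                            nn.Linear(256, embed_dim))
        self.text_encoder = nn.EmbeddingBag(text_vocab, embed_dim)  # mean-pooled tokens
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, sensor_windows, caption_token_ids):
        # L2-normalized embeddings for both modalities.
        s = F.normalize(self.sensor_encoder(sensor_windows), dim=-1)
        t = F.normalize(self.text_encoder(caption_token_ids), dim=-1)
        # Symmetric InfoNCE loss: matching sensor/caption pairs sit on the diagonal.
        logits = self.logit_scale.exp() * s @ t.t()
        labels = torch.arange(len(s))
        return 0.5 * (F.cross_entropy(logits, labels) +
                      F.cross_entropy(logits.t(), labels))

model = SensorTextCLIP()
sensors = torch.randn(8, 64)                # batch of flattened sensor windows
captions = torch.randint(0, 1000, (8, 16))  # batch of tokenized captions
loss = model(sensors, captions)
loss.backward()
```

Generative heads in the CoCa style would add a captioning decoder on top of the same aligned sensor representation.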
SensorLM also demonstrates favorable scaling behavior, label efficiency, sensor captioning, and zero-shot generalization to new tasks.
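Zero-shot generalization with aligned encoders typically amounts to embedding class descriptions as text prompts and matching them against the sensor embedding. A brief sketch under that assumption, using random placeholder embeddings and hypothetical class names purely for illustration:

```python
import numpy as np

def zero_shot_classify(sensor_embedding, prompt_embeddings, class_names):
    """Pick the class whose text-prompt embedding has highest cosine similarity."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    sims = norm(prompt_embeddings) @ norm(sensor_embedding)
    return class_names[int(np.argmax(sims))]

# Hypothetical pre-computed embeddings from aligned sensor/text encoders.
classes = ["walking", "running", "sleeping"]
prompts = np.random.randn(3, 128)   # embeddings of prompts like "a person is walking"
window = np.random.randn(128)       # embedding of one wearable sensor window
print(zero_shot_classify(window, prompts, classes))
```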