SETransformer is a hybrid deep neural architecture designed for Human Activity Recognition (HAR) using wearable sensor data.
It combines Transformer-based temporal modeling with channel-wise squeeze-and-excitation (SE) attention and a learnable temporal attention pooling mechanism.
SETransformer outperforms LSTM, GRU, BiLSTM, and CNN baselines on the WISDM dataset, achieving a validation accuracy of 84.68% and a macro F1-score of 84.64%.
The model shows promise for deployment in mobile and ubiquitous sensing applications, offering a competitive and interpretable approach to real-world HAR tasks.
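
To make the two attention components named above concrete, the following is a minimal NumPy sketch of channel-wise squeeze-and-excitation gating followed by learnable temporal attention pooling. It is an illustration under stated assumptions, not the authors' implementation: the function names, the window length, the three input channels, the bottleneck width, and the random weights are all hypothetical, and the Transformer encoder that would sit between the two steps is omitted for brevity.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Channel-wise SE gating for one sensor window x of shape (T, C)."""
    z = x.mean(axis=0)                     # squeeze: average over time -> (C,)
    h = np.maximum(z @ w1, 0.0)            # bottleneck projection with ReLU
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))    # excitation: sigmoid gates in (0, 1) -> (C,)
    return x * s                           # rescale every channel by its gate

def attention_pool(h, v):
    """Temporal attention pooling of features h (T, D) with a scoring vector v (D,)."""
    scores = h @ v                         # one scalar score per time step -> (T,)
    scores = scores - scores.max()         # subtract max for a numerically stable softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h, alpha                # weighted sum over time (D,), weights (T,)

# Hypothetical sizes: 200 time steps, 3 sensor channels, bottleneck width 2.
rng = np.random.default_rng(0)
T, C, hidden = 200, 3, 2
x = rng.standard_normal((T, C))            # stand-in for one sensor window
w1 = rng.standard_normal((C, hidden))      # random stand-ins for learned weights
w2 = rng.standard_normal((hidden, C))
v = rng.standard_normal(C)

gated = squeeze_excite(x, w1, w2)
# Assumption: a Transformer encoder would process `gated` here; it is skipped in this sketch.
pooled, alpha = attention_pool(gated, v)
print(gated.shape, pooled.shape)
```

In a sketch like this, the weights `alpha` form a distribution over time steps, so inspecting them is one plausible way to see which parts of a window contribute most to the pooled representation.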