This work presents an evaluation framework for explainable and adversarially robust sleep stage classification from multivariate time series collected by wrist-worn wearable devices (the DREAMT dataset). We evaluate three models commonly used for sleep stage classification in industry and research: a LightGBM classifier with a biLSTM for post-processing, a support vector machine (SVM), and a hybrid convolutional and recurrent neural network (CNN-RNN). We examine the trade-offs between predictive performance and robustness, and use explainability techniques to understand each model's vulnerability to noise. All three models achieved strong baseline performance: the LightGBM-biLSTM reached an F1 score of 0.87, the SVM 0.81, and the CNN-RNN 0.76. While all models were robust to Gaussian noise, they were vulnerable to gradient-based attacks. The LightGBM-biLSTM was most susceptible to the Smooth Gradient Method (F1 dropping to 0.63 at epsilon 0.2), the SVM collapsed under the Fast Gradient Sign Method (FGSM; F1 0.52), and the CNN-RNN showed the greatest overall robustness. SHAP and Grad-CAM analyses revealed that acceleration signals were consistently the most vulnerable features across models.
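To illustrate why a gradient-based attack can hurt a margin-based model far more than Gaussian noise of similar scale, here is a minimal sketch that attacks a linear SVM with FGSM and compares it against zero-mean Gaussian noise whose standard deviation equals epsilon. It rests on several assumptions not stated in the project: a linear decision function f(x) = w·x + b (the SVM kernel used in the project is not specified here), binary labels standing in for the multi-class sleep stages, synthetic three-feature data instead of DREAMT windows, and hypothetical helper names (`train_linear_svm`, `fgsm`, `gaussian_noise`, `f1_score`). Because the hinge loss has zero gradient for samples outside the margin, the FGSM step uses the gradient sign of a logistic surrogate loss, which for a linear model reduces to `sign(-y * w)`.

```python
import numpy as np


def train_linear_svm(X, y, lam=1e-2, lr=0.1, epochs=200):
    """Linear SVM fit by full-batch subgradient descent on the L2-regularized hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # only samples inside the margin contribute a subgradient
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Labels in {-1, +1} from the sign of the linear decision function."""
    return np.where(X @ w + b >= 0, 1, -1)


def f1_score(y_true, y_pred):
    """Binary F1 with +1 as the positive class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == -1))
    fn = np.sum((y_pred == -1) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fgsm(X, y, w, b, eps):
    """FGSM: x_adv = x + eps * sign(dL/dx).

    Assumed surrogate loss L = log(1 + exp(-y * f(x))). Its input gradient is
    -y * sigmoid(-y * f(x)) * w, and since sigmoid(.) > 0 its sign is sign(-y * w).
    The bias b does not affect the sign; it is kept for a uniform signature.
    """
    grad_sign = np.sign(-y[:, None] * w[None, :])
    return X + eps * grad_sign


def gaussian_noise(X, sigma, rng):
    """Zero-mean isotropic Gaussian perturbation with standard deviation sigma."""
    return X + rng.normal(0.0, sigma, size=X.shape)


rng = np.random.default_rng(0)
n_samples, eps = 400, 0.2
y = np.where(rng.random(n_samples) < 0.5, 1, -1)
# Two overlapping Gaussian clusters; columns are placeholders for wearable features.
X = rng.normal(0.0, 1.0, size=(n_samples, 3)) + y[:, None]

w, b = train_linear_svm(X, y)
X_noisy = gaussian_noise(X, eps, rng)
X_adv = fgsm(X, y, w, b, eps)

clean_f1 = f1_score(y, predict(X, w, b))
noisy_f1 = f1_score(y, predict(X_noisy, w, b))
adv_f1 = f1_score(y, predict(X_adv, w, b))
print(f"clean F1 {clean_f1:.3f} | Gaussian F1 {noisy_f1:.3f} | FGSM F1 {adv_f1:.3f}")
```

For a linear model, every FGSM step lowers each sample's signed margin y·f(x) by exactly eps·‖w‖₁, so the perturbation always pushes toward the decision boundary, whereas zero-mean Gaussian noise shifts the margin in a random direction. This is one plausible reason an SVM can tolerate Gaussian noise yet degrade sharply under FGSM; the sketch does not reproduce the project's reported numbers.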
Mentor: Brinnae Bent
Project poster (PDF)