ColloSSL: Collaborative Self-Supervised Learning for Human Activity Recognition
24 January 2022
A major bottleneck in training robust Human Activity Recognition (HAR) models is the need for large-scale labeled sensor datasets. Because labeling large amounts of sensor data is expensive, unsupervised and semi-supervised learning techniques have emerged that can learn useful features from the data without requiring any labels. In this paper, we extend this line of research and present a novel technique called Collaborative Self-Supervised Learning (ColloSSL), which leverages unlabeled data collected from multiple devices worn by a user to learn high-quality representations of the data. A key insight that underpins the design of ColloSSL is that unlabeled sensor data simultaneously captured by multiple devices can be viewed as natural transformations of each other, and leveraged to generate a supervisory signal for representation learning. We present three technical innovations to extend conventional self-supervised learning algorithms to a multi-device setting: a Device Selection approach which selects positive and negative devices to enable contrastive learning, a Contrastive Sampling algorithm which samples positive and negative examples in a multi-device setting, and a loss function called Multi-view Contrastive Loss which extends standard contrastive loss to a multi-device setting. Our experimental results on three multi-device datasets show that ColloSSL outperforms both fully-supervised and semi-supervised learning techniques in a majority of the experiment settings, resulting in an absolute increase of up to 7.9% in $F_1$ score over the best-performing baselines. We also show that ColloSSL outperforms fully-supervised methods in a low-data regime, using just one-tenth of the available labeled data in the best case.
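To make the Multi-view Contrastive Loss concrete, below is a minimal sketch of an InfoNCE-style contrastive objective generalized to multiple positive and negative views per anchor, which is the general shape the abstract describes. This is an illustrative assumption, not the paper's exact formulation: the function name, tensor shapes, and temperature value are all hypothetical.

```python
import torch
import torch.nn.functional as F

def multi_view_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Sketch of a contrastive loss with multiple positive and negative
    device views per anchor (hypothetical; the paper's loss may differ).

    anchor:    (B, D)     embeddings from the target device
    positives: (P, B, D)  time-aligned embeddings from P positive devices
    negatives: (N, B, D)  time-aligned embeddings from N negative devices
    """
    anchor = F.normalize(anchor, dim=-1)
    positives = F.normalize(positives, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Cosine similarity between each anchor and its positive/negative views.
    pos_sim = torch.einsum("bd,pbd->bp", anchor, positives) / temperature  # (B, P)
    neg_sim = torch.einsum("bd,nbd->bn", anchor, negatives) / temperature  # (B, N)

    # InfoNCE-style objective: pull aligned positive-device views toward
    # the anchor while pushing negative-device views away.
    logits = torch.cat([pos_sim, neg_sim], dim=1)              # (B, P+N)
    log_denom = torch.logsumexp(logits, dim=1, keepdim=True)   # (B, 1)
    return -(pos_sim - log_denom).mean()

# Illustrative usage with random embeddings (batch of 32, 128-d features,
# 2 positive devices and 3 negative devices):
if __name__ == "__main__":
    anchor = torch.randn(32, 128)
    positives = torch.randn(2, 32, 128)
    negatives = torch.randn(3, 32, 128)
    print(multi_view_contrastive_loss(anchor, positives, negatives))
```

Under this reading, the Device Selection step would decide which devices populate the `positives` and `negatives` tensors, and Contrastive Sampling would decide which time windows from those devices are paired with each anchor.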