Combining crowd-generated media and personal data: semi-supervised learning for context recognition

  • Authors:
  • Long-Van Nguyen-Dinh, Mirco Rossi, Ulf Blanke, Gerhard Tröster

  • Affiliations:
  • ETH Zurich, Zurich, Switzerland (all authors)

  • Venue:
  • Proceedings of the 1st ACM International Workshop on Personal Data Meets Distributed Multimedia
  • Year:
  • 2013


Abstract

The growing ubiquity of sensors in mobile phones has opened many opportunities for sensing personal daily activities. Most context recognition systems require cumbersome preparation: training examples must be collected and manually annotated. Recently, mining online crowd-generated repositories for free annotated training data has been proposed as a way to build context models. A crowd-generated dataset can capture large variety, both in the number of classes and in intra-class diversity, but it may not cover all user-specific contexts. Thus, its performance is often significantly worse than that of user-centric training. In this work, we exploit for the first time the combination of a crowd-generated audio dataset available on the web and unlabeled audio data obtained from users' mobile phones. We use a semi-supervised Gaussian mixture model to combine labeled data from the crowd-generated database with unlabeled personal recordings, thereby refining generic knowledge with user data to train a personalized model. We tested this technique with 7 users on mobile phones, covering a total of 14 days of data and up to 9 context classes. Preliminary results show that the semi-supervised model can improve recognition accuracy by up to 21%.
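
The semi-supervised Gaussian mixture idea described in the abstract can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: it assumes one Gaussian component per context class and a standard EM procedure in which the labeled crowd data keep fixed one-hot responsibilities while the unlabeled personal recordings receive soft assignments. The paper's actual model configuration and audio features are not given in the abstract, so all function and variable names here are hypothetical.

```python
# Minimal semi-supervised GMM sketch (assumption: one Gaussian per class,
# and every class has at least a few labeled crowd examples).
import numpy as np
from scipy.stats import multivariate_normal

def semi_supervised_gmm(X_lab, y_lab, X_unlab, n_classes, n_iter=50, reg=1e-6):
    d = X_lab.shape[1]
    # Labeled crowd data: responsibilities are one-hot and stay fixed.
    R_lab = np.eye(n_classes)[y_lab]
    # Initialize class parameters from the labeled (crowd) data alone.
    means = np.array([X_lab[y_lab == k].mean(axis=0) for k in range(n_classes)])
    covs = np.array([np.cov(X_lab[y_lab == k].T) + reg * np.eye(d)
                     for k in range(n_classes)])
    priors = np.bincount(y_lab, minlength=n_classes) / len(y_lab)
    X_all = np.vstack([X_lab, X_unlab])
    for _ in range(n_iter):
        # E-step: soft class assignments for the unlabeled personal recordings.
        lik = np.column_stack([
            priors[k] * multivariate_normal.pdf(X_unlab, means[k], covs[k])
            for k in range(n_classes)])
        R_unlab = lik / lik.sum(axis=1, keepdims=True)
        R = np.vstack([R_lab, R_unlab])
        # M-step: re-estimate parameters from labeled + unlabeled data jointly.
        Nk = R.sum(axis=0)
        priors = Nk / len(X_all)
        means = (R.T @ X_all) / Nk[:, None]
        for k in range(n_classes):
            diff = X_all - means[k]
            covs[k] = (R[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return means, covs, priors

def predict(X, means, covs, priors):
    # Assign each frame to the most likely context class.
    scores = np.column_stack([
        priors[k] * multivariate_normal.pdf(X, means[k], covs[k])
        for k in range(len(priors))])
    return scores.argmax(axis=1)
```

In each EM iteration the unlabeled personal data pull the class means and covariances toward the user's own acoustic environment, which captures the personalization effect the abstract describes: generic crowd-trained parameters are refined into a user-specific model without any extra manual annotation.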