Probabilistic analysis of a large-scale urban traffic sensor data set

  • Authors:
  • Jon Hutchins; Alexander Ihler; Padhraic Smyth

  • Affiliations:
  • Dept. of Computer Science, University of California, Irvine, CA; Dept. of Computer Science, University of California, Irvine, CA; Dept. of Computer Science, University of California, Irvine, CA

  • Venue:
  • Sensor-KDD'08: Proceedings of the Second International Conference on Knowledge Discovery from Sensor Data
  • Year:
  • 2008

Abstract

Real-world sensor time series are often significantly noisier and more difficult to work with than the relatively clean data sets that tend to be used as the basis for experiments in many research papers. In this paper we report on a large case study involving statistical data mining of over 100 million measurements from 1700 freeway traffic sensors over a period of seven months in Southern California. We discuss the challenges posed by the wide variety of sensor failures and anomalies present in the data. The volume and complexity of the data preclude the use of manual visualization or simple thresholding techniques to identify these anomalies. We describe the application of probabilistic modeling and unsupervised learning techniques to this data set and illustrate how these approaches can successfully detect underlying systematic patterns even in the presence of substantial noise and missing data.