Bistro data feed management system

  • Authors:
  • Vladislav Shkapenyuk;Theodore Johnson;Divesh Srivastava

  • Affiliations:
  • AT&T Labs - Research, Florham Park, NJ, USA;AT&T Labs - Research, Florham Park, NJ, USA;AT&T Labs - Research, Florham Park, NJ, USA

  • Venue:
  • Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data feed management is a critical component of many data intensive applications that depend on reliable data delivery to support real-time data collection, correlation and analysis. Data is typically collected from a wide variety of sources and organizations, using a range of mechanisms - some data are streamed in real time, while other data are obtained at regular intervals or collected in an ad hoc fashion. Individual applications are forced to make separate arrangements with feed providers, learn the structure of incoming files, monitor data quality, and trigger any processing necessary. The Bistro data feed manager, designed and implemented at AT&T Labs- Research, simplifies and automates this complex task of data feed management: efficiently handling incoming raw files, identifying data feeds and distributing them to remote subscribers. Bistro supports a flexible specification language to define logical data feeds using the naming structure of physical data files, and to identify feed subscribers. Based on the specification, Bistro matches data files to feeds, performs file normalization and compression, efficiently delivers files, and notifies subscribers using a trigger mechanism. We describe our feed analyzer that discovers the naming structure of incoming data files to detect new feeds, dropped feeds, feed changes, or lost data in an existing feed. Bistro is currently deployed within AT&T Labs and is responsible for the real-time delivery of over 100 different raw feeds, distributing data to several large-scale stream warehouses.