Data feed management is a critical component of many data-intensive applications that depend on reliable data delivery to support real-time data collection, correlation, and analysis. Data are typically collected from a wide variety of sources and organizations using a range of mechanisms: some data are streamed in real time, while other data are obtained at regular intervals or collected in an ad hoc fashion. Individual applications are forced to make separate arrangements with feed providers, learn the structure of incoming files, monitor data quality, and trigger any necessary processing. The Bistro data feed manager, designed and implemented at AT&T Labs-Research, simplifies and automates this complex task: it efficiently handles incoming raw files, identifies data feeds, and distributes them to remote subscribers. Bistro supports a flexible specification language that defines logical data feeds in terms of the naming structure of physical data files and identifies each feed's subscribers. Based on this specification, Bistro matches data files to feeds, performs file normalization and compression, delivers files efficiently, and notifies subscribers through a trigger mechanism. We also describe a feed analyzer that discovers the naming structure of incoming data files in order to detect new feeds, dropped feeds, feed changes, or lost data in an existing feed. Bistro is currently deployed within AT&T Labs, where it is responsible for the real-time delivery of over 100 different raw feeds, distributing data to several large-scale stream warehouses.
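The abstract does not show Bistro's actual specification language. As a rough illustration of defining feeds by file-naming structure and checking a feed for lost data, the sketch below makes a simplifying assumption: each logical feed is a regular expression over file names with an hourly `YYYYMMDDHH` timestamp field. The feed names, file patterns, and the functions `match_feed`, `classify` and `missing_intervals` are hypothetical and are not taken from Bistro.

```python
import re
from datetime import datetime, timedelta

# Hypothetical feed definitions (NOT Bistro's syntax): feed name -> file-name
# regex whose named group "ts" holds an hourly YYYYMMDDHH timestamp.
FEEDS = {
    "router_stats": re.compile(r"^rtr_stats_(?P<ts>\d{10})\.csv(\.gz)?$"),
    "syslog": re.compile(r"^syslog\.(?P<ts>\d{10})\.log$"),
}


def match_feed(filename):
    """Return (feed_name, timestamp) for the first feed whose pattern matches, else None."""
    for feed, pattern in FEEDS.items():
        m = pattern.match(filename)
        if m:
            return feed, datetime.strptime(m.group("ts"), "%Y%m%d%H")
    return None


def classify(filenames):
    """Group file timestamps by feed and collect names that match no feed."""
    by_feed = {feed: [] for feed in FEEDS}
    unmatched = []  # candidates for a new or renamed feed
    for name in filenames:
        hit = match_feed(name)
        if hit is None:
            unmatched.append(name)
        else:
            by_feed[hit[0]].append(hit[1])
    return by_feed, unmatched


def missing_intervals(timestamps, period=timedelta(hours=1)):
    """List expected timestamps absent between the earliest and latest file seen."""
    if not timestamps:
        return []
    seen = set(timestamps)
    t, end = min(seen), max(seen)
    gaps = []
    while t <= end:
        if t not in seen:
            gaps.append(t)
        t += period
    return gaps


# Example: hour 02 of router_stats is absent, and one file matches no feed.
files = [
    "rtr_stats_2010060100.csv.gz",
    "rtr_stats_2010060101.csv.gz",
    "rtr_stats_2010060103.csv.gz",
    "syslog.2010060100.log",
    "bgp_dump_2010060100.txt",
]
by_feed, unmatched = classify(files)
print(unmatched)  # ['bgp_dump_2010060100.txt']
print(missing_intervals(by_feed["router_stats"]))  # [datetime.datetime(2010, 6, 1, 2, 0)]
```

This sketch only finds gaps between the first and last files it has seen. Detecting a dropped feed would also require comparing a feed's latest timestamp against the current time. Discovering naming structure automatically, as the feed analyzer does, is a harder problem than matching against fixed patterns.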