Query evaluation: strategies and optimizations
Information Processing and Management: an International Journal
Optimization of inverted vector searches
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Matching events in a content-based subscription system
Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
Optimal aggregation algorithms for middleware
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Filtering algorithms and implementation for very fast publish/subscribe systems
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Mesh-based content routing using XML
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Modern Information Retrieval
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Forwarding in a content-based network
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
An Efficient Multicast Protocol for Content-Based Publish-Subscribe Systems
ICDCS '99 Proceedings of the 19th IEEE International Conference on Distributed Computing Systems
Efficient query evaluation using a two-level retrieval process
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Path sharing and predicate evaluation for high-performance XML filtering
ACM Transactions on Database Systems (TODS)
Optimization strategies for complex queries
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Retroactive answering of search queries
Proceedings of the 15th international conference on World Wide Web
Top-k/w publish/subscribe: finding k most relevant publications in sliding time window w
Proceedings of the second international conference on Distributed event-based systems
Scalable ranked publish/subscribe
Proceedings of the VLDB Endowment
Evaluating top-k queries over incomplete data streams
Proceedings of the 18th ACM conference on Information and knowledge management
Caching search engine results over incremental indices
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
The gist of everything new: personalized top-k processing over web 2.0 streams
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Hi-index | 0.00 |
Social content, such as Twitter updates, often have the quickest first-hand reports of news events, as well as numerous commentaries that are indicative of public view of such events. As such, social updates provide a good complement to professionally written news articles. In this paper we consider the problem of automatically annotating news stories with social updates (tweets), at a news website serving high volume of pageviews. The high rate of both the pageviews (millions to billions a day) and of the incoming tweets (more than 100 millions a day) make real-time indexing of tweets ineffective, as this requires an index that is both queried and updated extremely frequently. The rate of tweet updates makes caching techniques almost unusable since the cache would become stale very quickly. We propose a novel architecture where each story is treated as a subscription for tweets relevant to the story's content, and new algorithms that efficiently match tweets to stories, proactively maintaining the top-k tweets for each story. Such top-k pub-sub consumes only a small fraction of the resource cost of alternative solutions, and can be applicable to other large scale content-based publish-subscribe problems. We demonstrate the effectiveness of our approach on realworld data: a corpus of news stories from Yahoo! News and a log of Twitter updates.