Feeding frenzy: selectively materializing users' event feeds

Authors:
Adam Silberstein;Jeff Terrace;Brian F. Cooper;Raghu Ramakrishnan
Affiliations:
Yahoo! Research, Santa Clara, CA, USA;Princeton University, Princeton, NJ, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010


Abstract

Near real-time event streams are becoming a key feature of many popular web applications. Many web sites allow users to create a personalized feed by selecting one or more event streams they wish to follow. Examples include Twitter and Facebook, which allow a user to follow other users' activity, and iGoogle and My Yahoo, which allow users to follow selected RSS streams. How can we efficiently construct a web page showing the latest events from a user's feed? Constructing such a feed must be fast so that the page loads quickly, yet the feed must reflect recent updates to the underlying event streams. The wide fanout of popular streams (those with many followers) and high skew (fanout and update rates vary widely) make it difficult to scale such applications. We associate feeds with consumers and event streams with producers. We demonstrate that the best performance results from selectively materializing each consumer's feed: events from high-rate producers are retrieved at query time, while events from lower-rate producers are materialized in advance. A formal analysis of the problem shows the surprising result that we can minimize global cost by making local decisions about each producer/consumer pair, based on the ratio between a given producer's update rate (how often an event is added to the stream) and a given consumer's view rate (how often the feed is viewed). Our experimental results, using Yahoo!'s web-scale database PNUTS, show that this hybrid strategy results in the lowest system load (and hence improves scalability) under a variety of workloads.
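
To make the per-pair decision concrete, here is a minimal Python sketch of a hybrid rule of the kind the abstract describes. It is an illustration under stated assumptions, not the authors' implementation. The names `Pair`, `choose_strategy`, and `plan_feeds` are hypothetical, and so is the `threshold` parameter. The abstract says only that the decision depends on the ratio of update rate to view rate; the cutoff value used here is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One producer/consumer connection (hypothetical structure)."""
    producer: str
    consumer: str
    update_rate: float  # events the producer adds per unit time
    view_rate: float    # times the consumer views the feed per unit time


def choose_strategy(update_rate: float, view_rate: float,
                    threshold: float = 1.0) -> str:
    """Return "push" (materialize in advance) or "pull" (fetch at query time).

    ASSUMPTION: `threshold` is an illustrative cutoff on the ratio
    update_rate / view_rate; the abstract does not give its value.
    """
    if view_rate <= 0:
        # A feed that is never viewed gains nothing from materialization.
        return "pull"
    return "push" if update_rate / view_rate < threshold else "pull"


def plan_feeds(pairs: list[Pair], threshold: float = 1.0) -> dict:
    """Decide each pair independently, i.e. using only local information."""
    return {
        (p.producer, p.consumer): choose_strategy(p.update_rate, p.view_rate,
                                                  threshold)
        for p in pairs
    }


if __name__ == "__main__":
    pairs = [
        Pair("celebrity", "alice", update_rate=50.0, view_rate=2.0),
        Pair("friend", "alice", update_rate=0.5, view_rate=2.0),
    ]
    for key, strategy in plan_feeds(pairs).items():
        print(key, strategy)
```

Each pair is evaluated on its own, which mirrors the abstract's claim that local decisions suffice: the same consumer can pull from a high-rate producer while having a low-rate producer's events materialized in advance.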