Parallel database systems: the future of database processing or a passing fad?
ACM SIGMOD Record - Directions for future database research & development
A critique of ANSI SQL isolation levels
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
ACM Computing Surveys (CSUR)
Concurrency Control in Distributed Database Systems
ACM Computing Surveys (CSUR)
Operating system support for database management
Communications of the ACM
Prototyping Bubba, A Highly Parallel Database System
IEEE Transactions on Knowledge and Data Engineering
The Gamma Database Machine Project
IEEE Transactions on Knowledge and Data Engineering
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
The Chubby lock service for loosely-coupled distributed systems
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Sinfonia: a new paradigm for building scalable distributed systems
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Dynamo: amazon's highly available key-value store
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
PNUTS: Yahoo!'s hosted data serving platform
Proceedings of the VLDB Endowment
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Stateful bulk processing for incremental analytics
Proceedings of the 1st ACM symposium on Cloud computing
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
ElasTraS: an elastic transactional data store in the cloud
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
DryadInc: reusing work in large-scale computations
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Nova: continuous Pig/Hadoop workflows
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
A distributed resource management architecture for interconnecting Web-of-Things using uBox
Proceedings of the Second International Workshop on Web of Things
Full-text indexing for optimizing selection operations in large-scale data analytics
Proceedings of the second international workshop on MapReduce and its applications
Mining large distributed log data in near real time
SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
PrIter: a distributed framework for prioritized iterative computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
Transactional storage for geo-replicated systems
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Incremental recomputations in MapReduce
Proceedings of the third international workshop on Cloud data management
Large-scale continuous subgraph queries on streams
Proceedings of the first annual workshop on High performance computing meets databases
Online workflow management and performance analysis with stampede
Proceedings of the 7th International Conference on Network and Services Management
Kineograph: taking the pulse of a fast-changing and connected world
Proceedings of the 7th ACM european conference on Computer Systems
A critique of snapshot isolation
Proceedings of the 7th ACM european conference on Computer Systems
LazyBase: trading freshness for performance in a scalable database
Proceedings of the 7th ACM european conference on Computer Systems
The datacenter needs an operating system
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Large-scale incremental data processing with change propagation
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
TransMR: data-centric programming beyond data parallelism
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
The evolving landscape of data management in the cloud
International Journal of Computational Science and Engineering
Abstract state machines for data-parallel computing
Conceptual Modelling and Its Theoretical Foundations
Clustering and load balancing optimization for redundant content removal
Proceedings of the 21st international conference companion on World Wide Web
iMapReduce: A Distributed Computing Framework for Iterative Computation
Journal of Grid Computing
Shredder: GPU-accelerated incremental storage and computation
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Scalable Join Queries in Cloud Data Stores
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Stormy: an elastic and highly available streaming service in the cloud
Proceedings of the 2012 Joint EDBT/ICDT Workshops
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Using R for iterative and incremental processing
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Index maintenance for time-travel text search
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Oolong: asynchronous distributed applications made easy
Proceedings of the Asia-Pacific Workshop on Systems
Auto-parallelizing stateful distributed streaming applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Oolong: asynchronous distributed applications made easy
APSys'12 Proceedings of the Third ACM SIGOPS Asia-Pacific conference on Systems
Spanner: Google's globally-distributed database
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Facilitating real-time graph mining
Proceedings of the fourth international workshop on Cloud data management
Themis: an I/O-efficient MapReduce
Proceedings of the Third ACM Symposium on Cloud Computing
Data security services, solutions and standards for outsourcing
Computer Standards & Interfaces
Communications of the ACM
Queue - Web Security
MapReduce-Based data stream processing over large history data
ICSOC'12 Proceedings of the 10th international conference on Service-Oriented Computing
Streaming big data with self-adjusting computation
DDFP '13 Proceedings of the 2013 workshop on Data driven functional programming
ElasTraS: An elastic, scalable, and self-managing transactional database for the cloud
ACM Transactions on Database Systems (TODS)
Cloud Platform Datastore Support
Journal of Grid Computing
Incremental stream processing using computational conflict-free replicated data types
Proceedings of the 3rd International Workshop on Cloud Data and Platforms
Fast data in the era of big data: Twitter's real-time related query suggestion architecture
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Omega: flexible, scalable schedulers for large compute clusters
Proceedings of the 8th ACM European Conference on Computer Systems
Brief announcement: towards a fully-articulated pessimistic distributed transactional memory
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Dynamic memory allocation policies for postings in real-time Twitter search
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
DAX: a widely distributed multitenant storage service for DBMS hosting
Proceedings of the VLDB Endowment
Large-scale computation not at the cost of expressiveness
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Spanner: Google’s Globally Distributed Database
ACM Transactions on Computer Systems (TOCS)
Fast candidate generation for real-time tweet search with bloom filter chains
ACM Transactions on Information Systems (TOIS)
Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions
Journal of Grid Computing
Consolidated cluster systems for data centers in the cloud age: a survey and analysis
Frontiers of Computer Science: Selected Publications from Chinese Universities
Simplifying MapReduce data processing
International Journal of Computational Science and Engineering
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Tango: distributed data structures over a shared log
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Discretized streams: fault-tolerant streaming computation at scale
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Naiad: a timely dataflow system
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Exploring storage class memory with key value stores
Proceedings of the 1st Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads
Network-aware data caching and prefetching for cloud-hosted metadata retrieval
NDM '13 Proceedings of the Third International Workshop on Network-Aware Data Management
CORFU: A distributed shared log
ACM Transactions on Computer Systems (TOCS)
MillWheel: fault-tolerant stream processing at internet scale
Proceedings of the VLDB Endowment
Online, asynchronous schema change in F1
Proceedings of the VLDB Endowment
F1: a distributed SQL database that scales
Proceedings of the VLDB Endowment
Scalable transactions across heterogeneous NoSQL key-value data stores
Proceedings of the VLDB Endowment
Active data: a data-centric approach to data life-cycle management
PDSW '13 Proceedings of the 8th Parallel Data Storage Workshop
Toward a scale-out data-management middleware for low-latency enterprise computing
IBM Journal of Research and Development
Hi-index | 0.02 |
Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency. We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.