Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design (including the separation into two phases, the form of the programming language, and the properties of the aggregators) exploits the parallelism inherent in having data and computation distributed across many machines.
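The two-phase structure described above can be illustrated with a small single-process sketch. This is not the system's own language or API: it is plain Python, and every name in it (`SumAggregator`, `filter_phase`, `aggregation_phase`, `query`, the table names, and the toy shards) is a hypothetical stand-in chosen for illustration. It assumes only what the abstract states: a per-record query emits values to aggregators, and the aggregators can combine partial results produced on different machines.

```python
from collections import defaultdict


class SumAggregator:
    """Hypothetical aggregator. Addition is commutative and associative,
    so partial sums from different shards can be merged in any order."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value

    def merge(self, other):
        self.total += other.total

    def result(self):
        return self.total


def filter_phase(records, query):
    """Run the query on each record of one shard, independently of all
    other records, and aggregate the emitted values locally."""
    tables = defaultdict(SumAggregator)
    for record in records:
        for table_name, value in query(record):
            tables[table_name].add(value)
    return tables


def aggregation_phase(partials):
    """Collate the per-shard partial tables into final results."""
    merged = defaultdict(SumAggregator)
    for tables in partials:
        for name, aggregator in tables.items():
            merged[name].merge(aggregator)
    return {name: aggregator.result() for name, aggregator in merged.items()}


def query(record):
    """Hypothetical query: each record is a decimal number as text.
    Emit a count, the value, and its square to three named tables."""
    n = int(record)
    yield "count", 1
    yield "total", n
    yield "sum_of_squares", n * n


# Toy shards standing in for data spread across machines (assumption).
shards = [["1", "2"], ["3"], ["4", "5"]]
partials = [filter_phase(shard, query) for shard in shards]
results = aggregation_phase(partials)
print(results)
```

Because the query looks at one record at a time and the aggregator's merge does not depend on order, the shards could be processed on separate machines and collated in any sequence with the same result, which is the property the abstract attributes to the design.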