Massive-scale analytics (MSA) applications are characterized by the large amount of data that they process and the complexity of the algorithms used to process that data. The ideal MSA system will not only support processing of large amounts of data but also offer a high degree of parallelism and support scheduling and resource allocation for complex workloads. Designers of MSA systems must provide three necessities: programming abstractions, runtime systems, and hardware. Historically, two communities have undertaken the task of designing MSA systems: the database community, which has argued for a processing paradigm influenced by SQL (Structured Query Language), and the high-performance computing community, which has focused on developing infrastructures for highly efficient, but complex, parallel implementations. These two communities have developed disparate technologies to meet the necessities of MSA systems, and the solutions provided by the individual communities are not completely satisfactory. In this paper, we attempt to characterize the strengths and weaknesses of the approaches of these two communities at all levels of the MSA stack, characterize the implications for resource management within the MSA system, and define how an MSA system should be designed.