LAPACK: a portable linear algebra library for high-performance computers
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
Basic Linear Algebra Subprograms for Fortran Usage
ACM Transactions on Mathematical Software (TOMS)
FLAME: Formal Linear Algebra Methods Environment
ACM Transactions on Mathematical Software (TOMS)
Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers
Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers
A Backward Slicing Algorithm for Prolog
SAS '96 Proceedings of the Third International Symposium on Static Analysis
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Graph Clustering Via a Discrete Uncoupling Process
SIAM Journal on Matrix Analysis and Applications
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce
Proceedings of the 19th international conference on World wide web
FlumeJava: easy, efficient data-parallel pipelines
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Linear algebraic primitives for parallel computing on large graphs
Linear algebraic primitives for parallel computing on large graphs
HAMA: An Efficient Matrix Computation with the MapReduce Framework
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
PEGASUS: mining peta-scale graphs
Knowledge and Information Systems - Special Issue: Best Papers of the Fifth International Conference on Advanced Data Mining and Applications (ADMA 2009)
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Regularized latent semantic indexing
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Empirical analysis of predictive algorithms for collaborative filtering
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Using R for iterative and incremental processing
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Sparkler: supporting large-scale matrix factorization
Proceedings of the 16th International Conference on Extending Database Technology
TimeStream: reliable stream computation in the cloud
Proceedings of the 8th ACM European Conference on Computer Systems
Optimus: a dynamic rewriting framework for data-parallel execution plans
Proceedings of the 8th ACM European Conference on Computer Systems
Presto: distributed machine learning and graph processing with sparse matrices
Proceedings of the 8th ACM European Conference on Computer Systems
Large-scale computation not at the cost of expressiveness
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Composition and reuse with compiled domain-specific languages
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Hi-index | 0.00 |
The computation core of many data-intensive applications can be best expressed as matrix computations. The MadLINQ project addresses the following two important research problems: the need for a highly scalable, efficient and fault-tolerant matrix computation system that is also easy to program, and the seamless integration of such specialized execution engines in a general purpose data-parallel computing system. MadLINQ exposes a unified programming model to both matrix algorithm and application developers. Matrix algorithms are expressed as sequential programs operating on tiles (i.e., sub-matrices). For application developers, MadLINQ provides a distributed matrix computation library for .NET languages. Via the LINQ technology, MadLINQ also seamlessly integrates with DryadLINQ, a data-parallel computing system focusing on relational algebra. The system automatically handles the parallelization and distributed execution of programs on a large cluster. It outperforms current state-of-the-art systems by employing two key techniques, both of which are enabled by the matrix abstraction: exploiting extra parallelism using fine-grained pipelining and efficient on-demand failure recovery using a distributed fault-tolerant execution engine. We describe the design and implementation of MadLINQ and evaluate system performance using several real-world applications.