SIGMOD '85 Proceedings of the 1985 ACM SIGMOD international conference on Management of data
ACM Transactions on Database Systems (TODS)
Weaving Relations for Cache Performance
Proceedings of the 27th International Conference on Very Large Data Bases
Optimizing database architecture for the new bottleneck: memory access
The VLDB Journal — The International Journal on Very Large Data Bases
C-store: a column-oriented DBMS
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Integrating compression and execution in column-oriented database systems
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Sybase IQ multiplex - designed for analytics
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Automatic optimization of parallel dataflow programs
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
Optimizing joins in a map-reduce environment
Proceedings of the 13th International Conference on Extending Database Technology
Making cloud intermediate data fault-tolerant
Proceedings of the 1st ACM symposium on Cloud computing
A comparison of join algorithms for log processing in MaPreduce
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Dremel: interactive analysis of web-scale datasets
Proceedings of the VLDB Endowment
The performance of MapReduce: an in-depth study
Proceedings of the VLDB Endowment
Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)
Proceedings of the VLDB Endowment
Providing scalable database services on the cloud
WISE'10 Proceedings of the 11th international conference on Web information systems engineering
MAP-JOIN-REDUCE: Toward Scalable and Efficient Data Analysis on Large Clusters
IEEE Transactions on Knowledge and Data Engineering
Query optimization for massively parallel data processing
Proceedings of the 2nd ACM Symposium on Cloud Computing
Parallel data processing with MapReduce: a survey
ACM SIGMOD Record
Clydesdale: structured data processing on hadoop
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Large-scale machine learning at twitter
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Clydesdale: structured data processing on MapReduce
Proceedings of the 15th International Conference on Extending Database Technology
Efficient parallel kNN joins for large data in MapReduce
Proceedings of the 15th International Conference on Extending Database Technology
The equi-join processing and optimization on ring architecture key/value database
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
The unified logging infrastructure for data analytics at Twitter
Proceedings of the VLDB Endowment
Efficient big data processing in Hadoop MapReduce
Proceedings of the VLDB Endowment
Eagle-eyed elephant: split-oriented indexing in Hadoop
Proceedings of the 16th International Conference on Extending Database Technology
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
CARTILAGE: adding flexibility to the Hadoop skeleton
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Scaling big data mining infrastructure: the twitter experience
ACM SIGKDD Explorations Newsletter
Cache conscious star-join in MapReduce environments
Proceedings of the 2nd International Workshop on Cloud Intelligence
Distributed data management using MapReduce
ACM Computing Surveys (CSUR)
Database research at the National University of Singapore
ACM SIGMOD Record
The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted the cluster-based architecture. In this paper, we propose the design of a new cluster-based data warehouse system, LLama, a hybrid data management system which combines the features of row-wise and column-wise database systems. In Llama, columns are formed into correlation groups to provide the basis for the vertical partitioning of tables. Llama employs a distributed file system (DFS) to disseminate data among cluster nodes. Above the DFS, a MapReduce-based query engine is supported. We design a new join algorithm to facilitate fast join processing. We present a performance study on TPC-H dataset and compare Llama with Hive, a data warehouse infrastructure built on top of Hadoop. The experiment is conducted on EC2. The results show that Llama has an excellent load performance and its query performance is significantly better than the traditional MapReduce framework based on row-wise storage.