Data mining techniques are widely applied, and data warehousing plays an important role in this process. Scalability and efficiency have always been the key issues in data warehousing. Because of the explosive growth of data, data warehousing now faces tough challenges on both fronts, and traditional methods have reached their bottleneck. In this paper, we present a document-based data warehousing approach. In our approach, the ETL process is carried out through the MapReduce framework, and the data warehouse is built on a distributed, document-oriented database. A case study demonstrates the details of the entire process. Compared with RDBMS-based data warehousing, our approach shows better scalability, flexibility and efficiency.
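To make the idea of a MapReduce-driven ETL step that produces documents more concrete, here is a minimal in-memory sketch. It is not the paper's implementation and does not reproduce its case study. It assumes a hypothetical flat sales schema `(customer_id, customer_name, order_id, amount)` and simulates the map, shuffle and reduce phases in plain Python rather than on Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_etl`) and the `_id` key, which follows a common document-store convention, are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical flat source rows, as an extract step might read them from an
# operational RDBMS or CSV export: (customer_id, customer_name, order_id, amount).
Row = Tuple[str, str, str, float]


def map_phase(row: Row) -> Iterator[Tuple[str, dict]]:
    """Map: emit the grouping key (customer_id) and a cleaned partial record."""
    customer_id, name, order_id, amount = row
    # Transform: normalise the name and round the amount to cents.
    yield customer_id, {
        "name": name.strip().title(),
        "order": {"order_id": order_id, "amount": round(amount, 2)},
    }


def shuffle(pairs: Iterable[Tuple[str, dict]]) -> Dict[str, List[dict]]:
    """Shuffle: group all intermediate values by key, as the framework would."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[dict]) -> dict:
    """Reduce: fold all partial records for one key into one nested document."""
    orders = [v["order"] for v in values]
    return {
        "_id": key,  # assumed document-store primary-key field name
        "name": values[0]["name"],
        "orders": orders,  # related rows embedded instead of joined
        "total_amount": round(sum(o["amount"] for o in orders), 2),
    }


def run_etl(rows: Iterable[Row]) -> List[dict]:
    """Run map -> shuffle -> reduce and return one document per customer."""
    intermediate = (pair for row in rows for pair in map_phase(row))
    grouped = shuffle(intermediate)
    # Sort keys so the output order is deterministic.
    return [reduce_phase(k, grouped[k]) for k in sorted(grouped)]


if __name__ == "__main__":
    import json

    sample = [
        ("c1", " alice ", "o1", 10.0),
        ("c2", "bob", "o2", 5.5),
        ("c1", "Alice", "o3", 2.25),
    ]
    for doc in run_etl(sample):
        print(json.dumps(doc))
```

With the sample rows, the two orders for `c1` end up embedded in a single document with `total_amount` 12.25. In a real deployment the reducers would run in parallel across the cluster and write their output documents to the distributed document-oriented database rather than printing them. Embedding related records in one document is what lets the warehouse avoid the join-heavy star schemas typical of RDBMS-based designs.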