A document-based data warehousing approach for large scale data mining

  • Authors:
  • Hualei Chai; Gang Wu; Yuan Zhao

  • Affiliations:
  • School of Software, Shanghai Jiao Tong University, Shanghai, China; School of Software, Shanghai Jiao Tong University, Shanghai, China; School of Software, Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • ICPCA/SWS'12 Proceedings of the 2012 international conference on Pervasive Computing and the Networked World
  • Year:
  • 2012

Abstract

Data mining techniques are widely applied, and data warehousing plays an important role in the process. Scalability and efficiency have always been key issues in data warehousing. With the explosive growth of data, data warehousing now faces tough challenges on both fronts, and traditional methods are hitting a bottleneck. In this paper, we present a document-based data warehousing approach. In our approach, the ETL (extract, transform, load) process is carried out on the MapReduce framework, and the data warehouse is built on a distributed, document-oriented database. A case study demonstrates the details of the entire process. Compared with RDBMS-based data warehousing, our approach shows better scalability, flexibility, and efficiency.
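
As a rough illustration of the kind of pipeline the abstract describes, the sketch below simulates a MapReduce-style ETL step in plain Python and emits one fact document per group. It is not the authors' implementation: the sales record layout, the field names, and the functions `map_row`, `reduce_group`, and `run_etl` are all assumptions made for this example. The shuffle runs in memory here, whereas a real deployment would presumably run on a MapReduce framework and load the resulting documents into a distributed document store.

```python
from collections import defaultdict

# Hypothetical raw sales records (the "extract" input); the layout
# date,city,product,quantity,unit_price is an assumption for this sketch.
RAW_ROWS = [
    "2012-01-03,shanghai,widget,2,9.50",
    "2012-01-03,shanghai,widget,1,9.50",
    "2012-01-04,beijing,gadget,3,20.00",
]


def map_row(line):
    """Map: parse one raw line and emit a (key, value) pair."""
    date, city, product, qty, price = line.split(",")
    yield (date, city, product), (int(qty), int(qty) * float(price))


def reduce_group(key, values):
    """Reduce: fold all values for one key into a single fact document."""
    date, city, product = key
    values = list(values)
    return {
        "_id": "|".join(key),  # hypothetical document identifier
        "date": date,
        "city": city,
        "product": product,
        "units": sum(q for q, _ in values),
        "revenue": round(sum(r for _, r in values), 2),
    }


def run_etl(lines):
    """Group map output by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_row(line):
            groups[key].append(value)
    return [reduce_group(k, v) for k, v in sorted(groups.items())]


documents = run_etl(RAW_ROWS)
```

Each resulting dictionary is self-describing, so a new attribute can be added to later documents without a schema migration, which is one plausible reading of the flexibility the abstract claims over an RDBMS-based warehouse.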