Fragmenting very large XML data warehouses via K-means clustering algorithm

Authors:
Alfredo Cuzzocrea; Jerome Darmont; Hadj Mahboubi
Affiliations:
ICAR-CNR and University of Calabria, Via P. Bucci, 41C, Rende, 87036 Cosenza, Italy; University of Lyon (ERIC Lyon 2), 5 avenue Pierre Mendes-France, 69676 Bron Cedex, France; University of Lyon (ERIC Lyon 2), 5 avenue Pierre Mendes-France, 69676 Bron Cedex, France
Venue:
International Journal of Business Intelligence and Data Mining
Year:
2009

Abstract

XML data sources are gaining popularity in Business Intelligence and On-Line Analytical Processing (OLAP) applications because XML is well suited to representing and managing complex, heterogeneous data. However, XML-native database systems currently suffer from limited performance, in terms of both the volume of data they can manage and query response time. Recent research has therefore focused on horizontal fragmentation techniques, which can overcome these limitations. Classical fragmentation algorithms, however, cannot control the number of fragments they produce, a parameter that plays a critical role in data warehouses. In this paper, we propose using the K-means clustering algorithm to fragment very large XML data warehouses effectively and efficiently. We complement our analytical contribution with a comprehensive experimental assessment in which we compare the efficiency of our proposal against existing fragmentation algorithms.
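
The abstract does not spell out how K-means is applied, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes that each selection predicate of a query workload is described by a binary usage vector (one entry per query), and that predicates clustered together define one horizontal fragment. The predicate strings, the `usage` matrix, and the function names `kmeans` and `fragment_predicates` are all hypothetical. The point it shows is that the number of fragments is fixed in advance by the parameter `k`.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means with deterministic seeding (first k distinct points).

    Returns one cluster label per input point.
    """
    centroids: List[List[float]] = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer distinct points than requested clusters")

    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fragment_predicates(usage: Dict[str, List[float]], k: int) -> List[List[str]]:
    """Group predicates into exactly k classes; each class defines a fragment."""
    names = list(usage)
    labels = kmeans([usage[n] for n in names], k)
    fragments: List[List[str]] = [[] for _ in range(k)]
    for name, lab in zip(names, labels):
        fragments[lab].append(name)
    return fragments


# Hypothetical workload of four queries: entry j is 1 if query j uses the predicate.
usage = {
    "year = 2008": [1, 1, 0, 0],
    "region = 'EU'": [0, 0, 1, 1],
    "year = 2009": [1, 1, 1, 0],
    "region = 'US'": [0, 0, 0, 1],
    "category = 'books'": [1, 0, 0, 0],
}

fragments = fragment_predicates(usage, k=2)
for i, preds in enumerate(fragments):
    print(f"fragment {i}: {preds}")
```

In this sketch the caller chooses `k` directly, which reflects the control over the number of fragments that the abstract says classical fragmentation algorithms lack. The seeding is deterministic only to keep the example reproducible; a production setting would more likely use randomized or k-means++ seeding.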