XProj: a framework for projected structural clustering of XML documents

Authors:
Charu C. Aggarwal;Na Ta;Jianyong Wang;Jianhua Feng;Mohammed Zaki
Affiliations:
IBM;Tsinghua University;Tsinghua University;Tsinghua University;Rensselaer Polytechnic Institute
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Citing 17
Cited 27

Algorithms for clustering data
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Discovering typical structures of documents: a road map approach

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
XClust: clustering XML schemas for effective integration

Proceedings of the eleventh international conference on Information and knowledge management
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Comparing Hierarchical Data in External Memory

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Efficiently mining frequent trees in a forest

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
TreeFinder: a First Step towards XML Data Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
ROCK: A Robust Clustering Algorithm for Categorical Attributes

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
An Efficient and Scalable Algorithm for Clustering XML Documents by Structure

IEEE Transactions on Knowledge and Data Engineering
XRules: an effective structural classifier for XML data

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A Human-Computer Interactive Method for Projected Clustering

IEEE Transactions on Knowledge and Data Engineering
BIDE: Efficient Mining of Frequent Closed Sequences

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Clustering XML documents using structural summaries

EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology

Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach

Focused Access to XML Documents
CONTOUR: an efficient algorithm for discovering discriminating subsequences

Data Mining and Knowledge Discovery
A schema matching-based approach to XML schema clustering

Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services
Incremental sequence-based frequent query pattern mining from XML queries

Data Mining and Knowledge Discovery
HCX: an efficient hybrid clustering approach for XML documents

Proceedings of the 9th ACM symposium on Document engineering
A cluster-based approach to XML similarity joins

IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
XCFS: an XML documents clustering approach using both the structure and the content

Proceedings of the 18th ACM conference on Information and knowledge management
GConnect: a connectivity index for massive disk-resident graphs

Proceedings of the VLDB Endowment
Return specification inference and result clustering for keyword search on XML

ACM Transactions on Database Systems (TODS)
A weighted common structure based clustering technique for XML documents

Journal of Systems and Software
Improving XML search by generating and utilizing informative result snippets

ACM Transactions on Database Systems (TODS)
Online structural graph clustering using frequent subgraph mining

ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Frequent tree pattern mining: A survey

Intelligent Data Analysis
Highly efficient algorithms for structural clustering of large websites

Proceedings of the 20th international conference on World wide web
Multimedia metadata mapping: towards helping developers in their integration task

Proceedings of the 8th International Conference on Advances in Mobile Computing and Multimedia
XML data clustering: An overview

ACM Computing Surveys (CSUR)
XML documents clustering using a tensor space model

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Collaborative clustering of XML documents

Journal of Computer and System Sciences
Parallel structural graph clustering

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Mining frequent patterns from XML data: Efficient algorithms and design trade-offs

Expert Systems with Applications: An International Journal
Efficient Mining of Gap-Constrained Subsequences and Its Various Applications

ACM Transactions on Knowledge Discovery from Data (TKDD)
XML document clustering using structure-preserving flat representation of XML content and structure

ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part II
FXProj: a fuzzy XML documents projected clustering based on structure and content

ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I
Exploring dictionary-based semantic relatedness in labeled tree data

Information Sciences: an International Journal
X-Class: Associative Classification of XML Documents by Structure

ACM Transactions on Information Systems (TOIS)
Hierarchical clustering of XML documents focused on structural components

Data & Knowledge Engineering
Discovering interesting information with advances in web technology

ACM SIGKDD Explorations Newsletter

Quantified Score

Hi-index	0.00

Abstract

XML has become a popular method of data representation, both on the web and in databases, in recent years. One reason for its popularity is its ability to encode structural information about data records. However, this structural characteristic also makes XML data sets challenging for a variety of data mining problems. One such problem is clustering, in which the structural aspects of the data result in a high implicit dimensionality of the data representation. As a result, it becomes more difficult to cluster the data in a meaningful way. In this paper, we propose an effective clustering algorithm for XML data which uses substructures of the documents to gain insight into the important underlying structures. We propose new ways of using multiple sub-structural information in XML documents to evaluate the quality of intermediate cluster solutions and to guide the algorithm to a final solution that reflects the true structural behavior in individual partitions. We test the algorithm on a variety of real and synthetic data sets.
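
The general idea in the abstract, using substructures that are frequent within a cluster to represent it and to score how well documents fit, can be illustrated with a small sketch. This is not the XProj algorithm itself: it assumes parent/child tag pairs as a crude stand-in for substructures, seeds the clusters with the first k documents, and uses hypothetical function names (`edge_set`, `frequent_substructures`, `similarity`, `projected_cluster`) chosen only for this illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def edge_set(xml_text):
    """Return the set of (parent tag, child tag) pairs in a document.

    Assumption: tag pairs stand in for the richer substructures a real
    structural clustering method would mine.
    """
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}


def frequent_substructures(docs, min_support):
    """Substructures present in at least a min_support fraction of docs."""
    counts = Counter(s for d in docs for s in d)
    threshold = min_support * len(docs)
    return {s for s, c in counts.items() if c >= threshold}


def similarity(doc, representatives):
    """Fraction of a cluster's representative substructures found in doc."""
    if not representatives:
        return 0.0
    return len(doc & representatives) / len(representatives)


def projected_cluster(docs, k, min_support=0.5, max_iter=10):
    """Alternate between assigning documents and re-mining representatives."""
    # Assumption: the first k documents serve as initial cluster seeds.
    reps = [set(docs[i]) for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each document to the cluster whose representatives it matches best.
        new_assignment = [
            max(range(k), key=lambda c: similarity(d, reps[c])) for d in docs
        ]
        if new_assignment == assignment:
            break  # no document changed cluster
        assignment = new_assignment
        # Re-mine each cluster's frequent substructures from its current members.
        reps = [
            frequent_substructures(
                [d for d, a in zip(docs, assignment) if a == c], min_support
            )
            for c in range(k)
        ]
    return assignment


sample = [
    "<book><title/><author/></book>",
    "<person><name/><email/></person>",
    "<book><title/><author/><year/></book>",
    "<person><name/><email/><phone/></person>",
]
labels = projected_cluster([edge_set(x) for x in sample], k=2)
print(labels)  # [0, 1, 0, 1]
```

Each cluster is described only by the substructures frequent among its own members, so similarity is measured in a cluster-specific "projection" of the structural feature space rather than over all features at once.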