Algorithms for clustering data
Algorithms for clustering data
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Discovering typical structures of documents: a road map approach
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Fast algorithms for projected clustering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
XClust: clustering XML schemas for effective integration
Proceedings of the eleventh international conference on Information and knowledge management
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth
Proceedings of the 17th International Conference on Data Engineering
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Comparing Hierarchical Data in External Memory
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Efficiently mining frequent trees in a forest
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
TreeFinder: a First Step towards XML Data Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
ROCK: A Robust Clustering Algorithm for Categorical Attributes
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
An Efficient and Scalable Algorithm for Clustering XML Documents by Structure
IEEE Transactions on Knowledge and Data Engineering
XRules: an effective structural classifier for XML data
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A Human-Computer Interactive Method for Projected Clustering
IEEE Transactions on Knowledge and Data Engineering
BIDE: Efficient Mining of Frequent Closed Sequences
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Clustering XML documents using structural summaries
EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach
Focused Access to XML Documents
CONTOUR: an efficient algorithm for discovering discriminating subsequences
Data Mining and Knowledge Discovery
A schema matching-based approach to XML schema clustering
Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services
Incremental sequence-based frequent query pattern mining from XML queries
Data Mining and Knowledge Discovery
HCX: an efficient hybrid clustering approach for XML documents
Proceedings of the 9th ACM symposium on Document engineering
A cluster-based approach to XML similarity joins
IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
XCFS: an XML documents clustering approach using both the structure and the content
Proceedings of the 18th ACM conference on Information and knowledge management
GConnect: a connectivity index for massive disk-resident graphs
Proceedings of the VLDB Endowment
Return specification inference and result clustering for keyword search on XML
ACM Transactions on Database Systems (TODS)
A weighted common structure based clustering technique for XML documents
Journal of Systems and Software
Improving XML search by generating and utilizing informative result snippets
ACM Transactions on Database Systems (TODS)
Online structural graph clustering using frequent subgraph mining
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Frequent tree pattern mining: A survey
Intelligent Data Analysis
Highly efficient algorithms for structural clustering of large websites
Proceedings of the 20th international conference on World wide web
Multimedia metadata mapping: towards helping developers in their integration task
Proceedings of the 8th International Conference on Advances in Mobile Computing and Multimedia
XML data clustering: An overview
ACM Computing Surveys (CSUR)
XML documents clustering using a tensor space model
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Collaborative clustering of XML documents
Journal of Computer and System Sciences
Parallel structural graph clustering
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Mining frequent patterns from XML data: Efficient algorithms and design trade-offs
Expert Systems with Applications: An International Journal
Efficient Mining of Gap-Constrained Subsequences and Its Various Applications
ACM Transactions on Knowledge Discovery from Data (TKDD)
XML document clustering using structure-preserving flat representation of XML content and structure
ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part II
FXProj: a fuzzy XML documents projected clustering based on structure and content
ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I
Exploring dictionary-based semantic relatedness in labeled tree data
Information Sciences: an International Journal
X-Class: Associative Classification of XML Documents by Structure
ACM Transactions on Information Systems (TOIS)
Hierarchical clustering of XML documents focused on structural components
Data & Knowledge Engineering
Discovering interesting information with advances in web technology
ACM SIGKDD Explorations Newsletter
Hi-index | 0.00 |
XML has become a popular method of data representation both on the web and in databases in recent years. One of the reasons for the popularity of XML has been its ability to encode structural information about data records. However, this structural characteristic of data sets also makes it a challenging problem for a variety of data mining problems. One such problem is that of clustering, in which the structural aspects of the data result in a high implicit dimensionality of the data representation. As a result, it becomes more difficult to cluster the data in a meaningful way. In this paper, we propose an effective clustering algorithm for XML data which uses substructures of the documents in order to gain insights about the important underlying structures. We propose new ways of using multiple sub-structuralinformation in XML documents to evaluate the quality of intermediate cluster solutions, and guide the algorithms to a final solution which reflects the true structural behavior in individual partitions. We test the algorithm on a variety of real and synthetic data sets.