In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. This large number of approaches stems from the variety of applications that require the clustering of XML data. These applications need data grouped by similar contents, tags, paths, structures, and semantics. In this article, we first outline the application contexts in which clustering is useful. We then survey the approaches proposed so far according to the abstract representation of the data (instances or schemas), the similarity measure adopted, and the clustering algorithm. Our aim is to draw a taxonomy in which current approaches can be classified and compared, and to provide an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the article describes future trends and research issues that remain open.
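To make the three dimensions named above concrete (data representation, similarity measure, clustering algorithm), the following is a minimal sketch and not a method taken from the surveyed literature. It assumes, purely for illustration, that each document is represented as the set of its root-to-node tag paths, that similarity is the Jaccard coefficient over those sets, and that grouping is a naive single-link merge with a fixed threshold. The function names `tag_paths`, `jaccard`, and `cluster` and the threshold value are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Representation (assumed): the set of root-to-node tag paths of a document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Similarity measure (assumed): Jaccard coefficient of two path sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Clustering (assumed): single-link merging of groups of document indices.

    Two groups are merged whenever any pair of documents across them has
    similarity >= threshold; merging repeats until no pair qualifies.
    """
    reps = [tag_paths(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                linked = any(
                    jaccard(reps[a], reps[b]) >= threshold
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if linked:
                    clusters[i] = sorted(clusters[i] + clusters[j])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


# Hypothetical toy documents: two book records and one order record.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<order><item/><price/></order>",
]
print(cluster(docs))
```

Swapping any one of the three functions (for example, a tree edit distance in place of `jaccard`) changes the approach along exactly one dimension of the taxonomy, which is the kind of comparison the integrated view is meant to support.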