Semantics-guided clustering of heterogeneous XML schemas

Authors:
Pasquale De Meo; Giovanni Quattrone; Giorgio Terracina; Domenico Ursino
Affiliations:
DIMET, Università Mediterranea di Reggio Calabria, Reggio Calabria, Italy (De Meo, Quattrone, Ursino); Dipartimento di Matematica, Università della Calabria, Rende, CS, Italy (Terracina)
Venue:
Journal on Data Semantics IX
Year:
2007

Abstract

In this paper we illustrate an approach for clustering semantically heterogeneous XML Schemas. The approach is driven by the semantics of the involved Schemas, which is defined by the interschema properties holding among the concepts they represent. The interschema properties taken into account are synonymies (indicating that two concepts have the same meaning), hyponymies (indicating that one concept has a more specific meaning than another), and overlappings (indicating that two concepts are neither synonyms nor one a hyponym of the other, but represent, to some extent, the same reality). An important feature of our approach is that it can be integrated with almost all the clustering algorithms already proposed in the literature. The paper presents both a theoretical and an experimental analysis of the complexity of our approach; together, they show that the approach is scalable and particularly suited to application contexts characterized by a large number and a wide variety of XML Schemas.
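The abstract separates two ingredients: a semantic similarity between Schemas derived from interschema properties, and a clustering algorithm that consumes it. The Python sketch below illustrates only that separation, under loudly stated assumptions. The property weights, the `schema_similarity` formula, and all helper names are hypothetical and are not the authors' method. Hyponymy is also treated symmetrically here for simplicity, and single-linkage agglomerative clustering merely stands in for "any" algorithm that accepts a pairwise similarity.

```python
from itertools import combinations

# Hypothetical weights per interschema property type (assumption,
# not values taken from the paper).
WEIGHTS = {"synonymy": 1.0, "hyponymy": 0.8, "overlapping": 0.5}


def schema_similarity(s1, s2, properties):
    """Hypothetical semantic similarity between two Schemas.

    s1, s2: sets of schema-qualified concept names (e.g. "S1.book").
    properties: dict mapping frozenset({c1, c2}) -> property type.
    Each concept scores the weight of its strongest property with a
    concept of the other Schema; the sum is normalized by the total
    number of concepts in both Schemas, giving a value in [0, 1].
    """
    def best(c, other):
        scores = [WEIGHTS[properties[frozenset((c, o))]]
                  for o in other if frozenset((c, o)) in properties]
        return max(scores, default=0.0)

    total = sum(best(c, s2) for c in s1) + sum(best(c, s1) for c in s2)
    return total / (len(s1) + len(s2))


def single_linkage(names, sim, threshold):
    """Plain single-linkage agglomerative clustering: repeatedly merge
    two clusters while some pair of their members has sim >= threshold.
    It only calls `sim`, so any similarity function can be plugged in."""
    clusters = [{n} for n in names]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if any(sim(a, b) >= threshold
                   for a in clusters[i] for b in clusters[j]):
                clusters[i] |= clusters.pop(j)  # j > i, so index i is stable
                merged = True
                break
    return clusters


# Hypothetical toy Schemas and interschema properties.
schemas = {
    "S1": {"S1.book", "S1.author", "S1.title"},
    "S2": {"S2.publication", "S2.writer", "S2.title"},
    "S3": {"S3.patient", "S3.doctor"},
}
properties = {
    frozenset(("S1.book", "S2.publication")): "hyponymy",
    frozenset(("S1.author", "S2.writer")): "synonymy",
    frozenset(("S1.title", "S2.title")): "synonymy",
}

clusters = single_linkage(
    schemas,
    lambda a, b: schema_similarity(schemas[a], schemas[b], properties),
    threshold=0.5,
)
print(sorted(sorted(c) for c in clusters))  # [['S1', 'S2'], ['S3']]
```

In this toy data, S1 and S2 score about 0.93 because every concept takes part in a synonymy or hyponymy, while S3 shares no property with either and stays alone. Because `single_linkage` touches the data only through `sim`, the same similarity could in principle be handed to a different algorithm that works on pairwise similarities, which is the kind of pluggability the abstract describes.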