Semantic clustering of XML documents

Authors: Andrea Tagarelli; Sergio Greco
Affiliations: University of Calabria, Italy (both authors)
Venue: ACM Transactions on Information Systems (TOIS)
Year: 2010

Abstract

Dealing with the structure and content semantics underlying semistructured documents is challenging for any document management or knowledge discovery task conceived for such data. In this work we address the novel problem of clustering semantically related XML documents according to their structure and content features. XML features are generated by enriching syntactic information with semantic information derived from a lexical knowledge base. The backbone of the proposed framework for the semantic clustering of XML documents is a data representation model that exploits the notion of tree tuple to identify semantically cohesive substructures in XML documents and to represent them as transactional data. The framework is equipped with two clustering algorithms based on different paradigms, namely centroid-based partitional clustering and frequent-itemset-based hierarchical clustering. An extensive experimental evaluation was conducted on real data sets from various domains, showing the significance of our approach as a solution for the semantic clustering of XML documents.
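The pipeline outlined in the abstract (tree tuples, then transactions, then centroid-based partitional clustering) can be illustrated with a small Python sketch. It is a hypothetical illustration under simplifying assumptions, not the authors' algorithm. The names `tree_tuples`, `partition` and `cluster_documents`, the `CANON` synonym table (a toy stand-in for the lexical knowledge base), the Jaccard similarity, the majority-vote centroid and the rule that a document takes the majority label of its tuples are all assumptions made here. A tree tuple is approximated as a subtree that keeps one child per distinct tag at every node, so each tag path occurs at most once; attributes are ignored.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for lexical-knowledge-base enrichment (e.g. WordNet):
# tags assumed to share a sense are mapped to one canonical label.
CANON = {"writer": "author"}

def tree_tuples(elem, prefix=""):
    """Return the tree tuples of `elem`, each a frozenset of 'path=value' items.

    Assumption: a tree tuple keeps exactly one child per distinct (canonical)
    tag at every node, so each tag path occurs at most once in it.
    """
    path = f"{prefix}/{CANON.get(elem.tag, elem.tag)}"
    own = set()
    text = (elem.text or "").strip()
    if text:
        own.add(f"{path}={text.lower()}")
    # Collect the alternatives contributed by children sharing a canonical tag.
    groups = {}
    for child in elem:
        key = CANON.get(child.tag, child.tag)
        groups.setdefault(key, []).extend(tree_tuples(child, path))
    # Pick one alternative per tag group (cartesian product) and merge items.
    tuples = []
    for combo in itertools.product(*groups.values()):
        items = set(own)
        for part in combo:
            items |= part
        tuples.append(frozenset(items))
    return tuples

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def centroid(members):
    """Items occurring in at least half of the member transactions."""
    counts = Counter(item for m in members for item in m)
    return frozenset(i for i, c in counts.items() if 2 * c >= len(members))

def partition(transactions, seeds, max_iter=10):
    """k-means-style partitional clustering of transactions (Jaccard)."""
    centroids = [transactions[s] for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assign each transaction to its most similar centroid (ties -> lowest k).
        labels = [max(range(len(centroids)), key=lambda k: jaccard(t, centroids[k]))
                  for t in transactions]
        members = [[t for t, lab in zip(transactions, labels) if lab == k]
                   for k in range(len(centroids))]
        new = [centroid(m) if m else centroids[k] for k, m in enumerate(members)]
        if new == centroids:  # converged
            break
        centroids = new
    return labels

def cluster_documents(docs, seeds):
    """Cluster documents via their tree tuples; each document takes the
    majority label of its tuples (an assumption of this sketch)."""
    owners, transactions = [], []
    for doc_id, xml in docs.items():
        for t in tree_tuples(ET.fromstring(xml)):
            owners.append(doc_id)
            transactions.append(t)
    labels = partition(transactions, seeds)
    votes = {}
    for doc_id, label in zip(owners, labels):
        votes.setdefault(doc_id, Counter())[label] += 1
    return {d: c.most_common(1)[0][0] for d, c in votes.items()}

docs = {
    "A": "<book><title>XML mining</title><author>Greco</author><author>Tagarelli</author></book>",
    "B": "<book><title>XML mining</title><writer>Greco</writer></book>",
    "C": "<movie><title>Alien</title><director>Scott</director></movie>",
}
# Seeds index the transaction list: 0 is the first tuple of A, 3 the tuple of C.
print(cluster_documents(docs, seeds=[0, 3]))  # {'A': 0, 'B': 0, 'C': 1}
```

Document A yields two tree tuples, one per author, because its two `author` children cannot both appear in a tuple. Document B joins A's cluster only because the toy `CANON` table maps `writer` to `author`, which mimics, very crudely, how semantic enrichment lets syntactically different tags match. The frequent-itemset-based hierarchical algorithm mentioned in the abstract is not sketched.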