Exploiting index pruning methods for clustering XML collections

Authors:
Ismail Sengor Altingovde;Duygu Atilgan;Özgür Ulusoy
Affiliations:
Department of Computer Engineering, Bilkent University, Ankara, Turkey;Department of Computer Engineering, Bilkent University, Ankara, Turkey;Department of Computer Engineering, Bilkent University, Ankara, Turkey
Venue:
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Year:
2009

References (12):

Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases
ACM Transactions on Database Systems (TODS)

Static index pruning for information retrieval systems
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

Efficiency and effectiveness of query processing in cluster-based retrieval
Information Systems

A document-centric approach to static index pruning in text retrieval systems
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management

Boosting static pruning of inverted files
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

A Practitioner's Guide for Static Index Pruning
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval

Document Clustering with K-tree
Advances in Focused Retrieval

Clustering XML Documents Using Frequent Subtrees
Advances in Focused Retrieval

Utilizing the Structure and Content Information for XML Document Clustering
Advances in Focused Retrieval

Self Organizing Maps for the Clustering of Large Sets of Labeled Graphs
Advances in Focused Retrieval

Exploiting query views for static index pruning in web search engines
Proceedings of the 18th ACM conference on Information and knowledge management

XML retrieval using pruned element-index files
ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval

Cited by (1):

Overview of the INEX 2009 XML mining track: clustering and classification of XML documents
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval

Abstract

In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) for clustering XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure of the same quality as that obtained from the original collection, as measured by a set of evaluation metrics.
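
To make the idea of pruning document vectors concrete, the following is a minimal sketch, not the authors' implementation. It assumes each document is represented as a mapping from terms to weights and applies a simple document-centric rule: within each document, drop the lowest-weighted fraction of terms. The function name `prune_document_vector`, the parameter `prune_ratio`, the tie-breaking rule, and the toy weights are all hypothetical choices made for illustration; the abstract does not specify which pruning strategies or weighting schemes were used.

```python
def prune_document_vector(vector, prune_ratio):
    """Return a copy of `vector` with the lowest-weighted terms removed.

    `vector` maps term -> weight for a single document (assumed representation).
    `prune_ratio` is the fraction of terms to drop, e.g. 0.7 for 70%.
    At least one term is always kept so that no document becomes empty.
    """
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be between 0 and 1")
    n_terms = len(vector)
    if n_terms == 0:
        return {}
    n_drop = min(int(round(n_terms * prune_ratio)), n_terms - 1)
    # Rank by descending weight; break ties alphabetically for determinism.
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[: n_terms - n_drop])


# Toy document vector with ten terms (weights are made up for illustration).
doc = {
    "xml": 0.9, "clustering": 0.8, "pruning": 0.7, "index": 0.6, "vector": 0.5,
    "document": 0.4, "collection": 0.3, "paper": 0.2, "show": 0.1, "also": 0.05,
}
pruned = prune_document_vector(doc, 0.7)
print(sorted(pruned))  # the three highest-weighted terms remain
```

The pruned vectors could then be passed to a clustering algorithm in place of the originals, which is the setting the abstract evaluates.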