In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) to cluster XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure of the same quality as that obtained from the original collection, according to a set of evaluation metrics.
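
As a rough illustration of the two steps named above, the following Python sketch prunes a small document-term matrix and then computes the number of clusters that C3M would estimate from it. It is a minimal sketch under stated assumptions, not the procedure used in the experiments: the function names `prune_rows` and `estimated_cluster_count`, the toy matrix, and the pruning rule (keep the highest-weight fraction of terms in each document) are all hypothetical choices made for this example. The cluster-count estimate follows the standard C3M definition, in which the decoupling coefficient of document i is delta_i = alpha_i * sum_k(d_ik^2 * beta_k), with alpha_i and beta_k the reciprocals of the row and column sums, and the estimated number of clusters is the sum of the delta_i.

```python
import numpy as np


def prune_rows(D, keep_fraction):
    """Keep only the highest-weight terms of each document vector (row).

    Hypothetical pruning rule for illustration: each row retains
    ceil(keep_fraction * number_of_nonzero_terms) entries, at least one.
    """
    D = np.asarray(D, dtype=float)
    pruned = np.zeros_like(D)
    for i, row in enumerate(D):
        nonzero = np.flatnonzero(row)
        if nonzero.size == 0:
            continue  # empty document vector: nothing to keep
        k = max(1, int(np.ceil(keep_fraction * nonzero.size)))
        # Positions of the k largest weights among the nonzero entries.
        order = np.argsort(row[nonzero], kind="stable")[::-1][:k]
        top = nonzero[order]
        pruned[i, top] = row[top]
    return pruned


def estimated_cluster_count(D):
    """Sum of C3M decoupling coefficients, delta_i = c_ii."""
    D = np.asarray(D, dtype=float)
    row_sums = D.sum(axis=1)
    col_sums = D.sum(axis=0)
    # Reciprocals of row and column sums; zero sums map to zero.
    alpha = np.divide(1.0, row_sums, out=np.zeros_like(row_sums), where=row_sums > 0)
    beta = np.divide(1.0, col_sums, out=np.zeros_like(col_sums), where=col_sums > 0)
    delta = alpha * ((D ** 2) @ beta)
    return float(delta.sum())


# Toy document-term matrix (rows: documents, columns: terms); values are made up.
D = np.array([
    [3.0, 1.0, 2.0, 0.0],
    [0.0, 5.0, 0.0, 1.0],
])
D_pruned = prune_rows(D, keep_fraction=0.5)
n_c_full = estimated_cluster_count(D)
n_c_pruned = estimated_cluster_count(D_pruned)
```

Comparing `n_c_full` with `n_c_pruned`, or the clusterings built from `D` and `D_pruned`, is one simple way to check how much a given pruning level changes the clustering structure. A `keep_fraction` of 0.3 would correspond to pruning 70% of each document vector under this hypothetical rule.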