K-tree: large scale document clustering

Authors:
Christopher M. De Vries; Shlomo Geva
Affiliations:
Queensland University of Technology, Brisbane, Australia; Queensland University of Technology, Brisbane, Australia
Venue:
Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Year:
2009

Abstract

We introduce K-tree in an information retrieval context. K-tree is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address problems that arise with sparse representations. We compare its performance and clustering quality with those of CLUTO on document collections. K-tree has a low time complexity, which makes it suitable for large document collections, and its tree structure allows efficient disk-based implementations when the space required exceeds main memory.
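The abstract does not spell out how the tree is built. The following is a minimal sketch that assumes the design described in the wider K-tree literature: a height-balanced, B+-tree-like tree of order m. Its leaves hold data vectors and its internal nodes hold the centroids of their subtrees. Each new vector descends to the nearest centroid at every level, and an overfull node is split in two by running k-means with k = 2 on its entries. The names `KTree`, `order`, `two_means` and `split` are hypothetical. The sketch uses dense Python lists rather than the sparse representations the paper addresses, and it is not the authors' implementation.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points: List[Vector], weights: List[int]) -> Vector:
    total = sum(weights)
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(len(points[0]))]


def two_means(points: List[Vector], iters: int = 10, seed: int = 0):
    """Partition points into two groups with k-means (k = 2); return index lists."""
    c0, c1 = random.Random(seed).sample(points, 2)
    a, b = [], []
    for _ in range(iters):
        near0 = [sq_dist(p, c0) <= sq_dist(p, c1) for p in points]
        a = [i for i, f in enumerate(near0) if f]
        b = [i for i, f in enumerate(near0) if not f]
        if not a or not b:
            break
        c0 = weighted_mean([points[i] for i in a], [1] * len(a))
        c1 = weighted_mean([points[i] for i in b], [1] * len(b))
    if not a or not b:  # degenerate input (e.g. duplicates): split in half
        half = len(points) // 2
        a, b = list(range(half)), list(range(half, len(points)))
    return a, b


class Node:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.vectors: List[Vector] = []   # leaf entries: data vectors
        self.children: List["Node"] = []  # internal entries: subtrees
        self.center: Vector = []          # mean of every vector below this node
        self.count = 0                    # number of vectors below this node

    def entries(self) -> list:
        return self.vectors if self.leaf else self.children

    def refresh(self) -> None:
        """Recompute centroid and count from this node's entries."""
        if self.leaf:
            self.count = len(self.vectors)
            self.center = weighted_mean(self.vectors, [1] * self.count)
        else:
            self.count = sum(c.count for c in self.children)
            self.center = weighted_mean([c.center for c in self.children],
                                        [c.count for c in self.children])


def split(node: Node) -> Tuple[Node, Node]:
    """Split an overfull node in two by running 2-means on its entries."""
    points = node.vectors if node.leaf else [c.center for c in node.children]
    halves = []
    for idx in two_means(points):
        half = Node(node.leaf)
        if node.leaf:
            half.vectors = [node.vectors[i] for i in idx]
        else:
            half.children = [node.children[i] for i in idx]
        half.refresh()
        halves.append(half)
    return halves[0], halves[1]


class KTree:
    def __init__(self, order: int):
        self.order = order  # max entries per node; assumed >= 2
        self.root = Node(leaf=True)

    def insert(self, x: Vector) -> None:
        halves = self._insert(self.root, x)
        if halves:  # the root split: add a new root, growing the tree by one level
            self.root = Node(leaf=False)
            self.root.children = list(halves)
            self.root.refresh()

    def _insert(self, node: Node, x: Vector) -> Optional[Tuple[Node, Node]]:
        """Insert x below node; return the two halves if node had to split."""
        if node.leaf:
            node.vectors.append(x)
        else:
            # Descend into the child whose centroid is nearest to x.
            i = min(range(len(node.children)),
                    key=lambda j: sq_dist(x, node.children[j].center))
            halves = self._insert(node.children[i], x)
            if halves:  # replace the split child with its two halves
                node.children[i:i + 1] = list(halves)
        node.refresh()  # update centroids along the insertion path
        return split(node) if len(node.entries()) > self.order else None
```

For example, after inserting a few hundred 2-D points into `KTree(order=4)`, every leaf sits at the same depth and `root.center` equals the mean of all inserted points. Splits are local and propagate only up the insertion path, so each insertion touches one node per level. This is one plausible reading of the low time complexity the abstract claims. It also suggests why a disk-based variant could store each node as a page, much as a B+-tree does.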