Clustering with random indexing K-tree and XML structure

Authors:
Christopher M. De Vries;Shlomo Geva;Lance De Vine
Affiliations:
Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia;Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia;Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia
Venue:
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Year:
2009

Abstract

This paper describes the approach taken by a group at the Queensland University of Technology to the clustering task at INEX 2009. The Random Indexing (RI) K-tree was used with a document representation based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. The approach produced good-quality clusterings when evaluated using two different methodologies.
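
To illustrate the Random Indexing step named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each document is given as a sparse map from features to weights, assigns every feature a sparse ternary index vector, and builds a reduced document vector as the weighted sum of those index vectors. The function names, the dimension, the number of non-zero entries, and the example feature names are all hypothetical choices made for illustration. The K-tree clustering that would consume these vectors is not shown.

```python
import random


def index_vector(dim, nonzeros, rng):
    """Build a sparse ternary index vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    positions = rng.sample(range(dim), nonzeros)
    for i, pos in enumerate(positions):
        # Alternate signs so the vector holds equal numbers of +1 and -1.
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def random_index(docs, dim=64, nonzeros=8, seed=0):
    """Project sparse {feature: weight} documents into a dim-dimensional space.

    Assumption: a document vector is the weighted sum of the index
    vectors of its features. Index vectors are created lazily, the
    first time a feature is seen.
    """
    rng = random.Random(seed)
    feature_vectors = {}
    reduced = []
    for doc in docs:
        acc = [0.0] * dim
        for feature, weight in doc.items():
            if feature not in feature_vectors:
                feature_vectors[feature] = index_vector(dim, nonzeros, rng)
            for j, value in enumerate(feature_vectors[feature]):
                acc[j] += weight * value
        reduced.append(acc)
    return reduced


# Hypothetical documents described by weighted markup-derived features.
docs = [
    {"tag:person": 2.0, "tag:city": 1.0},
    {"tag:person": 1.0, "tag:album": 3.0},
    {"tag:city": 1.0, "tag:river": 1.0},
]
reduced = random_index(docs)
```

Each entry of `reduced` is a dense vector of fixed length regardless of how many distinct features the collection contains, which is what makes this kind of representation convenient as input to a clustering algorithm.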