Adaptive topological tree structure for document organisation and visualisation

  • Authors:
  • Richard T. Freeman; Hujun Yin

  • Affiliations:
  • Department of Electrical and Electronic Engineering, University of Manchester Institute of Science and Technology, P.O. Box 88, Manchester M60 1QD, UK; Department of Electrical and Electronic Engineering, University of Manchester Institute of Science and Technology, P.O. Box 88, Manchester M60 1QD, UK

  • Venue:
  • Neural Networks - 2004 Special issue: New developments in self-organizing systems
  • Year:
  • 2004

Abstract

The self-organising map (SOM) is finding a growing number of applications in tasks such as clustering, pattern recognition and visualisation, and has also been employed in knowledge management and information retrieval. We propose an alternative to existing two-dimensional SOM-based methods for document analysis. The method, termed the Adaptive Topological Tree Structure (ATTS), generates a taxonomy of underlying topics from a set of unclassified, unstructured documents. The ATTS consists of a hierarchy of adaptive self-organising chains, each of which is validated independently using a proposed entropy-based Bayesian information criterion. A node meeting the expansion criterion spans a child chain, with reduced vocabulary and increased specialisation. The ATTS creates a topological tree of topics, which can be browsed like a content hierarchy and reflects the connections between related topics at each level. A review of existing neural-network-based methods for document clustering and organisation is also given. Experimental results on real-world datasets using the proposed ATTS method are presented and compared with other approaches. The results demonstrate the advantages of the proposed validation criteria and the efficiency of the ATTS approach for document organisation, visualisation and search. They show that the proposed methods not only improve the clustering results but also boost retrieval performance.
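
To make the notion of a self-organising chain more concrete, the following is a minimal sketch of a generic one-dimensional SOM trained on document term vectors. It is not the authors' ATTS implementation: the abstract does not give the entropy-based Bayesian information criterion, the chain adaptation rule or the expansion criterion, so all three are omitted here. The function names (`train_chain`, `best_node`, `assign`), the fixed chain length, the linear decay schedules and the toy vectors are assumptions made for illustration only.

```python
import math
import random


def best_node(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    dists = [sum((w - v) ** 2 for w, v in zip(node, x)) for node in weights]
    return dists.index(min(dists))


def train_chain(docs, n_nodes=4, epochs=50, lr0=0.5, seed=0):
    """Train a fixed-length 1-D self-organising chain on equal-length vectors.

    Illustrative only: a real ATTS chain would adapt its length and be
    validated with the paper's criterion, neither of which is modelled here.
    """
    rng = random.Random(seed)
    dim = len(docs[0])
    # Initialise each node from a randomly chosen document (copied, not aliased).
    weights = [list(rng.choice(docs)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                            # decaying learning rate
        sigma = max(0.5, (n_nodes / 2.0) * (1.0 - frac))   # shrinking neighbourhood
        order = list(range(len(docs)))
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            winner = best_node(weights, x)
            for j in range(n_nodes):
                # Gaussian neighbourhood over positions along the chain.
                h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
    return weights


def assign(weights, docs):
    """Map every document to the index of its best-matching node."""
    return [best_node(weights, x) for x in docs]


if __name__ == "__main__":
    # Hypothetical term-weight vectors over a four-word vocabulary.
    docs = [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 0.8, 0.0, 0.2],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.2, 1.0, 0.8],
    ]
    chain = train_chain(docs, n_nodes=3)
    print(assign(chain, docs))
```

In a hierarchical scheme of the kind the abstract describes, the documents assigned to one node would then be used to train a child chain over a reduced vocabulary. Deciding when to do so requires the expansion criterion defined in the paper itself.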