Tree view self-organisation of web content

  • Authors:
  • Richard T. Freeman; Hujun Yin

  • Affiliations:
  • Department of Electrical and Electronic Engineering, Institute of Science and Technology, University of Manchester, Manchester M60 1QD, UK (both authors)

  • Venue:
  • Neurocomputing
  • Year:
  • 2005

Abstract

When browsing a large set of unstructured documents, it is advantageous if the documents have been organised and presented in a way that makes navigation efficient, underlying concepts easy to understand and related information quick to locate. This paper proposes a new method, termed Treeview self-organising maps (Treeview SOMs), for clustering and organising text documents by means of a series of independently and automatically created, hierarchical one-dimensional SOMs. For presentation and visualisation, the method generates a topological taxonomy tree for a set of unstructured text documents. The documents are organised in a hierarchy of dynamically generated and automatically validated topics extracted from the document corpus. The results, presented in a labelled tree view, clearly show the underlying contents of the documents and can help users browse the document set more efficiently than the results of previous work using SOMs or hierarchical clustering methods. A brief overview of general document clustering and a review of SOM-based document analysis methods are also provided, together with a comparison among them.
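The abstract does not give the training, validation or labelling details, so the following is only a minimal Python sketch of the general idea of a tree built from one-dimensional SOMs, not the authors' Treeview SOM. It assumes L2-normalised bag-of-words vectors, a fixed number of map nodes per level, and hypothetical stopping rules (`n_nodes`, `min_size`, `max_depth`) in place of the paper's automatic map generation and validation; node labels are simply the highest-weighted terms. All function names (`vectorise`, `train_1d_som`, `build_tree`, `label`, `print_tree`) are illustrative.

```python
import math
import random


def vectorise(texts):
    """Bag-of-words vectors, L2-normalised (assumed preprocessing, not the paper's)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w in t.lower().split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(a * a for a in v)) or 1.0
        vectors.append([a / norm for a in v])
    return vocab, vectors


def best_matching_unit(weights, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_1d_som(data, n_nodes, epochs=30, lr0=0.5, seed=0):
    """Online training of a chain of n_nodes prototypes (a one-dimensional SOM)."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    sigma0 = max(n_nodes / 2.0, 1.0)
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood width
            x = data[i]
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood measured along the 1D chain of nodes.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            t += 1
    return weights


def label(vectors, vocab, k=2):
    """Top-k terms by mean weight: a simple stand-in for topic labels."""
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    top = sorted(range(len(vocab)), key=lambda i: -mean[i])[:k]
    return " ".join(vocab[i] for i in top)


def build_tree(docs, vectors, vocab, n_nodes=3, min_size=4, depth=0, max_depth=3):
    """Recursively split documents with one 1D SOM per tree node.

    n_nodes, min_size and max_depth are hypothetical fixed settings; the paper
    instead generates and validates its maps automatically.
    """
    node = {"label": label(vectors, vocab), "docs": docs, "children": []}
    if len(docs) < min_size or depth >= max_depth:
        return node
    weights = train_1d_som(vectors, n_nodes, seed=depth)
    groups = {}
    for doc, vec in zip(docs, vectors):
        groups.setdefault(best_matching_unit(weights, vec), []).append((doc, vec))
    if len(groups) < 2:  # every document hit the same unit: no useful split
        return node
    for k in sorted(groups):  # keep the topological order of the 1D map
        sub_docs = [d for d, _ in groups[k]]
        sub_vecs = [v for _, v in groups[k]]
        node["children"].append(
            build_tree(sub_docs, sub_vecs, vocab, n_nodes, min_size, depth + 1, max_depth))
    return node


def print_tree(node, indent=0):
    """Indented, labelled tree view of the hierarchy."""
    print("  " * indent + f"[{node['label']}] " + ", ".join(node["docs"]))
    for child in node["children"]:
        print_tree(child, indent + 1)


if __name__ == "__main__":
    texts = [
        "neural network training", "neural network layers", "network training data",
        "stock market prices", "market prices fall", "stock market trading",
        "football match score", "football league match",
    ]
    vocab, vectors = vectorise(texts)
    print_tree(build_tree([f"doc{i}" for i in range(len(texts))], vectors, vocab))
```

Each level trains a separate small one-dimensional map on only the documents routed to its parent node, and children are emitted in map order, so neighbouring branches in the printed tree view tend to hold related topics. The exact split depends on the random seed and the toy corpus.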