Business, Culture, Politics, and Sports - How to Find Your Way through a Bulk of News? On Content-Based Hierarchical Structuring and Organization of Large Document Archives

  • Authors:
  • Michael Dittenbach; Andreas Rauber; Dieter Merkl

  • Venue:
  • DEXA '01 Proceedings of the 12th International Conference on Database and Expert Systems Applications
  • Year:
  • 2001

Abstract

With the increasing amount of information available in electronic document collections, methods for organizing these collections to support topic-oriented browsing and orientation are becoming increasingly important. The SOMLib digital library system provides such an organization based on the Self-Organizing Map, a popular neural network model, by producing a map of the document space. However, hierarchical relations between documents are hidden in this display. Moreover, as document archives grow, the required maps grow larger, making it harder for the user to stay oriented within the map. In this case, a hierarchically structured representation of the document space would be highly preferable. In this paper, we present the Growing Hierarchical Self-Organizing Map, a dynamically growing neural network model that provides a content-based hierarchical decomposition and organization of document spaces. This architecture evolves into a hierarchical structure according to the requirements of the input data during an unsupervised training process. A recent enhancement of the training process further ensures proper orientation of the various topical partitions, which facilitates intuitive navigation between neighboring topical branches. The benefits of this approach are shown by organizing a real-world document collection according to semantic similarities.
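
To make the idea of a map that grows a hierarchy "according to the requirements of the input data" more concrete, the sketch below shows one plausible way such a decision could be made. It is not taken from the abstract: it assumes that a map unit is expanded into a child map when the mean distance between its weight vector and the documents mapped to it (its quantization error) is still large relative to the error of representing the whole collection by a single mean vector. The function names and the threshold parameter `tau2` are hypothetical and chosen only for illustration.

```python
from math import dist


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def quantization_error(weight, vectors):
    """Mean Euclidean distance between a weight vector and the inputs mapped to it."""
    if not vectors:
        return 0.0
    return sum(dist(weight, v) for v in vectors) / len(vectors)


def best_matching_unit(weights, vector):
    """Index of the unit whose weight vector is closest to the input vector."""
    return min(range(len(weights)), key=lambda i: dist(weights[i], vector))


def units_to_expand(weights, vectors, tau2):
    """Indices of units that would receive a child map (assumed criterion).

    A unit is selected when its quantization error exceeds tau2 times the
    error of representing all inputs by their overall mean vector.
    """
    baseline = quantization_error(mean_vector(vectors), vectors)
    mapped = {i: [] for i in range(len(weights))}
    for v in vectors:
        mapped[best_matching_unit(weights, v)].append(v)
    return [
        i
        for i, w in enumerate(weights)
        if quantization_error(w, mapped[i]) > tau2 * baseline
    ]


if __name__ == "__main__":
    # Toy "document vectors": two loosely grouped documents, two identical ones.
    docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 10.0]]
    units = [[0.0, 0.5], [10.0, 10.0]]
    print(units_to_expand(units, docs, tau2=0.05))
```

Under this assumed criterion, a smaller `tau2` yields deeper hierarchies because more units count as insufficiently detailed, while a larger `tau2` keeps the structure flat.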