Neural Network Based Document Clustering Using WordNet Ontologies

  • Authors:
  • Chihli Hung; Stefan Wermter

  • Affiliations:
  • De Lin Institute of Technology, Taiwan. chihli@mail.educities.edu.tw; Centre for Hybrid Intelligent Systems, School of Computing and Technology, University of Sunderland, UK. stefan.wermter@sunderland.ac.uk

  • Venue:
  • International Journal of Hybrid Intelligent Systems
  • Year:
  • 2004

Abstract

Three novel text vector representation approaches for neural network based document clustering are proposed: the extended significance vector model (ESVM), the hypernym significance vector model (HSVM) and the hybrid vector space model (HyM). ESVM extracts the relationship between words and the class labels they are preferentially associated with. HSVM exploits a semantic relationship from the WordNet ontology: a more general term, the hypernym, replaces terms that express similar concepts. This hypernym relationship supplements the neural model in document clustering. HyM combines a TFxIDF vector with a hypernym significance vector, retaining the advantages and reducing the disadvantages of both the unsupervised and the supervised vector representation approaches. In our experiments on 10,000 full-text articles, the self-organising map (SOM) based on the HyM text vector representation improves classification accuracy and reduces the average quantization error (AQE).
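The abstract does not give the exact formulas, so the following is only a minimal sketch of how a HyM-style vector could be assembled, under several labelled assumptions. It assumes (1) a word's significance for a class is its frequency in that class divided by its total frequency, (2) a document's significance vector is the average of its words' significance vectors, (3) HyM is a plain concatenation of the TFxIDF vector and the hypernym significance vector, and (4) AQE is the mean Euclidean distance between each input and its best-matching SOM unit. The `HYPERNYM` table is a hypothetical toy stand-in for WordNet lookups, and all function names are illustrative; SOM training itself is not shown.

```python
import math
from collections import Counter

# Hypothetical toy hypernym map standing in for WordNet hypernym lookups.
HYPERNYM = {"car": "vehicle", "truck": "vehicle",
            "bank": "institution", "lender": "institution"}

def generalise(tokens):
    """Replace each token by its (toy) hypernym when one is known."""
    return [HYPERNYM.get(t, t) for t in tokens]

def tfidf_vector(doc, docs, vocab):
    """Unsupervised part: raw term frequency times log(N / document frequency)."""
    tf = Counter(doc)
    n = len(docs)
    vec = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)
        vec.append(tf[term] * math.log(n / df) if df else 0.0)
    return vec

def significance_table(docs, labels, classes):
    """Assumed definition: per-class frequency of a word divided by its total frequency."""
    counts = {}
    for doc, label in zip(docs, labels):
        for w in doc:
            counts.setdefault(w, Counter())[label] += 1
    return {w: [c[k] / sum(c.values()) for k in classes] for w, c in counts.items()}

def significance_vector(doc, table, classes):
    """Supervised part: average the significance vectors of the document's known words."""
    known = [w for w in doc if w in table]
    vec = [0.0] * len(classes)
    for w in known:
        for i, s in enumerate(table[w]):
            vec[i] += s
    return [v / len(known) for v in vec] if known else vec

def hybrid_vectors(docs, labels, classes):
    """Assumed HyM: concatenate TFxIDF (original words) with significance (hypernyms)."""
    vocab = sorted({w for d in docs for w in d})
    general = [generalise(d) for d in docs]
    table = significance_table(general, labels, classes)
    return [tfidf_vector(d, docs, vocab) + significance_vector(g, table, classes)
            for d, g in zip(docs, general)]

def average_quantization_error(vectors, weights):
    """Mean distance from each input vector to its closest (best-matching) unit."""
    return sum(min(math.dist(x, w) for w in weights) for x in vectors) / len(vectors)

docs = [["car", "engine", "road"], ["truck", "road"],
        ["bank", "loan"], ["lender", "loan", "rate"]]
labels = ["transport", "transport", "finance", "finance"]
vecs = hybrid_vectors(docs, labels, ["transport", "finance"])
print(vecs[0][-2:], vecs[2][-2:])
```

Because "car" and "truck" both map to "vehicle", the supervised tail of the vector sees one shared concept, while the TFxIDF head still distinguishes the original words. For brevity the sketch estimates significance from the same labelled documents it encodes; a real evaluation would compute the table on training data only.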