Ontologies Improve Text Document Clustering

  • Authors:
  • Andreas Hotho; Steffen Staab; Gerd Stumme

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large sets of documents into a small number of meaningful clusters. The bag-of-words representation used by these clustering methods is often unsatisfactory because it ignores relationships between important terms that do not co-occur literally. To address this problem, we integrate core ontologies as background knowledge into the process of clustering text documents. Our experimental evaluations compare clustering techniques based on pre-categorizations of texts from Reuters newsfeeds and on a smaller domain of an eLearning course about Java. In the experiments, background knowledge improves results over a baseline without background knowledge in many interesting combinations.
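The core problem named above, that a bag of words misses related terms which never co-occur literally, can be illustrated with a small sketch. It assumes a hypothetical toy ontology (`TERM_TO_CONCEPT`, `HYPERNYM`) in place of a real core ontology, and an illustrative "add" strategy that appends concept features (plus up to `depth` more general concepts) to each term vector. These names and choices are assumptions for illustration, not the authors' implementation. A real pipeline would also weight features (e.g. tf-idf) and then pass the vectors to a clustering algorithm such as k-means.

```python
import math
from collections import Counter

# Hypothetical toy "core ontology": term -> concept, concept -> parent concept.
TERM_TO_CONCEPT = {
    "beef": "meat", "pork": "meat",
    "wheat": "cereal", "barley": "cereal",
}
HYPERNYM = {"meat": "food", "cereal": "food"}


def concepts_for(term, depth):
    """Return the term's concept plus up to `depth` more general concepts."""
    result = []
    concept = TERM_TO_CONCEPT.get(term)
    while concept is not None and len(result) <= depth:
        result.append(concept)
        concept = HYPERNYM.get(concept)
    return result


def bag_of_words(tokens):
    """Plain term-frequency vector: only literal term matches count."""
    return Counter(tokens)


def with_background_knowledge(tokens, depth=1):
    """Illustrative 'add' strategy: keep term features, append concept features."""
    vec = Counter(tokens)
    for tok in tokens:
        for concept in concepts_for(tok, depth):
            vec["concept:" + concept] += 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc1 = ["beef", "exports", "rise"]
doc2 = ["pork", "imports", "fall"]
doc3 = ["wheat", "harvest"]

# No shared words, so the plain bag of words sees no similarity.
print(cosine(bag_of_words(doc1), bag_of_words(doc2)))  # 0.0
# Shared concepts ("meat", "food") make the related documents similar.
print(cosine(with_background_knowledge(doc1), with_background_knowledge(doc2)))
# With depth=0 only direct concepts are added; depth=1 also links via "food".
print(cosine(with_background_knowledge(doc1, 0), with_background_knowledge(doc3, 0)))
print(cosine(with_background_knowledge(doc1, 1), with_background_knowledge(doc3, 1)))
```

The `depth` parameter shows the trade-off such a scheme has to tune: climbing further up the hierarchy links more documents, but the links become more generic.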