Interactive thesaurus assessment for automatic document annotation

  • Authors:
  • Kai Eckert; Heiner Stuckenschmidt; Magnus Pfeffer

  • Affiliations:
  • University of Mannheim, Mannheim, Germany; University of Mannheim, Mannheim, Germany; University of Mannheim, Mannheim, Germany

  • Venue:
  • Proceedings of the 4th international conference on Knowledge capture
  • Year:
  • 2007

Abstract

Thesaurus-based indexing is a common approach for increasing the performance of document retrieval. With the growing number of documents available, manual indexing is not a feasible option, and statistical methods for automated document indexing are an attractive alternative. We argue that the quality of the thesaurus used as a basis for indexing, in particular its ability to adequately cover the contents to be indexed, is of crucial importance in automatic indexing, because there is no human in the loop who can spot and avoid indexing errors. We propose a method for thesaurus evaluation that combines statistical measures with appropriate visualization techniques to support the detection of potential problems in a thesaurus. We describe this method and show its application in the context of two automatic indexing tasks. The examples show that the method indeed eases the detection and correction of errors, leading to a better indexing result. Please refer to http://www.kaiec.org for high-resolution media of all figures used in this paper, as well as an animated presentation of the interactive tool.