Applying Data Mining Techniques for Descriptive Phrase Extraction in Digital Document Collections

  • Authors:
  • Helena Ahonen; Oskari Heinonen; Mika Klemettinen; A. Inkeri Verkamo

  • Venue:
  • ADL '98 Proceedings of the Advances in Digital Libraries Conference
  • Year:
  • 1998

Abstract

Traditionally, texts have been analysed using various information retrieval related methods, such as full-text analysis, and natural language processing. However, only a few examples of data mining in text, particularly in full text, are available.

In this paper we show that general data mining methods are applicable to text analysis tasks such as descriptive phrase extraction. Moreover, we present a general framework for text mining. The framework follows the general knowledge discovery process, thus containing steps from preprocessing to the utilization of the results. The data mining method that we apply is based on generalized episodes and episode rules.

We give concrete examples of how to preprocess texts based on the intended use of the discovered results, and we introduce a weighting scheme that helps in pruning out redundant or non-descriptive phrases. We also present results from real-life data experiments.