Text mining

  • Authors: Ronen Feldman

  • Affiliations: Director, Data Mining Laboratory, Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan, Israel

  • Venue: Handbook of data mining and knowledge discovery
  • Year: 2002

Abstract

The information age is characterized by rapid growth in the amount of information available in electronic media. Traditional data handling methods are not adequate to cope with this flood of information. Knowledge discovery in databases (KDD) is a new paradigm that focuses on automatic or semiautomatic exploration of large amounts of data and on the discovery of relevant and interesting patterns within them. While most work on KDD is concerned with structured databases, the paradigm is clearly also needed to handle the huge amount of information that is available only in unstructured textual form. To apply KDD to texts, it is necessary to impose some structure on the data that is rich enough to allow for interesting KDD operations. On the other hand, we must consider the severe limitations of current text processing technology and define rather simple structures that can be extracted from texts fairly automatically and at a reasonable cost. One option is to use a text categorization/term extraction paradigm to annotate text articles with meaningful concepts that are organized in a hierarchical structure. This relatively simple annotation is rich enough to provide the basis for a novel KDD framework, enabling data summarization, exploration of interesting patterns, and trend analysis.
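
As a rough illustration of the annotation idea described in the abstract, the following minimal Python sketch tags documents with concepts from a small hierarchy and then summarizes how many documents fall under each concept. It is not the method from the chapter: the hierarchy (`PARENT`), the term lexicon (`LEXICON`), the sample documents, and all function names are hypothetical, and a simple keyword lookup stands in for real text categorization or term extraction.

```python
from collections import Counter

# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {
    "Israel": "countries",
    "Japan": "countries",
    "gold": "commodities",
    "wheat": "commodities",
    "countries": "root",
    "commodities": "root",
}

# Hypothetical term lexicon: surface term (lowercase) -> leaf concept.
LEXICON = {
    "israel": "Israel",
    "japan": "Japan",
    "gold": "gold",
    "wheat": "wheat",
}


def annotate(text):
    """Return the set of leaf concepts whose terms occur in the text."""
    tokens = {tok.strip(".,;:!?()\"'").lower() for tok in text.split()}
    return {concept for term, concept in LEXICON.items() if term in tokens}


def ancestors(concept):
    """Yield the concept itself and each of its ancestors up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def concept_distribution(docs):
    """Count, per concept, the documents that mention it or a descendant."""
    counts = Counter()
    for text in docs:
        covered = set()
        for leaf in annotate(text):
            covered.update(ancestors(leaf))
        counts.update(covered)  # each document counts once per concept
    return counts


# Made-up sample documents, for illustration only.
DOCS = [
    "Japan raised its wheat imports.",
    "Gold prices fell in Israel and Japan.",
    "Wheat harvest forecasts were revised.",
]

dist = concept_distribution(DOCS)
for concept, n in sorted(dist.items()):
    print(f"{concept}: {n}")
```

Rolling each leaf annotation up to its ancestors is what lets a summary be read at any level of the hierarchy, for example comparing all of "commodities" against all of "countries" rather than individual terms. Adding a date to each document and grouping the same counts by period would be one simple way to approach the trend analysis the abstract mentions.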