Mining Knowledge from Text Collections Using Automatically Generated Metadata

  • Authors:
  • John M. Pierre

  • Venue:
  • PAKM '02 Proceedings of the 4th International Conference on Practical Aspects of Knowledge Management
  • Year:
  • 2002

Abstract

Data mining is typically applied to large databases of highly structured information in order to discover new knowledge. In businesses and institutions, the amount of information existing in repositories of text documents usually rivals or surpasses the amount found in relational databases. Though the amount of potentially valuable knowledge contained in document collections can be great, these collections are often difficult to analyze. Therefore, it is important to develop methods to efficiently discover knowledge embedded in these document repositories. In this paper, we describe an approach for mining knowledge from text collections by applying data mining techniques to metadata records generated via automated text categorization. By controlling the set of metadata fields as well as the set of assigned categories, we can customize the knowledge discovery task to address specific questions. As an example, we apply the approach to a large collection of product reviews and evaluate the performance of the knowledge discovery.
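
The two-stage idea in the abstract (categorize each document into a metadata record, then mine the records) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `SCHEMA` fields and categories, the keyword matching that stands in for a trained text categorizer, the pairwise support/confidence rules that stand in for whatever data mining technique the paper uses, and the four sample reviews.

```python
from collections import Counter
from itertools import combinations

# Hypothetical controlled vocabulary: metadata field -> category -> trigger keywords.
# Keyword matching is only a stand-in for automated text categorization.
SCHEMA = {
    "product": {"camera": ["camera", "lens"], "laptop": ["laptop", "keyboard"]},
    "sentiment": {"positive": ["great", "love"], "negative": ["poor", "broke"]},
}


def categorize(text):
    """Return a metadata record {field: category} for one document."""
    words = set(text.lower().split())
    record = {}
    for field, categories in SCHEMA.items():
        for category, keywords in categories.items():
            if words & set(keywords):
                record[field] = category
                break  # at most one category per field in this sketch
    return record


def mine_rules(records, min_support=0.25, min_confidence=0.6):
    """Find pairwise rules (field, category) -> (field, category).

    support    = fraction of records containing both items
    confidence = records containing both / records containing the left-hand item
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(record.items())
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Made-up sample reviews, for illustration only.
reviews = [
    "Great camera with a sharp lens",
    "I love this camera",
    "The laptop keyboard broke after a week",
    "Poor laptop battery",
]
records = [categorize(r) for r in reviews]
rules = mine_rules(records)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Changing the fields or categories in `SCHEMA` changes which rules can be discovered, which is one way to read the abstract's point about customizing the knowledge discovery task by controlling the metadata fields and assigned categories.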