The TaxGen Framework: Automating the Generation of a Taxonomy for a Large Document Collection

  • Authors:
  • Adrian Miiller;Jochen Dorre

  • Affiliations:
  • -;-

  • Venue:
  • HICSS '99 Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences-Volume 2 - Volume 2
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

Text Mining is an active area of research and development, which combines and expands techniques found in related areas like information retrieval, computational linguistics, and data mining to perform an analysis of large corpora of digital documents. This paper describes the TaxGen Text Mining project carried out at the IBM Software Development Lab. at Boeblingen, Germany. The goal of TaxGen was the automatic generation of a taxonomy for a collection of previously unstructured documents, namely a set of 73.000 news wire documents spanning one year.