Analyzing document collections via context-aware term extraction

  • Authors:
  • Daniel A. Keim; Daniela Oelke; Christian Rohrdantz

  • Affiliations:
  • University of Konstanz, Germany; University of Konstanz, Germany; University of Konstanz, Germany

  • Venue:
  • NLDB'09 Proceedings of the 14th international conference on Applications of Natural Language to Information Systems
  • Year:
  • 2009

Abstract

In large document collections that are divided into predefined classes, the differences and similarities between those classes are of special interest. This paper presents an approach that automatically extracts two kinds of terms from such collections: discriminating terms, which describe the topics that distinguish a single class from all others, and overlap terms, which describe the topics that distinguish a subset of the classes from the remaining ones. The relevance to real-world applications and the effectiveness of our approach are demonstrated with two practical examples. In the first application, the predefined classes correspond to different scientific conferences. By extracting terms from collections of papers published at these conferences, we automatically determine the topical differences and similarities between the conferences. In the second application, we extract terms from a collection of product reviews that show which features the reviewers commented on. We obtain these terms by discriminating the product review class against a suitable counter-balance class. Finally, we evaluate our method by comparing it to alternative approaches.
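
The abstract does not describe how the terms are actually extracted, so the following is only a minimal sketch of the two term types, not the authors' method. It assumes a simple stand-in rule: a term counts as frequent in a class if it occurs in at least a fraction `hi` of that class's documents, and as rare if it occurs in at most a fraction `lo`. The function names, both thresholds, and the toy documents are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def class_rates(classes):
    """Map term -> {class name: fraction of that class's documents containing the term}."""
    rates = defaultdict(dict)
    for name, docs in classes.items():
        counts = defaultdict(int)
        for doc in docs:
            # Count each term once per document (document frequency).
            for term in set(doc.lower().split()):
                counts[term] += 1
        for term, n in counts.items():
            rates[term][name] = n / len(docs)
    return rates


def extract_terms(classes, hi=0.6, lo=0.2):
    """Assumed threshold rule, not the paper's algorithm.

    A term is kept only if it is frequent (>= hi) in some proper subset of
    the classes and rare (<= lo) in every other class. A subset of size one
    yields a discriminating term; a larger subset yields an overlap term.
    """
    names = list(classes)
    discriminating = defaultdict(set)  # class name -> terms
    overlap = defaultdict(set)         # frozenset of class names -> terms
    for term, per_class in class_rates(classes).items():
        frequent = frozenset(n for n in names if per_class.get(n, 0.0) >= hi)
        rare_elsewhere = all(
            per_class.get(n, 0.0) <= lo for n in names if n not in frequent
        )
        if not frequent or not rare_elsewhere or len(frequent) == len(names):
            continue
        if len(frequent) == 1:
            discriminating[next(iter(frequent))].add(term)
        else:
            overlap[frequent].add(term)
    return discriminating, overlap


# Made-up toy collection: three classes with two tiny "documents" each.
toy = {
    "vis": ["graph layout rendering", "graph rendering color"],
    "db": ["query index storage", "query index graph"],
    "ir": ["query ranking retrieval", "query ranking index"],
}

discriminating, overlap = extract_terms(toy)
for name, terms in sorted(discriminating.items()):
    print("discriminating", name, sorted(terms))
for subset, terms in overlap.items():
    print("overlap", sorted(subset), sorted(terms))
```

On this toy data, "rendering" separates the first class from the other two, while "query" is shared by the second and third classes but absent from the first, which makes it an overlap term under the assumed rule. Terms such as "graph" and "index" are dropped because they are neither clearly frequent nor clearly rare in one of the remaining classes.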