Analyzing document collections via context-aware term extraction

Authors:
Daniel A. Keim;Daniela Oelke;Christian Rohrdantz
Affiliations:
University of Konstanz, Germany;University of Konstanz, Germany;University of Konstanz, Germany
Venue:
NLDB'09 Proceedings of the 14th international conference on Applications of Natural Language to Information Systems
Year:
2009

Abstract

In large collections of documents that are divided into predefined classes, the differences and similarities between those classes are of special interest. This paper presents an approach that automatically extracts terms from such document collections, describing which topics discriminate a single class from the others (discriminating terms) and which topics discriminate a subset of the classes from the remaining ones (overlap terms). The importance for real-world applications and the effectiveness of our approach are demonstrated by two practical examples. In the first application, our predefined classes correspond to different scientific conferences. By extracting terms from collections of papers published at these conferences, we automatically determine the topical differences and similarities of the conferences. In the second application, we extract terms from a collection of product reviews that show which features reviewers commented on. We obtain these terms by discriminating the product review class against a suitable counter-balance class. Finally, our method is evaluated by comparing it to alternative approaches.
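
The distinction between discriminating terms and overlap terms can be illustrated with a minimal sketch. It is not the method of the paper, whose details the abstract does not give. It assumes a much simpler criterion based on document frequency: a term is selected if it appears in a large share of the documents of every target class and in a small share of the documents of every other class. The function names, the thresholds `high` and `low`, and the toy data are all hypothetical.

```python
from collections import Counter


def doc_frequency(docs):
    """Fraction of the documents in `docs` that contain each term."""
    counts = Counter()
    for doc in docs:
        # Count each term once per document (whitespace tokenization).
        counts.update(set(doc.lower().split()))
    return {term: n / len(docs) for term, n in counts.items()}


def extract_terms(classes, target, high=0.6, low=0.2):
    """Terms frequent in every class of `target` and rare in all other classes.

    `classes` maps a class name to a list of document strings.
    `target` is a set of class names: one name yields discriminating
    terms, several names yield overlap terms.
    `high` and `low` are assumed document-frequency thresholds.
    """
    df = {name: doc_frequency(docs) for name, docs in classes.items()}
    vocabulary = set().union(*(d.keys() for d in df.values()))
    selected = []
    for term in vocabulary:
        # Weakest support among the target classes.
        inside = min(df[c].get(term, 0.0) for c in target)
        # Strongest support among the remaining classes.
        outside = max(
            (df[c].get(term, 0.0) for c in classes if c not in target),
            default=0.0,
        )
        if inside >= high and outside <= low:
            selected.append(term)
    return sorted(selected)


# Toy collection with three classes (illustrative data only).
classes = {
    "vis": ["graph layout rendering", "graph rendering interaction"],
    "nlp": ["parsing tagging corpus", "tagging corpus evaluation"],
    "ir": ["query ranking corpus", "ranking corpus evaluation"],
}

discriminating = extract_terms(classes, {"vis"})   # ['graph', 'rendering']
overlap = extract_terms(classes, {"nlp", "ir"})    # ['corpus']
```

In this toy setting, "graph" and "rendering" separate the "vis" class from the other two, while "corpus" is shared by "nlp" and "ir" but absent from "vis". A real system would likely need proper tokenization and a more robust weighting scheme than raw thresholds.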