Evaluation of automatically identified index terms for browsing electronic documents

Authors:
Nina Wacholder; Judith L. Klavans; David K. Evans
Affiliations:
Columbia University; Columbia University; Columbia University
Venue:
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Year:
2000

Abstract

We present an evaluation of domain-independent natural language tools for identifying significant concepts in documents. Using qualitative evaluation, we compare three shallow processing methods for extracting index terms, i.e., terms that can be used to model the content of documents. We focus on two criteria: quality and coverage. In terms of quality alone, our results show that technical term (TT) extraction [Justeson and Katz 1995] receives the highest rating. However, in terms of a combined quality and coverage metric, the Head Sorting (HS) method, described in [Wacholder 1998], outperforms the other two methods, keyword (KW) extraction and TT.
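To make the two best-performing methods more concrete, here is a minimal sketch of each; it is not the authors' implementation. It assumes input that is already part-of-speech tagged with a simplified, hypothetical tag set ('A' adjective, 'N' noun, 'P' preposition, 'O' anything else). The TT filter uses the candidate pattern ((A|N)+ | ((A|N)*(N P)?)(A|N)*) N from Justeson and Katz, keeping multiword strings that occur at least twice; the length cap `max_len` is an illustrative assumption. The head-sorting function groups simplex noun phrases (here assumed to be maximal runs of adjectives and nouns ending in a noun) by their head noun and ranks heads by how many distinct phrases they head. That ranking criterion is an assumed simplification of [Wacholder 1998]. The KW method and the combined quality and coverage metric are not modeled.

```python
import re
from collections import Counter, defaultdict

# Assumed simplified tag set: 'A' adjective, 'N' noun, 'P' preposition,
# 'O' anything else. A real pipeline would map tagger output onto these.

# Justeson & Katz candidate pattern: ((A|N)+ | ((A|N)*(N P)?)(A|N)*) N
TT_PATTERN = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")

# Assumed simplex-NP pattern: a maximal run of adjectives/nouns ending in
# a noun, with no prepositions inside.
SIMPLEX_NP = re.compile(r"[AN]*N")


def _tag_string(tokens, tags):
    """Join one-character tags so regex offsets align with token indices."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return "".join(tags)


def technical_terms(tokens, tags, min_freq=2, max_len=6):
    """Count multiword strings whose tag sequence fully matches the
    J&K pattern; keep those occurring at least min_freq times.
    max_len is a hypothetical cap on candidate length."""
    tag_str = _tag_string(tokens, tags)
    counts = Counter()
    for start in range(len(tokens)):
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            if TT_PATTERN.fullmatch(tag_str[start:end]):
                counts[" ".join(tokens[start:end]).lower()] += 1
    return {term: n for term, n in counts.items() if n >= min_freq}


def head_sort(tokens, tags):
    """Group simplex NPs by head (last word); rank heads by the number of
    distinct NPs they head (an assumed criterion), ties alphabetically."""
    tag_str = _tag_string(tokens, tags)
    groups = defaultdict(set)
    for m in SIMPLEX_NP.finditer(tag_str):
        phrase = [t.lower() for t in tokens[m.start():m.end()]]
        groups[phrase[-1]].add(" ".join(phrase))
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(head, sorted(nps)) for head, nps in ranked]


if __name__ == "__main__":
    tokens = ["index", "terms", "help", "users", ";", "index", "terms",
              "of", "documents", ";", "technical", "terms", "and", "key",
              "terms", "describe", "documents"]
    tags = ["N", "N", "O", "N", "O", "N", "N", "P", "N", "O", "A", "N",
            "O", "N", "N", "O", "N"]
    print(technical_terms(tokens, tags))  # {'index terms': 2}
    print(head_sort(tokens, tags)[0])
```

On this toy input, only "index terms" survives the TT frequency threshold, while head sorting surfaces "terms" as the head shared by three distinct phrases ("index terms", "key terms", "technical terms"). This illustrates, in miniature, why a method that groups by head can cover more of a document's vocabulary than a strict frequency-filtered term list.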