Automatic identification and organization of index terms for interactive browsing

Authors:
Nina Wacholder;David K. Evans;Judith L. Klavans
Affiliations:
Columbia University, New York, NY;Columbia University, New York, NY;Columbia University, New York, NY
Venue:
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Year:
2001

Citing 16
Cited 11

The vocabulary problem in human-system communication

Communications of the ACM
Needs for research in indexing

Journal of the American Society for Information Science
Information extraction

Communications of the ACM
Translating collocations for bilingual lexicons: a statistical approach

Computational Linguistics
Exploiting clustering and phrases for context-based information retrieval

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Browsing in digital libraries: a phrase-based approach

DL '97 Proceedings of the second ACM international conference on Digital libraries
Comparing noun phrasing techniques for use with medical digital library tools

Journal of the American Society for Information Science - Special topic issue on digital libraries: part 2
Improving browsing in digital libraries with keyphrase indexes

Decision Support Systems - From information retrieval to knowledge management: enabling technologies and best practices
A usability assessment of online indexing structures in the networked environment

Journal of the American Society for Information Science
Automatic abstracting and indexing—survey and recommendations

Communications of the ACM
Indexing Books

Indexing Books
Evaluation of automatically identified index terms for browsing electronic documents

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Building effective queries in natural language information retrieval

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
An automated system that assists in the generation of document indexes

Natural Language Engineering
Noun-phrase analysis in unrestricted text for information retrieval

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
One sense per collocation

HLT '93 Proceedings of the workshop on Human Language Technology

Compound descriptors in context: a matching function for classifications and thesauri

Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
A prototype multilingual document browser for ancient Greek texts

The New Review of Hypermedia and Multimedia
DOM-based content extraction of HTML documents

WWW '03 Proceedings of the 12th international conference on World Wide Web
Methods for precise named entity matching in digital collections

Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
The technology of phrase browsing applications: workshop held in conjunction with the first ACM-IEEE joint conference on digital libraries

ACM SIGIR Forum
Automating Content Extraction of HTML Documents

World Wide Web
ScentIndex and ScentHighlights: productive reading techniques for conceptually reorganizing subject indexes and highlighting passages

Information Visualization
The influence of indexing practices and weighting algorithms on document spaces

Journal of the American Society for Information Science and Technology
Document keyphrases as subject metadata: incorporating document key concepts in search results

Information Retrieval
Improving XML search by generating and utilizing informative result snippets

ACM Transactions on Database Systems (TODS)
Using natural language processing to assist the visually handicapped in writing compositions

AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence

Abstract

The potential of automatically generated indexes for information access has been recognized for several decades (e.g., Bush 1945 [2], Edmundson and Wyllys 1961 [4]), but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on the development of interactive systems to support phrase browsing has begun to emerge (e.g., Anick and Vaithyanathan 1997 [1], Gutwin et al. [10], Nevill-Manning et al. 1997 [17], Godby and Reighart 1998 [9]). In this paper, we consider two issues related to the use of automatically identified phrases as index terms in a dynamic text browser (DTB), a user-centered system for navigating and browsing index terms: 1) What criteria are appropriate for assessing the usefulness of automatically identified index terms? and 2) Is the quality of the terms identified by automatic indexing such that they provide useful access to document content? The terms that we focus on have been identified by LinkIT, a software tool for identifying significant topics in text [7]. Over 90% of the terms identified by LinkIT are coherent and therefore merit inclusion in the dynamic text browser. Terms identified by LinkIT are input to Intell-Index, a prototype DTB that supports interactive navigation of index terms. The distinction between phrasal heads (the most important words in a coherent term) and modifiers serves as the basis for a hierarchical organization of terms. This linguistically motivated structure helps users to browse and disambiguate terms efficiently. We conclude that the approach to information access discussed in this paper is very promising, and also that there is much room for further research. In the meantime, this research is a contribution to the establishment of a solid foundation for assessing the usability of terms in phrase browsing applications.
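The head/modifier organization mentioned in the abstract can be illustrated with a small sketch. This is not LinkIT or Intell-Index code: the function names `head_of` and `group_by_head` are hypothetical, the sample terms are invented, and the rule "the head is the last word of the phrase" is a deliberately crude assumption that holds only for simple English noun phrases.

```python
from collections import defaultdict


def head_of(term: str) -> str:
    """Return the assumed head of a term.

    Assumption for illustration only: the head of a simple English
    noun phrase is its final word.
    """
    return term.split()[-1].lower()


def group_by_head(terms):
    """Build a two-level index: head -> sorted list of terms with that head."""
    index = defaultdict(set)
    for term in terms:
        normalized = " ".join(term.lower().split())
        if not normalized:
            continue  # skip empty or whitespace-only input
        index[head_of(normalized)].add(normalized)
    # Sort heads and the terms under each head for stable browsing order.
    return {head: sorted(group) for head, group in sorted(index.items())}


if __name__ == "__main__":
    sample_terms = [
        "filter",
        "coffee filter",
        "oil filter",
        "digital library",
        "public library",
    ]
    for head, group in group_by_head(sample_terms).items():
        print(head)
        for term in group:
            print("  " + term)
```

Grouping terms under a shared head places related phrases such as "coffee filter" and "oil filter" side by side, which is the kind of arrangement the abstract describes as helping users browse and disambiguate terms. A real system would need syntactic analysis to find heads reliably.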