Natural language text segmentation techniques applied to the automatic compilation of printed subject indexes and for online database access

  • Authors: G. Vladutz
  • Affiliations: Institute for Scientific Information, Philadelphia, Pennsylvania
  • Venue: ANLC '83 Proceedings of the first conference on Applied natural language processing
  • Year: 1983

Abstract

The nature of the problem and earlier approaches to the automatic compilation of printed subject indexes are reviewed and illustrated. A simple method is described for detecting semantically self-contained word phrase segments in title-like texts. The method is based on a predetermined list of acceptable types of nominative syntactic patterns that can be recognized using a small domain-independent dictionary. The transformation of the detected word phrases into subject index records is described. The records are used to compile Key Word Phrase subject indexes (KWPSI). The method has been successfully tested for the fully automatic production of KWPSI-type indexes to titles of scientific publications. The use of KWPSI-type display formats for enhanced online access to databases is also discussed.
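
The general idea in the abstract (tag words with a small dictionary, keep only segments whose tag sequence matches a predetermined pattern list, then turn each segment into index records) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function-word dictionary, the tag names `W` and `P`, the pattern list, the greedy longest-match strategy, and the record layout of (lead term, phrase).

```python
import re

# Hypothetical small, domain-independent dictionary of function words.
# Any word not listed is treated as a content word ("W"). This is an
# illustrative assumption, not the dictionary used in the paper.
FUNCTION_WORDS = {
    "of": "P", "in": "P", "for": "P", "on": "P", "by": "P", "with": "P", "to": "P",
    "and": "C", "or": "C",
    "the": "D", "a": "D", "an": "D",
}

# Hypothetical list of acceptable syntactic patterns, written as tag sequences.
ACCEPTABLE_PATTERNS = {
    ("W",), ("W", "W"), ("W", "W", "W"),
    ("W", "P", "W"), ("W", "P", "W", "W"), ("W", "W", "P", "W"),
}
MAX_LEN = max(len(p) for p in ACCEPTABLE_PATTERNS)


def tag(word):
    """Return the dictionary tag for a word, defaulting to content word."""
    return FUNCTION_WORDS.get(word.lower(), "W")


def segment(title):
    """Greedily split a title into the longest segments matching a pattern."""
    words = re.findall(r"[A-Za-z][A-Za-z-]*", title)
    tags = [tag(w) for w in words]
    phrases = []
    i = 0
    while i < len(words):
        # Try the longest candidate window first, then shorter ones.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            if tuple(tags[i:i + n]) in ACCEPTABLE_PATTERNS:
                phrases.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            i += 1  # no pattern starts here; skip this word
    return phrases


def index_records(phrases):
    """Build one (lead term, phrase) record per content word, sorted by lead term."""
    records = []
    for phrase in phrases:
        for word in phrase.split():
            if tag(word) == "W":
                records.append((word.lower(), phrase))
    return sorted(records)


if __name__ == "__main__":
    title = "Detection of word phrases in the titles of scientific publications"
    for lead, phrase in index_records(segment(title)):
        print(f"{lead:<14}{phrase}")
```

In this sketch each detected phrase is listed once under every content word it contains, which is one plausible reading of a key-word-phrase index layout. A real system would need a richer pattern list and dictionary than the toy ones assumed here.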