A corpus-based statistical approach to automatic book indexing

  • Authors:
  • Jyun-Sheng Chang; Tsung-Yih Tseng; Ying Cheng; Huey-Chyun Chen; Shun-Der Cheng; Sur-Jin Ker; John S. Liu

  • Affiliations:
  • National Tsing Hua University, Hsinchu, Taiwan, ROC; National Tsing Hua University, Hsinchu, Taiwan, ROC; National Tsing Hua University, Hsinchu, Taiwan, ROC; National Tsing Hua University, Hsinchu, Taiwan, ROC; National Tsing Hua University, Hsinchu, Taiwan, ROC; SooChow University; Sampo Research Institute

  • Venue:
  • ANLC '92 Proceedings of the third conference on Applied natural language processing
  • Year:
  • 1992

Abstract

The paper reports on a new approach to the automatic generation of back-of-book indexes for Chinese books. Full sentential parsing is avoided because it is inefficient and because no Chinese grammar with sufficient coverage is available. Instead, word segmentation, a fundamental analysis step particular to Chinese text, is performed to break the character stream into a sequence of lexical units equivalent to words in English. The resulting word sequence then goes through part-of-speech tagging and noun phrase analysis. All of these analyses are done using a corpus-based statistical algorithm. The experiments produced satisfactory results.
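
To make the word segmentation step concrete, the following is a minimal sketch of one common corpus-based technique: choosing the segmentation that maximizes the product of unigram word probabilities, found by dynamic programming. It is not taken from the paper, and the abstract does not say which statistical model the authors used. The lexicon `WORD_COUNTS`, its counts, the `MAX_WORD_LEN` limit, and the unknown-character penalty are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy unigram counts; a real system would estimate these
# from a large segmented corpus.
WORD_COUNTS = {
    "自動": 35, "中文": 50, "文書": 5, "索引": 40,
    "自": 10, "動": 8, "中": 30, "文": 25, "書": 20, "索": 2, "引": 3,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = 4  # assumed upper bound on word length, in characters
# Assumed penalty for a single character that is not in the lexicon.
UNKNOWN_LOGP = math.log(0.5 / TOTAL)


def word_logp(word):
    """Return the log probability of a candidate word, or None if disallowed."""
    count = WORD_COUNTS.get(word)
    if count is not None:
        return math.log(count / TOTAL)
    if len(word) == 1:
        return UNKNOWN_LOGP  # unseen single characters are always allowed
    return None  # unseen multi-character strings are not candidates


def segment(text):
    """Split text into the word sequence with the highest unigram probability."""
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best log prob of text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            lp = word_logp(text[start:end])
            if lp is None:
                continue
            score = best[start] + lp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    words.reverse()
    return words


print(segment("自動中文書索引"))  # ['自動', '中文', '書', '索引']
```

With these toy counts, the ambiguous span 中文書 is split as 中文 / 書 rather than 中 / 文書, because the first reading has the higher probability. In a pipeline like the one the abstract describes, the resulting word sequence would then be passed to part-of-speech tagging and noun phrase analysis.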