A Novel Chinese Text Summarization Approach Using Sentence Extraction Based on Kernel Words Recognition

  • Authors:
  • Weijie Yang; Ruwei Dai; Xia Cui

  • Venue:
  • FSKD '08 Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 04
  • Year:
  • 2008

Abstract

The continuing growth of the World Wide Web and of online text collections makes a large volume of information available to users, and automatic text summarization helps them understand documents quickly. This paper proposes an automated technique for Chinese document summarization based on kernel word recognition and discourse segment extraction. The method consists of five steps. First, the input articles are annotated by lexical analysis. Second, all focused named entities are recognized using a machine learning method. Third, the input articles are divided into discourse segments; the kernel words of each segment are extracted through rule-based main-verb recognition, and the relations among entities are extracted. Fourth, the candidate important sentences are ranked by a set of rules, and redundant sentences are removed using kernel-word information. Finally, the most important sentences are extracted to compose the summary according to the expected compression ratio, and these sentences are output using a special document as a reference. A series of experiments on two Chinese document collections shows that the proposed technique outperforms the reference systems.
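The abstract does not specify the ranking rules or the exact redundancy criterion, so the following is only a minimal Python sketch of the last two steps (ranking with redundancy removal, then extraction by compression ratio) under stated assumptions. Each sentence is assumed to already carry an importance score and a set of kernel words from the earlier steps; redundancy is judged here by Jaccard overlap of kernel-word sets against a hypothetical `redundancy_threshold`; the compression ratio is taken as the fraction of sentences to keep; and the selected sentences are emitted in original document order rather than via the reference document the authors describe. The names `Sentence`, `jaccard`, and `extract_summary` are illustrative, not from the paper.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Sentence:
    index: int                    # position in the original article
    text: str
    score: float                  # importance score (assumed precomputed by the ranking rules)
    kernel_words: FrozenSet[str]  # kernel words of the sentence (assumed precomputed)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap of two kernel-word sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def extract_summary(sentences: List[Sentence], compression_ratio: float,
                    redundancy_threshold: float = 0.5) -> List[str]:
    """Greedy extraction: take the highest-scoring sentences, skipping any whose
    kernel words overlap too much with an already selected sentence (assumed criterion)."""
    # Assumption: the compression ratio is the fraction of sentences to keep.
    budget = max(1, math.ceil(len(sentences) * compression_ratio))
    selected: List[Sentence] = []
    # Highest score first; ties broken by earlier position in the article.
    for s in sorted(sentences, key=lambda s: (-s.score, s.index)):
        if len(selected) >= budget:
            break
        if any(jaccard(s.kernel_words, t.kernel_words) >= redundancy_threshold
               for t in selected):
            continue  # redundant with a sentence already in the summary
        selected.append(s)
    # Assumption: output in original document order.
    return [s.text for s in sorted(selected, key=lambda s: s.index)]


# Toy input with placeholder sentences and kernel words.
sents = [
    Sentence(0, "S0", 0.9, frozenset({"发布", "系统"})),
    Sentence(1, "S1", 0.8, frozenset({"发布", "系统"})),  # same kernel words as S0
    Sentence(2, "S2", 0.7, frozenset({"测试"})),
    Sentence(3, "S3", 0.1, frozenset({"结果"})),
]
print(extract_summary(sents, compression_ratio=0.5))  # → ['S0', 'S2']
```

With a ratio of 0.5 the budget is two sentences; S1 is skipped because its kernel words duplicate those of S0, so the lower-scoring but non-redundant S2 takes its place. A greedy pass like this is one simple way to combine ranking and redundancy removal; the paper's own rules may differ.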