A Study of Chinese Text Summarization Using Adaptive Clustering of Paragraphs

Authors:
Po Hu;Tingting He;Donghong Ji;Meng Wang
Affiliations:
Central China Normal University;Central China Normal University;Institute for Infocomm Research;Central China Normal University
Venue:
CIT '04 Proceedings of the Fourth International Conference on Computer and Information Technology
Year:
2004

Abstract

Automatic summarization is an important research problem in natural language processing. This paper presents a summarization method that generates a single-document summary with maximum topic completeness and minimum redundancy. The method first builds semantic-class-based vector representations of the various linguistic units in a document by means of HowNet (an existing ontology), which improves, to some degree, on the representation quality of the traditional term-based vector space model. Then, by combining the K-means clustering algorithm with a novel clustering analysis algorithm, it adaptively determines the number of latent topic regions in the document. Finally, topic-representative sentences are selected from each topic region to form the final summary. To evaluate the effectiveness of the proposed method, a novel metric, representation entropy, is used to measure summary redundancy. Preliminary experimental results show that, under this evaluation scheme, the proposed method outperforms the conventional basic summarization method on Chinese documents of diverse genres with free writing style and flexible topic distribution.
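The pipeline described in the abstract (semantic-class vectors, clustering with an adaptively chosen number of topics, one representative unit per topic) can be illustrated with a minimal sketch. This is not the authors' implementation. The `TOY_CLASSES` dictionary is a hypothetical stand-in for a HowNet lookup, and `choose_k` uses a simple within-cluster-error threshold as an assumed placeholder for the paper's clustering analysis algorithm, which the abstract does not specify. All function names are illustrative.

```python
# Hypothetical stand-in for a HowNet lookup: term -> semantic class.
TOY_CLASSES = {
    "stock": "finance", "bank": "finance", "loan": "finance",
    "rain": "weather", "storm": "weather", "wind": "weather",
}
CLASS_INDEX = sorted(set(TOY_CLASSES.values()))


def to_vector(tokens):
    """Count semantic classes rather than raw terms."""
    vec = [0.0] * len(CLASS_INDEX)
    for tok in tokens:
        cls = TOY_CLASSES.get(tok)
        if cls is not None:
            vec[CLASS_INDEX.index(cls)] += 1.0
    return vec


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors]


def kmeans(vectors, k, iters=20):
    """Plain K-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = assign(vectors, centroids)
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(vectors, centroids), centroids


def cost(vectors, labels, centroids):
    """Within-cluster sum of squared distances."""
    return sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))


def choose_k(vectors, max_k, ratio=0.1):
    """Assumed criterion: smallest k whose error is <= ratio * error at k=1."""
    labels, centroids = kmeans(vectors, 1)
    base = cost(vectors, labels, centroids)
    for k in range(1, max_k + 1):
        labels, centroids = kmeans(vectors, k)
        if cost(vectors, labels, centroids) <= ratio * base:
            break
    return k, labels, centroids


def summarize(units, max_k=3):
    """Return (k, labels, representative unit index for each cluster)."""
    vectors = [to_vector(u) for u in units]
    k, labels, centroids = choose_k(vectors, min(max_k, len(vectors)))
    reps = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            reps.append(min(members, key=lambda i: sq_dist(vectors[i], centroids[j])))
    return k, labels, reps


# Usage: four tokenized paragraphs covering two topics.
paragraphs = [
    ["stock", "bank", "loan"],
    ["bank", "loan", "stock", "stock"],
    ["rain", "storm"],
    ["wind", "rain", "storm"],
]
k, labels, reps = summarize(paragraphs)
print(k, labels, reps)
```

Mapping terms to semantic classes shrinks the vector space and lets different words with the same class contribute to the same dimension, which is the motivation the abstract gives for using an ontology. The threshold in `choose_k` is only one possible way to select the number of clusters; any other model-selection criterion could be substituted.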