A K-mixture connective-strength-based approach to automatic text summarisation

Authors:
Te-Min Chang; Wen-Feng Hsiao
Affiliations:
Department of Information Management, National Sun Yat-sen University, 70, Lien-hai Road, Kaohsiung 804, Taiwan; Department of Information Management, National Pingtung Institute of Commerce, 51 Min-Sheng E. Road, Pingtung 900, Taiwan
Venue:
International Journal of Intelligent Systems Technologies and Applications
Year:
2011

Abstract

This research develops a hybrid automatic text summarisation approach, KCS, to improve summary quality. KCS employs the K-mixture probabilistic model to establish term weight distributions statistically. It then identifies lexical relations between pairs of nouns, and between nouns and verbs, to derive the connective strength (CS) of nouns. Sentences are ranked and extracted according to the accumulated CS values of the nouns they contain. We conduct two experiments to evaluate the proposed approach. The results show that the K-mixture model alone is more conducive to document classification than the traditional TF-IDF weighting scheme: the best macro F-measure increases from 0.63 to 0.67. It is, however, still no better than the more complex linguistic-based approach that takes nouns' CS into consideration. Most importantly, the proposed approach, KCS, performs best among all approaches considered, with a best macro F-measure of 0.8. This suggests that KCS extracts more representative sentences from a document, which supports its feasibility in text summarisation applications.
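
As a rough illustration of the two ingredients named in the abstract, the following Python sketch estimates the parameters of Katz's K-mixture model from corpus counts and ranks sentences by accumulated per-noun scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the formulas KCS uses to turn K-mixture probabilities into term weights or to compute CS, so the function names and the `noun_scores` dictionary (a stand-in for CS values) are hypothetical. The parameter estimates follow the standard textbook form of the K-mixture, with collection frequency `cf`, document frequency `df`, and corpus size `n_docs`.

```python
def k_mixture_params(cf, df, n_docs):
    """Estimate K-mixture parameters (alpha, beta) for one term.

    cf: total occurrences of the term in the corpus
    df: number of documents containing the term
    n_docs: number of documents in the corpus
    """
    if df <= 0 or cf <= df:
        # beta would be zero or undefined; the model needs some repeated use.
        raise ValueError("K-mixture estimate requires cf > df > 0")
    lam = cf / n_docs          # average occurrences per document
    beta = (cf - df) / df      # extra occurrences per document containing the term
    alpha = lam / beta
    return alpha, beta


def k_mixture_prob(k, alpha, beta):
    """Probability that the term occurs exactly k times in a document."""
    spike = (1.0 - alpha) if k == 0 else 0.0
    return spike + (alpha / (beta + 1.0)) * (beta / (beta + 1.0)) ** k


def rank_sentences(sentences, noun_scores, top_n):
    """Return indices of the top_n sentences by accumulated score.

    sentences: list of token lists
    noun_scores: hypothetical mapping from noun to score (stand-in for CS)
    The selected indices are returned in document order.
    """
    totals = [sum(noun_scores.get(tok, 0.0) for tok in sent) for sent in sentences]
    best = sorted(range(len(sentences)), key=lambda i: totals[i], reverse=True)[:top_n]
    return sorted(best)


if __name__ == "__main__":
    alpha, beta = k_mixture_params(cf=30, df=10, n_docs=100)
    print([k_mixture_prob(k, alpha, beta) for k in range(4)])

    docs = [["cat", "sat"], ["dog", "cat", "bone"], ["the", "end"]]
    scores = {"cat": 1.0, "dog": 2.0, "bone": 0.5}  # illustrative values only
    print(rank_sentences(docs, scores, top_n=2))
```

With these estimates the probability of zero occurrences reduces analytically to 1 − df/N, which gives a quick sanity check on the parameters. Returning the selected sentences in document order is a design choice of this sketch, made so that an extract reads in the same sequence as the source text.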