Text summarization using a trainable summarizer and latent semantic analysis

Authors:
Jen-Yuan Yeh;Hao-Ren Ke;Wei-Pang Yang;I-Heng Meng
Affiliations:
Department of Computer & Information Science, National Chiao Tung University, 1001 Ta Hsueh Rd., Hsinchu 30050, Taiwan, ROC;Digital Library & Information Section of Library, National Chiao Tung University, 1001 Ta Hsueh Rd., Hsinchu 30050, Taiwan, ROC;Department of Computer & Information Science, National Chiao Tung University, 1001 Ta Hsueh Rd., Hsinchu 30050, Taiwan, ROC;Department of Computer & Information Science, National Chiao Tung University, 1001 Ta Hsueh Rd., Hsinchu 30050, Taiwan, ROC
Venue:
Information Processing and Management: an International Journal - Special issue: An Asian digital libraries perspective
Year:
2005

Abstract

This paper proposes two approaches to text summarization: a modified corpus-based approach (MCBA) and an LSA-based text relationship map approach (LSA + T.R.M.). The first is a trainable summarizer that generates summaries from several sentence features: position, positive keywords, negative keywords, centrality, and resemblance to the title. It exploits two new ideas: (1) sentence positions are ranked to reflect the relative importance of different positions, and (2) the score function is trained by a genetic algorithm (GA) to obtain a suitable combination of feature weights. The second approach uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus, and uses the resulting semantic sentence representations to construct a semantic text relationship map. We evaluate LSA + T.R.M. both on single documents and at the corpus level to investigate the competence of LSA in text summarization. Both approaches were evaluated at several compression rates on a corpus of 100 political articles. At a compression rate of 30%, the average f-measure was 49% for MCBA, 52% for MCBA + GA, and 44% and 40% for LSA + T.R.M. at the single-document and corpus levels, respectively.
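
As an illustration of the feature-weighted scoring the abstract describes, the following is a minimal sketch and not the authors' implementation. It assumes the score is a linear combination of the five features with the negative-keyword term subtracted, that feature values are already computed and normalized to [0, 1], and that the weights are hand-set placeholders standing in for the values a GA would learn. The names `score`, `extract`, `f_measure`, `WEIGHTS`, and `SENTENCES` are hypothetical.

```python
# Feature names come from the abstract; everything else here is assumed.
FEATURES = ("position", "positive_keyword", "negative_keyword",
            "centrality", "title_resemblance")


def score(features, weights):
    """Weighted sum of feature values; negative keywords count against a sentence."""
    total = 0.0
    for name in FEATURES:
        sign = -1.0 if name == "negative_keyword" else 1.0
        total += sign * weights[name] * features[name]
    return total


def extract(sentences, weights, compression_rate):
    """Return the indices of the top-scoring sentences, in document order."""
    n_keep = max(1, round(len(sentences) * compression_rate))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], weights),
                    reverse=True)
    return sorted(ranked[:n_keep])


def f_measure(selected, reference):
    """Balanced F-measure between selected and reference sentence indices."""
    overlap = len(set(selected) & set(reference))
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Placeholder weights (all equal); a GA would search for better values.
WEIGHTS = {name: 1.0 for name in FEATURES}

# Made-up feature values for a four-sentence document.
SENTENCES = [
    {"position": 1.0, "positive_keyword": 0.5, "negative_keyword": 0.0,
     "centrality": 0.6, "title_resemblance": 0.8},
    {"position": 0.2, "positive_keyword": 0.1, "negative_keyword": 0.7,
     "centrality": 0.3, "title_resemblance": 0.1},
    {"position": 0.4, "positive_keyword": 0.6, "negative_keyword": 0.1,
     "centrality": 0.7, "title_resemblance": 0.5},
    {"position": 0.6, "positive_keyword": 0.0, "negative_keyword": 0.0,
     "centrality": 0.2, "title_resemblance": 0.2},
]

summary = extract(SENTENCES, WEIGHTS, compression_rate=0.5)
print(summary)  # [0, 2]
```

In a trainable setting, a GA could use the average `f_measure` over a training corpus as its fitness function when searching for the weights. The sketch leaves that search out.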