Clustering and visualization in a multi-lingual multi-document summarization system

Authors:
Hsin-Hsi Chen;June-Jei Kuo;Tsei-Chun Su
Affiliations:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan;Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan;Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Venue:
ECIR'03 Proceedings of the 25th European conference on IR research
Year:
2003

Abstract

Measuring the similarity of words, sentences, and documents is one of the major issues in multi-lingual multi-document summarization. This paper presents five strategies for computing multilingual sentence similarity. The experimental results show that sentence alignment that ignores word position and order within a sentence performs best. In addition, two strategies are proposed for multilingual document clustering. The two-phase strategy (translation after clustering) outperforms the one-phase strategy (translation before clustering). Deferring translation until sentence clustering, which reduces the propagation of translation errors, is the most promising approach. Moreover, three strategies are proposed for sentence clustering. Complete link within a cluster performs best; however, subsumption-based clustering offers lower computational complexity with similar performance. Finally, two visualization models (focusing and browsing), which take the users' language preferences into account, are proposed.
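
The abstract does not spell out the similarity formulas or the clustering procedure, so the following is only a minimal sketch of two of the ideas it names: a sentence similarity that ignores word position and order, and complete-link clustering, in which a sentence joins a cluster only if it is similar enough to every member. The Dice coefficient over word sets, the greedy single pass, the `threshold` value, and the function names are all assumptions made for illustration, not the authors' method; translation between languages is omitted entirely.

```python
def similarity(a, b):
    """Order-free similarity (assumed measure): Dice coefficient over word sets."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return 2 * len(words_a & words_b) / (len(words_a) + len(words_b))


def complete_link_clusters(sentences, threshold=0.5):
    """Greedy complete-link clustering (illustrative, not the paper's algorithm).

    A sentence is added to the first cluster in which it reaches the
    threshold against every existing member; otherwise it starts a new cluster.
    """
    clusters = []
    for sentence in sentences:
        for cluster in clusters:
            if all(similarity(sentence, member) >= threshold for member in cluster):
                cluster.append(sentence)
                break
        else:
            # No cluster accepted the sentence under the complete-link test.
            clusters.append([sentence])
    return clusters


if __name__ == "__main__":
    sample = [
        "the quake hit taipei",
        "taipei hit the quake",
        "stocks fell sharply today",
    ]
    for cluster in complete_link_clusters(sample):
        print(cluster)
```

Because the similarity is computed over word sets, the first two sample sentences are treated as identical despite their different word order, while the third shares no words with them and ends up in its own cluster.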