A multi-layered summarization system for multi-media archives by understanding and structuring of Chinese spoken documents

  • Authors:
  • Lin-shan Lee; Sheng-yi Kong; Yi-cheng Pan; Yi-sheng Fu; Yu-tsun Huang; Chien-Chih Wang

  • Affiliations:
  • Speech Lab, College of EECS, National Taiwan University, Taipei (all authors)

  • Venue:
  • ISCSLP'06: Proceedings of the 5th International Conference on Chinese Spoken Language Processing
  • Year:
  • 2006


Abstract

Multi-media archives are difficult to display on a screen, and difficult to retrieve and browse. It is therefore important to develop technologies that summarize entire archives of network content to help users browse and retrieve them. In a recent paper [1] we proposed a complete set of multi-layered technologies to handle at least some of these issues: (1) automatic generation of titles and summaries for each spoken document, so that the spoken documents become much easier to browse; (2) global semantic structuring of the entire spoken document archive, offering the user a global picture of the archive's semantic structure; and (3) query-based local semantic structuring of the subset of spoken documents retrieved by the user's query, providing the detailed semantic structure of the documents relevant to that query. Probabilistic Latent Semantic Analysis (PLSA) is found to be helpful for these tasks. This paper presents an initial prototype system with the above functions for Chinese archives, taking a broadcast news archive in Mandarin Chinese as the example archive.
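
The abstract credits PLSA without spelling out the model. As a brief sketch, using the standard PLSA formulation rather than anything reproduced from the paper itself, each term t_i and spoken document d_j are tied together through a small set of K latent topics T_k:

```latex
% Standard PLSA decomposition (not the paper's own notation):
% a document generates topics, and topics generate terms.
P(t_i, d_j) = P(d_j) \sum_{k=1}^{K} P(t_i \mid T_k)\, P(T_k \mid d_j)
```

The topic mixtures P(T_k | d_j), typically estimated with the EM algorithm, give each document a compact semantic representation; the global and query-based structuring described above presumably builds on such representations, though this record does not detail how.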