Summarizing blog entries versus news texts

Authors:
Shamima Mithun; Leila Kosseim
Affiliations:
Concordia University, Montreal, Quebec, Canada; Concordia University, Montreal, Quebec, Canada
Venue:
eETTs '09 Proceedings of the Workshop on Events in Emerging Text Types
Year:
2009

Abstract

As more people express their opinions on the web in the form of weblogs (or blogs), research on the blogosphere is gaining popularity. As an outcome of this research, natural language tools such as query-based opinion summarizers have been developed to mine and organize opinions on a particular event or entity in blog entries. However, the variety of blog posts and the informal style and structure of blog entries pose many difficulties for these tools. In this paper, we identify and categorize the errors that typically occur in opinion summarization from blog entries. We then compare blog entry summaries with traditional news text summaries on these error types to quantify the differences between the two genres for the purpose of summarization. For evaluation, we used summaries from systems participating in the TAC 2008 opinion summarization track and update summarization track. Our results show that some errors (e.g. topic-irrelevant information) are much more frequent in blog entries than in news texts, while other error types, such as content overlap, seem to be comparable. These findings can be used to prioritize these error types and give clear indications as to where effort should go to improve blog summarization.