As more and more people express their opinions on the web in the form of weblogs (or blogs), research on the blogosphere is gaining popularity. As an outcome of this research, natural language tools such as query-based opinion summarizers have been developed to mine and organize opinions on a particular event or entity in blog entries. However, the variety of blog posts and the informal style and structure of blog entries pose many difficulties for these tools. In this paper, we identify and categorize the errors that typically occur in opinion summarization from blog entries. We then compare blog summaries with traditional news text summaries on these error types to quantify the differences between the two genres for the purpose of summarization. For evaluation, we used summaries from systems participating in the TAC 2008 opinion summarization track and update summarization track. Our results show that some errors (e.g., topic-irrelevant information) are much more frequent in blog entries than in news texts, while other error types, such as content overlap, appear to be comparable. These findings can be used to prioritize the error types and give clear indications of where effort should be directed to improve blog summarization.