Enhancing clustering blog documents by utilizing author/reader comments

Authors:
Beibei Li; Shuting Xu; Jun Zhang
Affiliations:
University of Kentucky, Lexington, KY; Virginia State University, Petersburg, VA; University of Kentucky, Lexington, KY
Venue:
ACM-SE 45 Proceedings of the 45th annual southeast regional conference
Year:
2007


Abstract

Blogs are a new kind of internet phenomenon and a vast, ever-increasing information resource. Mining blog files for information is a very new research direction in data mining. Blog files differ from standard web files and may need specialized mining strategies. We propose including the title, body, and comments of blog pages in the datasets used for clustering blog documents. In particular, we argue that the author/reader comments on blog pages may have a greater discriminating effect in clustering blog documents. We constructed a word-page matrix from blog pages downloaded from a well-known website and experimented with a k-means clustering algorithm, assigning different weights to the title, body, and comment parts. Our experimental results show that assigning a larger weight to the blog comments helps the k-means algorithm produce better clustering solutions. These results confirm our hypothesis that the author/reader comments of blog files are very useful in discriminating blog files.
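
The idea of a field-weighted word-page matrix fed to k-means can be illustrated with a small sketch. This is not the authors' implementation: the toy pages, the weight values, the whitespace tokenizer, the unit-length normalization, and the helper names (`weighted_vector`, `build_matrix`, `kmeans`) are all assumptions made for illustration, and the k-means loop uses a simple deterministic initialization rather than whatever the paper used.

```python
from collections import Counter
import math


def weighted_vector(page, weights, vocab):
    """Sum per-field term counts, scaled by each field's weight (assumed scheme)."""
    counts = Counter()
    for field, weight in weights.items():
        for word in page.get(field, "").lower().split():
            counts[word] += weight
    vec = [counts[w] for w in vocab]
    # Unit-length normalization is an assumption, not taken from the abstract.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_matrix(pages, weights):
    """Build the vocabulary and one weighted row vector per page."""
    vocab = sorted({w for p in pages for f in weights
                    for w in p.get(f, "").lower().split()})
    return vocab, [weighted_vector(p, weights, vocab) for p in pages]


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy pages: titles and bodies are uninformative,
# while the comments reveal the topic (cooking vs. football).
pages = [
    {"title": "my day", "body": "today was fun",
     "comments": "great recipe love the pasta"},
    {"title": "my day", "body": "today was fun",
     "comments": "nice goal great match football"},
    {"title": "weekend notes", "body": "some thoughts",
     "comments": "pasta sauce recipe tips"},
    {"title": "weekend notes", "body": "some thoughts",
     "comments": "football match score"},
]

# Illustrative weights only; comments are weighted more heavily than title/body.
weights = {"title": 1.0, "body": 1.0, "comments": 3.0}
vocab, matrix = build_matrix(pages, weights)
labels = kmeans(matrix, k=2)
print(labels)
```

In this toy setup the pages that share title and body text are separated only by their comments, so raising the comment weight is what lets k-means group them by topic. Real blog data would also need stop-word removal and a more careful initialization, neither of which is shown here.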