Blogs are a new Internet phenomenon and a vast, ever-increasing information resource. Mining blog files for information is a very new research direction in data mining. Blog files differ from standard web files and may need specialized mining strategies. We propose including the title, body, and comments of blog pages when clustering datasets of blog documents. In particular, we argue that the author/reader comments on blog pages may have a stronger discriminating effect than the other parts when clustering blog documents. We constructed a word-page matrix by downloading blog pages from a well-known website and experimented with a k-means clustering algorithm, assigning different weights to the title, body, and comment parts. Our experimental results show that assigning a larger weight to the blog comments helps the k-means algorithm produce better clustering solutions. The experimental results confirm our hypothesis that the author/reader comments of blog files are very useful in discriminating blog files.
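The weighting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy pages, the weight values, the whitespace tokenizer, the length normalization, and the helper names `build_matrix` and `kmeans` are all assumptions made for illustration. Each page becomes one row of term counts in which every word is counted with the weight of the part (title, body, or comment) it appears in, and a plain k-means is then run on the rows.

```python
from collections import Counter
import math

def build_matrix(pages, weights):
    """Return (vocab, rows); each row is a weighted, length-normalized term vector."""
    vocab = sorted({w for p in pages for f in weights
                    for w in p.get(f, "").lower().split()})
    rows = []
    for page in pages:
        counts = Counter()
        for field, weight in weights.items():
            for word in page.get(field, "").lower().split():
                counts[word] += weight          # weight depends on the page part
        vec = [counts[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows

def kmeans(vectors, k, iters=20):
    """Plain k-means; seeding with the first k vectors is an assumption."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(v, centroids[c])))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy pages: titles and bodies are uninformative,
# comments carry the topic (cooking vs. politics).
pages = [
    {"title": "my day", "body": "today was fine", "comment": "great recipe love cooking"},
    {"title": "my day", "body": "today was fine", "comment": "election vote policy"},
    {"title": "notes", "body": "some thoughts", "comment": "cooking recipe tasty"},
    {"title": "notes", "body": "some thoughts", "comment": "policy vote debate"},
]

# Assumed weights: comments count three times as much as title or body.
weights = {"title": 1.0, "body": 1.0, "comment": 3.0}
vocab, rows = build_matrix(pages, weights)
labels = kmeans(rows, k=2)
print(labels)
```

In this toy setup the larger comment weight lets the comment vocabulary dominate the distance computation, so pages are grouped by what readers discuss rather than by their near-identical titles and bodies. Real experiments would need a proper tokenizer, stop-word handling, and a more robust centroid initialization.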