Chinese web comments clustering analysis with a two-phase method

Authors:
YeXin Wang;Li Zhao;Yan Zhang
Affiliations:
Department of Machine Intelligence, Peking University, Beijing, China;Department of Machine Intelligence, Peking University, Beijing, China;Department of Machine Intelligence, Peking University, Beijing, China
Venue:
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Year:
2009

Citing 12
Cited 0

CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Mining the peanut gallery: opinion extraction and semantic classification of product reviews
WWW '03 Proceedings of the 12th international conference on World Wide Web

Mining and summarizing customer reviews
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

An Improved Cluster Labeling Method for Support Vector Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence

Data Mining with Computational Intelligence (Advanced Information and Knowledge Processing)

Opinion observer: analyzing and comparing opinions on the Web
WWW '05 Proceedings of the 14th international conference on World Wide Web

Efficiently Mining Gene Expression Data via a Novel Parameterless Clustering Method
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Dynamic Characterization of Cluster Structures for Robust and Inductive Support Vector Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence

Movie review mining and summarization
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management

Topic sentiment mixture: modeling facets and opinions in weblogs
Proceedings of the 16th international conference on World Wide Web

Opinion integration through semi-supervised topic modeling
Proceedings of the 17th international conference on World Wide Web

Mining opinion features in customer reviews
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence

Abstract

A popular web topic often attracts tens of thousands of comments. Grouping these comments into clusters to identify the mainstream opinions is valuable, but such analysis faces two difficulties. First, web comments have no explicit link relationships of the kind found among web pages or blog comments. Second, most comments are very short, some only one or two words long. As a result, traditional clustering algorithms such as CURE and DBSCAN do not work when applied to these comments directly. In this paper we propose a two-phase algorithm that first merges highly synonymous comments into longer ones based on a connected graph model, and then applies improved clustering methods to the resulting collection. Experimental results on two real data sets show that our algorithm performs better than traditional algorithms such as CURE.
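
The first phase can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the threshold, or how the graph is built, so everything here is an assumption made for illustration: comments are compared by Jaccard similarity over character bigrams (a segmenter-free stand-in for Chinese word features), an edge links any pair whose similarity reaches a hypothetical `threshold`, and each connected component is concatenated into one longer document. The function names are hypothetical and this is not the authors' implementation.

```python
from itertools import combinations


def char_bigrams(text):
    """Return the set of character bigrams of a comment (assumed feature)."""
    text = text.strip()
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_similar_comments(comments, threshold=0.5):
    """Phase-one sketch: link comments whose similarity reaches the
    threshold, then join each connected component into one document."""
    parent = list(range(len(comments)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    features = [char_bigrams(c) for c in comments]
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(features[i], features[j]) >= threshold:
            parent[find(i)] = find(j)  # add an edge: merge the two components

    # Collect comments by component root, preserving input order.
    groups = {}
    for i, comment in enumerate(comments):
        groups.setdefault(find(i), []).append(comment)
    return [" ".join(group) for group in groups.values()]
```

Calling `merge_similar_comments(["很好看", "很好看啊", "太贵了", "太贵了吧", "支持"])` groups the near-duplicate pairs and leaves the unrelated comment on its own. The merged documents, being longer than the original comments, could then be passed to a clustering algorithm in the second phase. The pairwise comparison is quadratic in the number of comments, so a collection of tens of thousands would likely need an index or blocking step that this sketch omits.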