Instant communication technologies such as Instant Messaging (IM) are now widely used. For mass-communication media at this scale, clustering the text content is a practical way to analyze the characteristics of instant messages and to find or track hot social topics. However, a single instant message usually contains few keywords, and those may be only latent; moreover, a single message cannot describe the conversational context. This differs markedly from general documents and makes common clustering algorithms unsuitable. A novel method called WR-KMeans is proposed, which combines related instant messages into a conversation and enriches the conversation's vector with words that do not appear in the conversation but are closely related to words that do. WR-KMeans then clusters, in the manner of k-means, on this extended vector space of conversations. Experiments on public datasets show that WR-KMeans outperforms the traditional k-means and bisecting k-means algorithms.
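The enrichment idea can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how word relatedness is measured or how related words are weighted, so the sketch assumes a cosine-style co-occurrence score over conversations and a hypothetical damping factor `alpha`. It also assumes messages have already been grouped into conversations, and it uses a plain Euclidean k-means with a deterministic farthest-point initialisation. All function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def relatedness(conversations):
    """Assumed relatedness: co-occurrence count normalised by document frequency."""
    df = Counter()  # number of conversations containing each word
    co = Counter()  # number of conversations containing both words of a pair
    for conv in conversations:
        words = sorted(set(conv))
        df.update(words)
        co.update(combinations(words, 2))
    return {p: n / math.sqrt(df[p[0]] * df[p[1]]) for p, n in co.items()}


def rel_score(rel, a, b):
    """Look up the symmetric relatedness of two words (0.0 if never co-occurring)."""
    return rel.get((a, b) if a < b else (b, a), 0.0)


def enrich(conv, vocab, rel, alpha=0.5):
    """Term-frequency vector, plus damped weights for absent but related words."""
    tf = Counter(conv)
    present = set(conv)
    vec = []
    for w in vocab:
        if w in present:
            vec.append(float(tf[w]))
        else:
            best = max((rel_score(rel, w, u) for u in present), default=0.0)
            vec.append(alpha * best)  # alpha is a hypothetical damping factor
    return vec


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy conversations (made-up tokens, already grouped from individual messages).
conversations = [
    ["match", "goal", "team"],
    ["goal", "team", "score"],
    ["match", "score"],
    ["movie", "actor", "film"],
    ["film", "actor", "scene"],
    ["movie", "scene"],
]
vocab = sorted({w for conv in conversations for w in conv})
rel = relatedness(conversations)
vectors = [enrich(conv, vocab, rel) for conv in conversations]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy data the short conversation `["match", "score"]` never mentions "goal" or "team", yet its enriched vector gives both a nonzero weight. This pulls it toward the other sports conversations before clustering, which is the effect the abstract attributes to the extended vector space.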