Instant message clustering based on extended vector space model

Authors:
Le Wang; Yan Jia; Weihong Han
Affiliations:
Computer School, National University of Defense Technology, Changsha, China (all three authors)
Venue:
ISICA'07: Proceedings of the 2nd International Conference on Advances in Computation and Intelligence
Year:
2007

Abstract

Instant communication tools such as Instant Messaging (IM) are now in widespread use. For such large-scale mass-communication media, clustering their text content is a practical way to analyze the characteristics of instant-message text and to discover or track hot social topics. However, a single instant message usually contains few keywords, and these may even be latent; moreover, a single message cannot capture the context of the conversation it belongs to. These properties set instant messages apart from ordinary documents and make common clustering algorithms unsuitable for them. A novel method, WR-KMeans, is proposed: it combines related instant messages into a conversation and enriches each conversation's vector with words that do not appear in the conversation but are closely related to words that do. WR-KMeans then performs k-means-style clustering on this extended vector space of conversations. Experiments on public datasets show that WR-KMeans outperforms the traditional k-means and bisecting k-means algorithms.
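The abstract describes three steps: group messages into conversations, extend each conversation's vector with related but absent words, and cluster the extended vectors with k-means. The Python sketch below illustrates that pipeline under assumptions of our own, since the abstract does not say how WR-KMeans groups messages, measures word relatedness, or weights the added words. All function names are hypothetical; the grouping rule (same pair of users, gap of at most `max_gap` seconds), the Jaccard co-occurrence relatedness, the `alpha` and `threshold` parameters, and the farthest-first seeding are illustrative choices, not the paper's.

```python
import math
from collections import Counter
from itertools import combinations


def group_into_conversations(messages, max_gap=300):
    """Hypothetical grouping rule (assumption): messages exchanged by the same
    pair of users, each within max_gap seconds of the previous one, form one
    conversation. messages: iterable of (timestamp, sender, receiver, text)."""
    last_seen = {}  # participant pair -> (timestamp of last message, conversation index)
    conversations = []
    for ts, sender, receiver, text in sorted(messages):
        pair = frozenset((sender, receiver))
        prev = last_seen.get(pair)
        if prev is not None and ts - prev[0] <= max_gap:
            idx = prev[1]
        else:
            idx = len(conversations)
            conversations.append([])
        conversations[idx].extend(text.lower().split())
        last_seen[pair] = (ts, idx)
    return conversations


def word_relatedness(conversations):
    """Assumed relatedness measure (not the paper's): Jaccard overlap of the
    sets of conversations in which two words occur. Stored symmetrically."""
    occurs = {}
    for i, conv in enumerate(conversations):
        for w in set(conv):
            occurs.setdefault(w, set()).add(i)
    rel = {}
    for a, b in combinations(sorted(occurs), 2):
        shared = len(occurs[a] & occurs[b])
        if shared:
            rel[(a, b)] = rel[(b, a)] = shared / len(occurs[a] | occurs[b])
    return rel


def extended_vector(conv, vocab, rel, alpha=0.5, threshold=0.3):
    """Term-frequency vector over vocab; each absent word gets alpha times its
    strongest relatedness to a present word, if that reaches threshold.
    The result is L2-normalized."""
    tf = Counter(conv)
    vec = []
    for w in vocab:
        if w in tf:
            vec.append(float(tf[w]))
        else:
            best = max((rel.get((u, w), 0.0) for u in tf), default=0.0)
            vec.append(alpha * best if best >= threshold else 0.0)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy demo: two topics with no shared vocabulary.
conversations = [
    ["goal", "match", "team"], ["match", "team", "score"], ["goal", "score"],
    ["stock", "market", "price"], ["market", "price", "trade"], ["stock", "trade"],
]
vocab = sorted({w for conv in conversations for w in conv})
rel = word_relatedness(conversations)
vectors = [extended_vector(conv, vocab, rel) for conv in conversations]
print(kmeans(vectors, k=2))  # [0, 0, 0, 1, 1, 1]
```

In this sketch a short conversation such as `["goal", "score"]` receives nonzero weight on related words like `"match"` and `"team"`, which moves it closer to other conversations on the same topic even though it shares few literal words with them. This mirrors, in simplified form, the motivation the abstract gives for extending the vector space.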