Comparative study of text clustering techniques in virtual worlds

  • Authors:
  • Gema Bello-Orgaz; David Camacho

  • Affiliations:
  • Universidad Autónoma de Madrid, Madrid, Spain; Universidad Autónoma de Madrid, Madrid, Spain

  • Venue:
  • Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics
  • Year:
  • 2013

Abstract

The Virt-UAM (Virtual Worlds at Universidad Autónoma de Madrid) platform supports the design and implementation of virtual spaces in which a set of avatars can be closely monitored using tools managed by an administrator. In a virtual world, users can move and interact with one another with a high degree of freedom. The movements, interactions, and any other information related to the avatars' conversations can be stored, so this data is available for processing and analysis to obtain user behavioural patterns. Document clustering techniques have been widely applied to automatically organize a document corpus into clusters of similar documents. The topic detection problem can be considered a special case of document clustering; therefore, these techniques can be applied to textual chat to detect clusters in the data and then extract the conversation topics. The Mahout(TM) machine learning library is an Apache(TM) project whose main goal is to build scalable machine learning libraries. It provides a set of ready-to-use algorithms for data mining and information retrieval. This paper shows a practical application of some of the clustering algorithms available in Mahout in a virtual world-based scenario. These algorithms have been applied to extract topics from the clusters obtained from the text messages. Finally, a comparative study of the document clustering algorithms used is presented.
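
The idea of treating topic detection as document clustering can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the Mahout API: it is a standard-library Python toy that builds TF-IDF vectors from chat messages, groups them with a spherical k-means variant (cosine similarity, deterministic farthest-first seeding), and reads the highest-weighted centroid terms as the "topic" of each cluster. The messages, the stop-word list, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical stop-word list, chosen only for this toy example.
STOP_WORDS = {"the", "is", "at", "on", "me", "my", "where"}


def normalise(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf(docs):
    """Return one unit-length TF-IDF vector per document (smoothed IDF)."""
    tokenised = [[t for t in d.lower().split() if t not in STOP_WORDS]
                 for d in docs]
    n = len(tokenised)
    df = Counter(t for tokens in tokenised for t in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append(normalise(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, c in tf.items()}))
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, max_iter=20):
    """Spherical k-means with deterministic farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the document least similar to every centroid chosen so far.
        centroids.append(min(
            vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            total = {}
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            if total:  # keep the old centroid if a cluster is empty
                centroids[j] = normalise(total)
    return labels, centroids


def top_terms(centroid, n=2):
    """Highest-weighted terms of a centroid, ties broken alphabetically."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


# Invented chat messages standing in for avatar conversations.
messages = [
    "where is the library building",
    "the library building is near the plaza",
    "meet me at the plaza near the library",
    "my exam is tomorrow morning",
    "the exam tomorrow covers clustering",
    "good luck on the exam tomorrow",
]
labels, centroids = kmeans(tfidf(messages), k=2)
topics = [top_terms(c) for c in centroids]
print(labels, topics)
```

In this toy setting the first three messages fall into one cluster and the last three into the other, and the top centroid terms serve as a rough topic label for each group. A real chat corpus would need proper tokenisation, a fuller stop-word list, and a principled choice of the number of clusters.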