An online document clustering technique for short web contents

  • Authors:
  • Moreno Carullo;Elisabetta Binaghi;Ignazio Gallo

  • Affiliations:
  • Università degli Studi dell'Insubria, Dipartimento di Informatica e Comunicazione, 21100 Varese, Italy;Università degli Studi dell'Insubria, Dipartimento di Informatica e Comunicazione, 21100 Varese, Italy;Università degli Studi dell'Insubria, Dipartimento di Informatica e Comunicazione, 21100 Varese, Italy

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2009

Abstract

Document clustering techniques have been applied in several areas, with the web as one of the most recent and influential. Both general-purpose and text-oriented techniques exist and can be used to cluster a collection of documents in many ways. This work proposes a novel heuristic online document clustering model that can be specialized with a variety of text-oriented similarity measures. An experimental evaluation of the proposed model was conducted in the e-commerce domain. Performance was measured using a clustering-oriented metric based on the F-measure and compared with that of other well-known approaches. The results confirm the validity of the proposed method in both batch scenarios and online scenarios, where document collections can grow over time.
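
The abstract does not spell out the heuristic itself, so the following is only a generic illustration of what an online clusterer with a pluggable similarity measure might look like, not the authors' model. It is a single-pass scheme: each incoming short text joins the most similar existing cluster if the score reaches a threshold, and otherwise starts a new cluster. The names `OnlineClusterer`, `jaccard`, `tokenize`, the single-link scoring rule, the threshold value 0.3, and the sample product titles are all assumptions made for this sketch.

```python
from typing import Callable, List, Set

# A similarity measure maps two token sets to a score in [0, 1].
Similarity = Callable[[Set[str], Set[str]], float]


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (an assumption; any tokenizer would do)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity, used here as one example of a text-oriented measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class OnlineClusterer:
    """Hypothetical single-pass clusterer; NOT the model proposed in the paper."""

    def __init__(self, similarity: Similarity = jaccard, threshold: float = 0.3):
        self.similarity = similarity  # pluggable similarity measure
        self.threshold = threshold    # minimum score needed to join a cluster
        self.clusters: List[List[Set[str]]] = []

    def add(self, text: str) -> int:
        """Assign one new document and return the index of its cluster."""
        doc = tokenize(text)
        best_index, best_score = -1, 0.0
        for index, members in enumerate(self.clusters):
            # Single-link rule: score against the closest member of the cluster.
            score = max(self.similarity(doc, member) for member in members)
            if score > best_score:
                best_index, best_score = index, score
        if best_index >= 0 and best_score >= self.threshold:
            self.clusters[best_index].append(doc)
            return best_index
        # No sufficiently similar cluster exists: open a new one.
        self.clusters.append([doc])
        return len(self.clusters) - 1


if __name__ == "__main__":
    clusterer = OnlineClusterer(similarity=jaccard, threshold=0.3)
    titles = [
        "apple iphone 3g 16gb black",
        "apple iphone 3g 16gb white",
        "canon eos 450d camera body",
        "canon eos 450d camera kit",
    ]
    for title in titles:
        print(clusterer.add(title), title)
```

Because documents are assigned one at a time and earlier assignments are never revisited, the same loop covers both a batch run (feed the whole collection) and a collection that grows over time (keep calling `add`). Swapping `jaccard` for another function with the same signature changes the similarity measure without touching the clustering logic.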