Clustering of Short Strings in Large Databases

  • Authors:
  • Michail Kazimianec; Arturas Mazeika

  • Venue:
  • DEXA '09 Proceedings of the 2009 20th International Workshop on Database and Expert Systems Application
  • Year:
  • 2009

Abstract

A novel method, CLOSS, intended for textual databases is proposed. It successfully identifies clusters of misspelled strings, even if the cluster border is not prominent. The method uses a q-gram approach to represent the data and a string proximity graph to find the cluster. The contribution concerns short-string clustering in text mining in cases where the proximity graph has multiple horizontal lines or no horizontal line at all.
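
To make the q-gram representation mentioned in the abstract concrete, the following is a minimal Python sketch. It is not the authors' CLOSS implementation: the function names, the padding character `#`, the default q = 2, and the reading of a "proximity graph" as counts of strings sharing at least k q-grams with a chosen center string are all assumptions made for illustration.

```python
from collections import Counter


def qgrams(s, q=2, pad="#"):
    """Return the multiset of q-grams of s.

    The string is padded with q-1 pad characters on each side
    (an assumed convention, not taken from the paper).
    """
    padded = pad * (q - 1) + s + pad * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def overlap(a, b, q=2):
    """Number of q-grams shared by a and b (multiset intersection size)."""
    return sum((qgrams(a, q) & qgrams(b, q)).values())


def neighbourhood_sizes(center, strings, q=2):
    """For each threshold k, count the strings that share at least k
    q-grams with the center string (hypothetical proximity-graph data)."""
    max_k = sum(qgrams(center, q).values())
    overlaps = [overlap(center, s, q) for s in strings]
    return {k: sum(1 for o in overlaps if o >= k) for k in range(1, max_k + 1)}
```

Under these assumptions, plotting the values of `neighbourhood_sizes` against k gives a curve in which a flat (horizontal) stretch would suggest a stable group of near-duplicates around the center string. The abstract indicates that CLOSS targets the harder cases where several such stretches appear or none does.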