Clustering of Short Strings in Large Databases

Authors:
Michail Kazimianec; Arturas Mazeika
Affiliations:
-;-
Venue:
DEXA '09 Proceedings of the 2009 20th International Workshop on Database and Expert Systems Application
Year:
2009

Cited by: 1

PG-Skip: proximity graph based clustering of long strings

DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II

Abstract

A novel method, CLOSS, is proposed for textual databases. It identifies clusters of misspelled strings even when the cluster border is not prominent. The method represents the data with q-grams and uses a string proximity graph to find the cluster. The contribution addresses short-string clustering in text mining, in particular the cases where the proximity graph has multiple horizontal lines or no horizontal line at all.
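
To make the q-gram representation mentioned in the abstract concrete, the following is a minimal Python sketch. It is not the CLOSS algorithm itself, and the abstract does not define the proximity graph, so the curve computed here is a simplified stand-in: for a chosen center string it counts how many strings share at least k q-grams with it. The function names, the padding character `#`, and the default `q=2` are assumptions made for illustration.

```python
from collections import Counter


def qgrams(s, q=2, pad="#"):
    """Return the multiset of q-grams of s, padding both ends with q-1 pad characters."""
    padded = pad * (q - 1) + s + pad * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def overlap(a, b, q=2):
    """Count the q-grams shared by a and b (size of the multiset intersection)."""
    return sum((qgrams(a, q) & qgrams(b, q)).values())


def neighbourhood_sizes(center, strings, q=2):
    """For each threshold k, count the strings sharing at least k q-grams with center.

    Assumption: this is a simplified proximity-style curve, not the paper's definition.
    """
    max_k = sum(qgrams(center, q).values())
    overlaps = [overlap(center, s, q) for s in strings]
    return {k: sum(1 for o in overlaps if o >= k) for k in range(1, max_k + 1)}


if __name__ == "__main__":
    names = ["smith", "smyth", "jones"]
    print(overlap("smith", "smyth"))            # shared 2-grams: #s, sm, th, h#
    print(neighbourhood_sizes("smith", names))
```

In this toy data the counts stay flat over a range of thresholds and then drop. A flat stretch of this kind is presumably what the abstract calls a horizontal line; the abstract concerns the harder cases where several such stretches appear or none does.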