PG-Skip: proximity graph based clustering of long strings
DASFAA'11: Proceedings of the 16th International Conference on Database Systems for Advanced Applications, Part II
A novel method, CLOSS, intended for textual databases is proposed. It successfully identifies clusters of misspelled strings even when the cluster border is not prominent. The method uses a q-gram approach to represent the data and a string proximity graph to find the cluster. The contribution concerns short-string clustering in text mining when the proximity graph has multiple horizontal lines or none at all.
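To make the q-gram representation and the proximity graph concrete, here is a minimal Python sketch. It is written under stated assumptions and is not the CLOSS algorithm. The assumptions are as follows. Strings are padded with `#` and `$` and split into bigrams. A fixed center string is used. For each threshold t, the neighborhood is the set of strings sharing at least t q-grams with the center. The border is taken from the longest horizontal line (plateau) of the resulting graph. The function names `qgrams`, `overlap`, `proximity_graph` and `longest_plateau` are hypothetical. The longest-plateau rule is only the simple heuristic that fails in the situations the abstract describes, where there are several horizontal lines or none.

```python
from collections import Counter


def qgrams(s: str, q: int = 2) -> Counter:
    """Bag of q-grams of s, padded with q-1 '#' and '$' characters (assumed padding)."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def overlap(a: Counter, b: Counter) -> int:
    """Number of q-grams two bags share (size of the bag intersection)."""
    return sum((a & b).values())


def proximity_graph(center: str, strings: list, q: int = 2) -> list:
    """Simplified proximity graph (hypothetical): for each threshold t = 1..|Q(center)|,
    count the strings that share at least t q-grams with the center."""
    cq = qgrams(center, q)
    overlaps = [overlap(cq, qgrams(s, q)) for s in strings]
    max_t = sum(cq.values())
    return [(t, sum(1 for o in overlaps if o >= t)) for t in range(1, max_t + 1)]


def longest_plateau(graph: list) -> tuple:
    """Return (first_t, last_t) of the longest run of equal neighborhood sizes.
    This naive border heuristic is an assumption, not CLOSS's border detection."""
    best = cur = (graph[0][0], graph[0][0])
    for (_, n_prev), (t, n) in zip(graph, graph[1:]):
        # Extend the current horizontal line, or start a new one at t.
        cur = (cur[0], t) if n == n_prev else (t, t)
        if cur[1] - cur[0] > best[1] - best[0]:
            best = cur
    return best


strings = ["smith", "smyth", "smithe", "smit", "jones", "johnson"]
graph = proximity_graph("smith", strings)
print(graph)                   # [(1, 4), (2, 4), (3, 4), (4, 4), (5, 2), (6, 1)]
print(longest_plateau(graph))  # (1, 4)
```

With these illustrative inputs, the graph stays flat at four strings for thresholds 1 through 4. The four spellings of "smith" therefore separate cleanly from "jones" and "johnson". When a graph has several plateaus of similar length, or is strictly decreasing, a rule like `longest_plateau` has no clear answer. That is the case the abstract targets.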