Prefix-suffix trees: a novel scheme for compact representation of large datasets

Authors:
Radhika M. Pai; V. S. Ananthanarayana
Affiliations:
Manipal Institute of Technology, Manipal; National Institute of Technology Karnataka, Surathkal
Venue:
PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence
Year:
2007

Abstract

An important goal in data mining is to generate an abstraction of the data. Such an abstraction reduces the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper we propose a novel scheme, called Prefix-Suffix trees, for compact storage of patterns in data mining. The scheme forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction takes less space and hence provides a compact storage of patterns. Further, we propose a clustering algorithm based on this storage and show experimentally that this type of storage reduces both space and time requirements. This is established on large data sets of handwritten numerals, namely the OCR data, the MNIST data, and the USPS data. The proposed algorithm is compared with other similar algorithms, and the efficacy of our scheme is thus established.
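
The abstract does not give the construction of the Prefix-Suffix tree itself, so the following is only a rough illustration of the general idea it builds on: patterns that share a leading run of feature values can share storage, and the structure can be built in one pass over the data. It is a plain prefix tree, not the paper's Prefix-Suffix tree; the names `Node`, `build_prefix_tree`, `count_nodes`, and the toy binary patterns are all assumptions made for this sketch.

```python
class Node:
    """One node of a plain prefix tree (illustrative, not the paper's PS-tree)."""

    def __init__(self):
        self.children = {}  # feature value -> child Node
        self.count = 0      # number of patterns that pass through this node
        self.labels = {}    # class label -> count, filled only where a pattern ends


def build_prefix_tree(patterns):
    """Insert each (features, label) pair exactly once: a single scan of the data."""
    root = Node()
    for features, label in patterns:
        node = root
        for value in features:
            if value not in node.children:
                node.children[value] = Node()
            node = node.children[value]
            node.count += 1
        node.labels[label] = node.labels.get(label, 0) + 1
    return root


def count_nodes(node):
    """Number of stored nodes, excluding the root."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy data: four 4-feature binary patterns with class labels.
data = [
    ((0, 1, 1, 0), "a"),
    ((0, 1, 1, 1), "a"),
    ((0, 1, 0, 0), "b"),
    ((0, 1, 1, 0), "a"),
]

tree = build_prefix_tree(data)
raw_cells = sum(len(features) for features, _ in data)
stored = count_nodes(tree)
print(f"raw feature values: {raw_cells}, tree nodes: {stored}")
```

On this toy input the shared prefixes and the duplicate pattern collapse into common nodes, so the tree holds fewer nodes than the raw data has feature values. How much is saved on real data depends on how often patterns share leading values, which this sketch does not attempt to measure.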