Maxdiff kd-trees for data condensation

Authors:
B. L. Narayan;C. A. Murthy;Sankar K. Pal
Affiliations:
Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata 700 108, West Bengal, India;Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata 700 108, West Bengal, India;Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata 700 108, West Bengal, India
Venue:
Pattern Recognition Letters
Year:
2006

Abstract

Prototype selection based on conventional clustering algorithms yields a good representation but is extremely time-consuming on large data sets. kd-trees, on the other hand, are exceptionally efficient in time and space for large data sets, but fail to produce a reasonable representation in certain situations. We propose a new algorithm, with speed comparable to existing kd-tree based algorithms, that overcomes these representation problems at high condensation ratios. It uses the Maxdiff criterion to separate distant clusters in the initial stages, before splitting them any further, thus improving the representation. Because the splits are axis-parallel, more nodes are required to represent a data set that has no regions where the points are well separated.
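
The abstract does not spell out the splitting rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the Maxdiff criterion places each axis-parallel split in the widest gap between consecutive sorted coordinate values, that the leaf containing the widest gap is split first, and that each leaf is summarised by its centroid. The names `maxdiff_split`, `centroid`, and `condense` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def maxdiff_split(points: List[Point]) -> Tuple[float, int, float]:
    """Find the widest gap between consecutive sorted values on any axis.

    Returns (gap, axis, threshold). The threshold is the value on the low
    side of the gap, so splitting on `<= threshold` leaves both sides
    non-empty. A gap of 0.0 (axis -1) means the points cannot be separated.
    """
    best = (0.0, -1, 0.0)
    for axis in range(len(points[0])):
        values = sorted(p[axis] for p in points)
        for low, high in zip(values, values[1:]):
            if high - low > best[0]:
                best = (high - low, axis, low)
    return best


def centroid(points: List[Point]) -> Tuple[float, ...]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def condense(points: List[Point], n_prototypes: int) -> List[Tuple[float, ...]]:
    """Split leaves at their widest gaps until n_prototypes leaves exist.

    Assumption: the leaf holding the widest gap is split first, so
    well-separated clusters are pulled apart in the earliest splits.
    """
    leaves = [list(points)]
    while len(leaves) < n_prototypes:
        candidates = [
            (maxdiff_split(leaf), i)
            for i, leaf in enumerate(leaves)
            if len(leaf) > 1
        ]
        if not candidates:
            break
        (gap, axis, threshold), index = max(candidates, key=lambda c: c[0][0])
        if gap <= 0.0:
            break  # only duplicate points remain; nothing left to separate
        leaf = leaves.pop(index)
        leaves.append([p for p in leaf if p[axis] <= threshold])
        leaves.append([p for p in leaf if p[axis] > threshold])
    return [centroid(leaf) for leaf in leaves]
```

For two well-separated square clusters, `condense(data, 2)` makes its single split in the wide gap between them, so each prototype is the centroid of one cluster. When the data has no wide gaps, the same procedure needs many more leaves to represent it well, which is the limitation the abstract notes for axis-parallel splits.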