Navigating massive data sets via local clustering

Authors: Michael E. Houle
Affiliations: Tokyo Research Laboratory, Shimotsuruma 1623-14, Yamato-shi, Kanagawa-ken 242-8502, Japan
Venue: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year: 2003


Abstract

This paper introduces a scalable method for feature extraction and navigation of large data sets by means of local clustering, where clusters are modeled as overlapping neighborhoods. Under the model, intra-cluster association and external differentiation are both assessed in terms of a natural confidence measure. Minor clusters can be identified even when they appear in the intersection of larger clusters. Scalability of local clustering derives from recent generic techniques for efficient approximate similarity search. The cluster overlap structure gives rise to a hierarchy that can be navigated and queried by users. Experimental results are provided for two large text databases.
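
To make the idea of clusters as overlapping neighborhoods concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not define its confidence measure; the sketch assumes an association-rule-style ratio, |A ∩ B| / |A|, between two neighborhoods A and B. The function names `knn` and `confidence`, the toy points, and the brute-force neighbor search (standing in for the approximate similarity search the abstract mentions) are all illustrative assumptions.

```python
from math import dist


def knn(points, i, k):
    """Return the indices of the k points nearest to points[i], including i.

    Brute-force search; a stand-in for an approximate similarity index.
    """
    order = sorted(range(len(points)), key=lambda j: dist(points[i], points[j]))
    return set(order[:k])


def confidence(a, b):
    """Assumed confidence of neighborhood a with respect to b: |a & b| / |a|."""
    return len(a & b) / len(a)


# Toy data (hypothetical): two tight, well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

n0 = knn(points, 0, 3)  # neighborhood around point 0
n1 = knn(points, 1, 3)  # neighborhood around point 1 (same group)
n3 = knn(points, 3, 3)  # neighborhood around point 3 (other group)

print(confidence(n0, n1))  # high: the two neighborhoods share their members
print(confidence(n0, n3))  # low: the two neighborhoods are disjoint
```

Under this assumed measure, a high value between neighborhoods drawn from the same region suggests strong intra-cluster association, while a low value against a neighborhood from elsewhere suggests good external differentiation.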