In classical clustering, each data point is assigned to at least one cluster. In many applications, however, only a small subset of the available data is relevant to the problem, and the rest must be ignored to obtain good clusters. Certain nonparametric density-based clustering methods find the most relevant data as multiple dense regions, but such methods are generally limited to low-dimensional data and do not scale well to large, high-dimensional datasets. They also use a specific notion of “distance”, typically Euclidean or Mahalanobis distance, which further limits their applicability. On the other hand, the recent One Class Information Bottleneck (OC-IB) method is fast and works on a large class of distortion measures known as Bregman divergences, but it can find only a single dense region. This article presents a broad framework for finding k dense clusters while ignoring the rest of the data. It includes a seeding algorithm that can automatically determine a suitable value for k. When k is forced to 1, our method gives rise to an improved version of OC-IB with optimality guarantees. We provide a generative model that yields the proposed iterative algorithm for finding k dense regions as a special case. Our analysis reveals an interesting and novel connection between the problem of finding dense regions and exponential mixture models: a hard model corresponding to k exponential mixtures with a uniform background results in a set of k dense clusters. The proposed method is a highly scalable algorithm for finding multiple dense regions that works with any Bregman divergence, thus extending density-based clustering to a variety of non-Euclidean problems not addressable by earlier methods. We present empirical results on three artificial datasets, two microarray datasets, and one text dataset to show the relevance and effectiveness of our methods.
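To make the idea of "k dense clusters while ignoring the rest of the data" concrete, the following is a minimal sketch of one way a hard iterative procedure of this kind could look. It is not the article's algorithm as published. The function names (`sq_euclidean`, `dense_regions`), the fixed budget `s` of points to keep, and the caller-supplied initial centers are all assumptions made for illustration, and the seeding step that chooses k is not shown. Squared Euclidean distance is used as one example of a Bregman divergence. The mean update relies on the known property that the mean minimizes the total Bregman divergence from a set of points to a single representative.

```python
import numpy as np

def sq_euclidean(X, mu):
    """Squared Euclidean distance, the Bregman divergence generated by ||x||^2."""
    return np.sum((X - mu) ** 2, axis=-1)

def dense_regions(X, init_centers, s, divergence=sq_euclidean, n_iter=50):
    """Hypothetical sketch: keep the s points closest to any of k centers.

    Returns (labels, centers); points left out of every cluster get label -1.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    k = len(centers)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Divergence from every point to every center, shape (n, k).
        d = np.stack([divergence(X, c) for c in centers], axis=1)
        nearest = d.argmin(axis=1)
        best = d.min(axis=1)
        # Keep only the s points with the smallest divergence to their center.
        keep = np.argsort(best, kind="stable")[:s]
        new_labels = np.full(len(X), -1)
        new_labels[keep] = nearest[keep]
        # Re-estimate each center as the mean of the points it currently owns.
        for j in range(k):
            members = X[new_labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers
```

Calling `dense_regions(X, init_centers, s)` on data with a few tight groups and scattered background points should label the tight groups `0..k-1` and the background `-1`, provided the initial centers are reasonable. A different Bregman divergence can be tried by passing another function with the same `(X, mu)` signature as `divergence`.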