Model-based clustering of probability density functions

Authors:
Angela Montanari; Daniela G. Calò
Affiliations:
Department of Statistics, University of Bologna, Bologna, Italy 40126; Department of Statistics, University of Bologna, Bologna, Italy 40126
Venue:
Advances in Data Analysis and Classification
Year:
2013

Abstract

Complex data in which each statistical unit is described not by a single observation (or vector of variables) but by a unit-specific sample of several, or even many, observations are becoming increasingly common. Reducing such samples to summary statistics, such as the mean or the median, discards most of the information they carry about variability, skewness, or multimodality. Full information is preserved only if each unit is described by a whole distribution. This new kind of data, known as "distribution-valued data", requires the development of adequate statistical methods. This paper presents a method for grouping a set of probability density functions (pdfs) into homogeneous clusters in the setting where the pdfs must be estimated nonparametrically from the unit-specific data. Since elements belonging to the same cluster are naturally thought of as samples from the same probability model, the idea is to tackle the clustering problem by defining and estimating a suitable mixture model on the space of pdfs. Model building is challenging here because the domain space is infinite-dimensional and has a non-Euclidean geometry. By adopting a wavelet-based representation of the elements of this space, the task is accomplished using mixture models for hyperspherical data. The proposed solution is illustrated through a simulation experiment and on two real data sets.
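
The link between a wavelet representation of pdfs and hyperspherical data can be illustrated with a small sketch. It is not the authors' estimator: it assumes a plain histogram density estimate, a hand-written orthonormal Haar transform, and the square-root transform of the density, none of which is specified in the abstract. The function names `haar_transform` and `sqrt_density_coefficients`, the interval `[-5, 6]`, and the simulated units are hypothetical choices made for illustration. The point is only that, because a pdf integrates to one, its square root has unit L2 norm, so its coefficients in an orthonormal basis lie on a unit hypersphere.

```python
import numpy as np


def haar_transform(v):
    """Orthonormal discrete Haar transform; len(v) must be a power of two."""
    approx = np.asarray(v, dtype=float)
    details = []
    while approx.size > 1:
        pairs = approx.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    # Coarsest scaling coefficient first, then details from coarse to fine.
    return np.concatenate([approx] + details[::-1])


def sqrt_density_coefficients(sample, lo, hi, levels=5):
    """Haar coefficients of the square root of a histogram density estimate.

    Assumption: a histogram stands in for the nonparametric density estimate.
    """
    counts, _ = np.histogram(sample, bins=2 ** levels, range=(lo, hi))
    # sqrt(f_hat) evaluated on the bins and scaled by sqrt(bin width) equals
    # sqrt(relative frequency), a vector with unit Euclidean norm.
    root = np.sqrt(counts / counts.sum())
    return haar_transform(root)


# Three simulated units: two drawn from the same model, one from another.
rng = np.random.default_rng(0)
units = [
    rng.normal(0.0, 1.0, size=2000),
    rng.normal(0.0, 1.0, size=2000),
    rng.normal(3.0, 0.5, size=2000),
]
coefs = np.array([sqrt_density_coefficients(u, -5.0, 6.0) for u in units])

norms = np.linalg.norm(coefs, axis=1)  # each equals 1 up to rounding
affinity = coefs @ coefs.T             # pairwise cosine similarities
print(norms)
print(np.round(affinity, 3))
```

Each row of `coefs` is a point on the unit sphere in 32 dimensions, and the inner product of two rows is the Bhattacharyya affinity between the corresponding histogram densities. Units sampled from the same model therefore point in nearly the same direction, which is the kind of structure a mixture model for hyperspherical data is designed to capture.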