The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a dataset, albeit with quadratic computational complexity. This paper, motivated by Autonomic Computing, extends AP to the data streaming framework. First, a hierarchical strategy reduces the complexity to O(N^{1+ε}); the resulting distortion loss is analyzed in relation to the dimension of the data items. Second, AP is coupled with a change detection test to cope with non-stationary data distributions and to rebuild the model as needed. The resulting approach, StrAP, is applied to the stream of jobs submitted to the EGEE Grid, providing an understandable description of the job flow and enabling the system administrator to spot some sources of failure online.
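
To make the coupling with a change detection test more concrete, here is a minimal sketch of one such test. The abstract does not say which test is used, so the choice of a Page-Hinkley test is an assumption. The names `PageHinkley`, `delta`, `threshold`, and `first_alarm` are hypothetical and chosen for illustration only. The monitored value `x` could be, for example, an indicator of whether an incoming item fits the current model; that choice is also an assumption.

```python
class PageHinkley:
    """Illustrative Page-Hinkley test for an upward shift in a stream's mean.

    Assumption: this is one possible change detection test, not necessarily
    the one used in the paper. `delta` is a tolerance on the drift magnitude
    and `threshold` is the alarm level; both defaults are arbitrary.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self):
        # Called at start-up and whenever the model is rebuilt after an alarm.
        self.n = 0           # number of observations seen so far
        self.mean = 0.0      # running mean of the observations
        self.cum = 0.0       # cumulative deviation from the running mean
        self.cum_min = 0.0   # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


def first_alarm(stream, **kwargs):
    """Return the 0-based index of the first alarm in `stream`, or None."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: 200 items that fit the model (0), then 50 that do not (1).
    stream = [0.0] * 200 + [1.0] * 50
    print(first_alarm(stream))
```

In a streaming setting, an alarm would be the signal to rebuild the model from recent data and call `reset()`; the trade-off between detection delay and false alarms is governed by `threshold`.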