Hybrid data clustering based on dependency structure and Gibbs sampling

Authors:
Shuang-Cheng Wang; Xiao-Lin Li; Hai-Yan Tang
Affiliations:
Department of Information Science, Shanghai Lixin University of Commerce, Shanghai, China; National Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; China Lixin Risk Management Research Institute, Shanghai Lixin University of Commerce, Shanghai, China
Venue:
AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
Year:
2006

Abstract

This paper presents a new method for data clustering that can effectively cluster data sets containing both continuous and discrete variables. The values of the cluster variable are treated as missing data. These missing values are first initialized randomly and are then revised iteratively by combining Gibbs sampling with a dependency structure, which is either built from prior knowledge or, alternatively, taken to be star-shaped. A penalty coefficient is introduced to extend the MDL scoring function, and the optimal number of clusters is determined using the extended MDL scoring function together with statistical methods.
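
The iteration described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper: the dependency structure is the star-shaped one (the cluster variable is the sole parent of every attribute, as in naive Bayes), continuous attributes are modelled as per-cluster Gaussians, discrete attributes as smoothed per-cluster multinomials, and parameters are plug-in estimates from the current labels rather than sampled (so this is closer to stochastic EM than to a full Gibbs sampler). The function name `gibbs_cluster` and all of its parameters are hypothetical.

```python
import numpy as np


def gibbs_cluster(X_cont, X_disc, k, n_iter=50, seed=0):
    """Resample cluster labels under an assumed star-shaped structure.

    X_cont: (n, p) float array of continuous attributes.
    X_disc: (n, q) int array of discrete attributes coded 0..m-1.
    k:      number of clusters (taken as given in this sketch).
    """
    rng = np.random.default_rng(seed)
    n = X_cont.shape[0]
    n_vals = X_disc.max(axis=0) + 1       # states per discrete column
    z = rng.integers(k, size=n)           # random initial values for the missing labels

    for _ in range(n_iter):
        log_p = np.zeros((n, k))
        for c in range(k):
            members = z == c
            n_c = members.sum()
            # Cluster prior with add-one smoothing.
            log_p[:, c] += np.log((n_c + 1.0) / (n + k))
            # Gaussian term for each continuous attribute; fall back to
            # global statistics when a cluster is (nearly) empty.
            src = X_cont[members] if n_c > 1 else X_cont
            mu = src.mean(axis=0)
            var = src.var(axis=0) + 1e-3  # small floor keeps the variance positive
            log_p[:, c] += (-0.5 * np.log(2 * np.pi * var)
                            - 0.5 * (X_cont - mu) ** 2 / var).sum(axis=1)
            # Smoothed multinomial term for each discrete attribute.
            for j in range(X_disc.shape[1]):
                counts = np.bincount(X_disc[members, j], minlength=n_vals[j])
                probs = (counts + 1.0) / (n_c + n_vals[j])
                log_p[:, c] += np.log(probs[X_disc[:, j]])
        # Normalise each row to P(cluster | attributes) and draw a new label.
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random(n)
        z = np.minimum((p.cumsum(axis=1) < u[:, None]).sum(axis=1), k - 1)
    return z


if __name__ == "__main__":
    # Synthetic mixed data, purely for illustration.
    rng = np.random.default_rng(1)
    X_cont = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
    X_disc = np.concatenate([rng.integers(0, 2, (50, 1)), rng.integers(1, 3, (50, 1))])
    print(np.bincount(gibbs_cluster(X_cont, X_disc, k=2), minlength=2))
```

The number of clusters `k` is fixed by the caller here. In the method summarized above it would instead be selected with the extended MDL score, whose exact form the abstract does not state.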