In this paper, we develop a semi-supervised regression algorithm for data sets that contain both categorical and numerical attributes. The algorithm partitions the data into several clusters and, at the same time, fits a multivariate regression model to each cluster. This framework combines multivariate regression models for the numerical variables (a supervised learning method) with the k-modes clustering algorithm for the categorical variables (an unsupervised learning method). The regression coefficients and the k-modes parameters are estimated simultaneously by minimizing a single objective function: a weighted sum of the least-squares errors of the regression models and the dissimilarity measures among the categorical variables. Experiments on both synthetic and real data sets demonstrate the effectiveness of the proposed method.
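The objective described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes hard cluster assignments, a single response variable, the simple-matching dissimilarity used in k-modes, and an alternating update scheme. The function name `fit_clusterwise` and the weight `gamma` are hypothetical names chosen for illustration.

```python
import numpy as np

def fit_clusterwise(X_num, X_cat, y, k, gamma=1.0, n_iter=20, seed=0):
    """Alternate between (1) refitting a regression and a mode per cluster
    and (2) reassigning each point to its lowest-cost cluster.

    Assumed cost of point i in cluster c (an illustration, not the paper's
    exact formula): squared regression residual + gamma * number of
    categorical attributes that differ from the cluster mode.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X_num])   # design matrix with intercept
    labels = rng.integers(0, k, size=n)        # random initial partition
    coefs = np.zeros((k, A.shape[1]))
    modes = X_cat[rng.choice(n, size=k, replace=False)].copy()

    for _ in range(n_iter):
        # Update step: least-squares fit and per-attribute mode per cluster.
        for c in range(k):
            members = labels == c
            if not members.any():
                continue                       # keep old parameters if empty
            coefs[c], *_ = np.linalg.lstsq(A[members], y[members], rcond=None)
            for j in range(X_cat.shape[1]):
                values, counts = np.unique(X_cat[members, j], return_counts=True)
                modes[c, j] = values[np.argmax(counts)]

        # Assignment step: weighted sum of squared error and mismatches.
        sq_err = (y[:, None] - A @ coefs.T) ** 2                       # (n, k)
        mismatch = (X_cat[:, None, :] != modes[None, :, :]).sum(axis=2)  # (n, k)
        cost = sq_err + gamma * mismatch
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # partition is stable
        labels = new_labels

    return labels, coefs, modes, cost.min(axis=1).sum()
```

Under these assumptions, `gamma` controls the trade-off: a large value lets the categorical attributes dominate the partition, while a value near zero reduces the procedure to clusterwise regression on the numerical attributes alone.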