Discovering Classification from Data of Multiple Sources

Authors:
Charles X. Ling; Qiang Yang
Affiliations:
Department of Computer Science, University of Western Ontario, London, Canada N6A 5B7; Department of Computer Science, Hong Kong UST, Kowloon, Hong Kong
Venue:
Data Mining and Knowledge Discovery
Year:
2006

Abstract

In many large e-commerce organizations, multiple data sources often describe the same customers, so it is important to consolidate data from these sources for intelligent business decision making. In this paper, we propose a novel method that predicts the classification of data from multiple sources without class labels in each source. We test our method on artificial and real-world datasets and show that it can classify the data accurately. From the machine learning perspective, our method removes the fundamental assumption in supervised learning that class labels are provided, and it bridges the gap between supervised and unsupervised learning.