Nowadays, enormous amounts of data are continuously generated, not only at massive scale but also from different, and sometimes conflicting, views. It is therefore important to consolidate these different views for intelligent decision making. For example, to predict the research areas of a group of researchers, the best results are usually achieved by combining and consolidating predictions obtained from the publication network, the co-authorship network, and the textual content of their publications. Multiple supervised and unsupervised hypotheses can be drawn from these information sources, and negotiating their differences and consolidating their decisions usually yields a much more accurate model, owing to the diversity and heterogeneity of the base models. In this paper, we address the problem of "consensus learning" among competing hypotheses, which rely either on outside knowledge (supervised learning) or on internal structure (unsupervised clustering). We argue that consensus learning is an NP-hard problem and thus propose to solve it with an efficient heuristic method. We construct a belief graph that first propagates predictions from the supervised models to the unsupervised ones, and then negotiates and reaches consensus among them. The final decision is further consolidated by weighting each model according to its degree of consistency with the other models. Experiments are conducted on the 20 Newsgroups data, Cora research papers, the DBLP author-conference network, and the Yahoo! Movies dataset, and the results show that the proposed method improves classification accuracy and the clustering quality measure (NMI) over the best base model by up to 10%. Furthermore, it runs in time proportional to the number of instances, which makes it efficient for large-scale data sets.
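The consistency-based weighting step can be illustrated with a minimal sketch. This is not the paper's belief-graph propagation algorithm; it is a simplified stand-in that assumes all models emit labels from the same aligned label space (in practice, clustering outputs must first be matched to class labels), weights each model by its average pairwise agreement with the others, and then takes a weighted majority vote per instance:

```python
from collections import Counter

def pairwise_agreement(a, b):
    """Fraction of instances on which two labelings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def consensus_labels(predictions):
    """Weighted majority vote over per-model label lists.

    Each model's weight is its mean agreement with the other
    models, a simple proxy for the consistency-based weights
    described in the paper.
    """
    m = len(predictions)
    weights = []
    for i in range(m):
        others = [pairwise_agreement(predictions[i], predictions[j])
                  for j in range(m) if j != i]
        weights.append(sum(others) / len(others))
    consensus = []
    for t in range(len(predictions[0])):
        votes = Counter()  # accumulate weighted votes for instance t
        for w, pred in zip(weights, predictions):
            votes[pred[t]] += w
        consensus.append(votes.most_common(1)[0][0])
    return consensus

# Three hypothetical base models labeling five instances.
preds = [
    ["A", "A", "B", "B", "A"],  # model 1
    ["A", "A", "B", "A", "A"],  # model 2
    ["B", "A", "B", "B", "A"],  # model 3
]
print(consensus_labels(preds))  # -> ['A', 'A', 'B', 'B', 'A']
```

Here model 1 agrees most with the other two (weight 0.8 versus 0.7), so its labels dominate where the models disagree. Each vote loop is linear in the number of instances, consistent with the scalability claim above.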