Collective classification in relational data has become an important and active research topic in the last decade. It exploits dependencies among instances in a network to improve predictions. Applications include hyperlinked document classification, social network analysis, and collaboration network analysis. Most traditional collective classification models assume that a large number of labeled examples (labeled nodes) is available. However, in many real-world applications, labeled data are extremely difficult to obtain. For example, in network intrusion detection there may be only a limited number of identified intrusions amid a huge set of unlabeled nodes. In this situation, most of the data have no connection to labeled nodes, so no supervision can be obtained from local connections. In this paper, we propose to explore various latent linkages among the nodes and to judiciously integrate these linkages into a latent graph. This is achieved by finding a graph that maximizes the linkages among training data with the same label and maximizes the separation among data with different labels. The objective is cast as an optimization problem and solved with quadratic programming. Finally, we apply label propagation on the latent graph to make predictions. Experiments show that the proposed model, LNP (Latent Network Propagation), improves learning accuracy significantly. For instance, with only 10% labeled examples, all comparison models achieve accuracies below 63%, while LNP achieves 74%.
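The final prediction step, label propagation on a graph, can be sketched as below. This is a generic iterative label-propagation sketch over a given affinity matrix, not the paper's full LNP method: the latent-graph construction and the quadratic-programming weighting of linkages are the paper's contribution and are not reproduced here. The function name, the clamping strategy, and the parameter `alpha` are illustrative assumptions.

```python
import numpy as np

def label_propagation(W, y, labeled_mask, alpha=0.99, iters=100):
    """Generic iterative label propagation (illustrative sketch).

    W            : (n, n) symmetric nonnegative affinity matrix
                   (in LNP this would be the learned latent graph)
    y            : (n, k) one-hot label matrix; rows for unlabeled
                   nodes are all zeros
    labeled_mask : (n,) boolean, True where the label is known
    alpha        : propagation weight; (1 - alpha) retains the
                   initial labels at each step
    """
    # Symmetrically normalize the affinity matrix: S = D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    F = y.astype(float).copy()
    for _ in range(iters):
        # Diffuse labels along edges, mixing in the initial labels
        F = alpha * (S @ F) + (1 - alpha) * y
        # Clamp the known labels so supervision is never overwritten
        F[labeled_mask] = y[labeled_mask]
    return F.argmax(axis=1)

# Toy usage: two disconnected pairs, one labeled node per pair
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.zeros((4, 2))
y[0, 0] = 1.0  # node 0 labeled class 0
y[2, 1] = 1.0  # node 2 labeled class 1
mask = np.array([True, False, True, False])
pred = label_propagation(W, y, mask)  # → array([0, 0, 1, 1])
```

The key point the paper addresses is that this propagation is only useful when unlabeled nodes are reachable from labeled ones; the latent graph is constructed precisely so that such paths exist even when the observed network is sparsely labeled.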