Improving learning in networked data by combining explicit and mined links

  • Authors: Sofus A. Macskassy
  • Affiliations: Fetch Technologies, El Segundo, CA
  • Venue: AAAI'07: Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 1
  • Year: 2007


Abstract

This paper is about using multiple types of information for classification of networked data in a semi-supervised setting: given a fully described network (nodes and edges) with known labels for some of the nodes, predict the labels of the remaining nodes. One recently developed method for such inference is a guilt-by-association model, which has been developed independently in two different settings: relational learning and semi-supervised learning. The relational learning setting assumes that the networked data has explicit links, such as hyperlinks between webpages or citations between research papers. The semi-supervised setting assumes a corpus of non-relational data and creates links based on similarity measures between the instances. Both use only the known labels in the network to predict the remaining labels, but they draw on very different information sources. The thesis of this paper is that combining these two types of links yields a network that carries more information than either type of link by itself. We test this thesis on six benchmark data sets with a within-network learning algorithm and show that combining the links yields significant improvements in predictive performance. We describe a principled way of combining multiple types of edges with different edge weights and semantics using an objective graph measure called node-based assortativity. We investigate the use of this measure to combine text-mined links with explicit links and show that our approach significantly improves classifier performance over naively combining the two types of links.
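
The core idea in the abstract, a guilt-by-association classifier over a graph whose explicit and text-mined edges are weighted by how assortative the known labels are along each edge type, can be illustrated with a small sketch. The code below is not the paper's implementation: it uses Newman's graph-level label assortativity as a stand-in for the paper's node-based assortativity measure and a single weighted-vote relational-neighbor pass rather than full collective inference, and all function names, node identifiers, and labels are made up for illustration.

```python
# Minimal sketch (assumptions noted above): weight two edge types
# (explicit links and text-mined links) by the label assortativity each
# exhibits on the already-labeled nodes, then classify unlabeled nodes
# with a weighted-vote guilt-by-association pass.
from collections import Counter, defaultdict

def label_assortativity(edges, labels):
    """Newman's assortativity coefficient for categorical labels,
    computed only over edges whose endpoints are both labeled."""
    pairs = []
    for u, v, _ in edges:
        if u in labels and v in labels:
            pairs.append((labels[u], labels[v]))
            pairs.append((labels[v], labels[u]))  # edges treated as undirected
    if not pairs:
        return 0.0
    n = len(pairs)
    e = Counter(pairs)                      # mixing-matrix counts
    a = Counter(x for x, _ in pairs)        # row marginals
    b = Counter(y for _, y in pairs)        # column marginals
    classes = set(a) | set(b)
    trace = sum(e[(c, c)] for c in classes) / n
    agreement = sum(a[c] * b[c] for c in classes) / (n * n)
    return (trace - agreement) / (1.0 - agreement) if agreement < 1.0 else 0.0

def combine_and_classify(explicit_edges, mined_edges, labels):
    """Scale each edge type by its assortativity on the labeled subgraph,
    then predict unlabeled nodes by a weighted vote of labeled neighbors."""
    edge_sets = {"explicit": explicit_edges, "mined": mined_edges}
    # Floor at 0 so a disassortative edge type simply gets no vote here.
    weights = {name: max(label_assortativity(edges, labels), 0.0)
               for name, edges in edge_sets.items()}

    votes = defaultdict(Counter)
    for name, edges in edge_sets.items():
        for u, v, w in edges:
            for src, dst in ((u, v), (v, u)):
                if dst not in labels and src in labels:
                    votes[dst][labels[src]] += weights[name] * w

    # Predict each unlabeled node's class as its heaviest weighted vote.
    return {node: counts.most_common(1)[0][0] for node, counts in votes.items()}

if __name__ == "__main__":
    # Toy data: explicit citation links plus similarity-mined links.
    explicit = [("p1", "p2", 1.0), ("p2", "p3", 1.0), ("p4", "p5", 1.0)]
    mined = [("p1", "p2", 0.7), ("p4", "p5", 0.5),
             ("p2", "p6", 0.4), ("p3", "p5", 0.6)]
    known = {"p1": "ML", "p2": "ML", "p4": "DB", "p5": "DB"}
    print(combine_and_classify(explicit, mined, known))
    # -> {'p3': 'ML', 'p6': 'ML'}
```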