Data describing networks (communication networks, transaction networks, disease transmission networks, collaboration networks, etc.) is increasingly ubiquitous. While this observational data is useful, it often only hints at the actual underlying social or technological structures that give rise to the interactions. For example, an email communication network provides useful insight but is not the same as the "real" social network among individuals. In this paper, we introduce the problem of graph identification, i.e., the discovery of the true graph structure underlying an observed network. We cast the problem as a probabilistic inference task in which we must infer the nodes, edges, and node labels of a hidden graph based on evidence provided by the observed network. This in turn corresponds to performing entity resolution, link prediction, and node labeling to infer the hidden graph. While each of these problems has been studied separately, they have never been considered together as a coherent task. We present a simple yet novel approach that addresses all three problems simultaneously. Our approach, called C3, consists of Coupled Collective Classifiers that are applied iteratively to propagate information among the solutions to the three problems. We empirically demonstrate that C3 is superior, in terms of both predictive accuracy and runtime, to state-of-the-art probabilistic approaches on three real-world problems.
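The abstract describes C3 only at a high level. As a rough illustration of what "coupled" and "iteratively applied" can mean, here is a toy sketch written under our own assumptions. It is not the C3 algorithm. It runs three simple predictors in a loop: entity resolution, node labeling, and link prediction. Each one reads the others' current outputs, and the loop stops once nothing changes. Everything here is hypothetical: the function `coupled_inference`, the name-matching rule in `normalize`, the neighbor-majority labeler, and the "two shared neighbors" link rule. They are hand-written stand-ins for the trained classifiers a real system would presumably use.

```python
from collections import Counter
from itertools import combinations


def normalize(name):
    # Hypothetical entity-resolution feature: exact match after trimming/lowercasing.
    return name.strip().lower()


def coupled_inference(nodes, observed_edges, names, known_labels, max_iters=10):
    """Toy coupled iterative inference (illustrative only; not the C3 algorithm)."""
    labels = dict(known_labels)                      # node -> label; absent = unknown
    edges = {frozenset(e) for e in observed_edges}   # current (observed + predicted) edges
    coref = set()                                    # predicted coreferent node pairs

    def neighbors(n):
        # Neighbors in the current graph, pooled over n and its predicted duplicates.
        group = {n} | {m for pair in coref if n in pair for m in pair}
        return {m for e in edges if e & group for m in e} - group

    for _ in range(max_iters):
        changed = False

        # 1) Entity resolution: same normalized name and no label conflict.
        for u, v in combinations(nodes, 2):
            pair = frozenset((u, v))
            lu, lv = labels.get(u), labels.get(v)
            compatible = lu is None or lv is None or lu == lv
            if pair not in coref and compatible and normalize(names[u]) == normalize(names[v]):
                coref.add(pair)  # merges are never undone in this toy
                changed = True

        # 2) Node labeling: majority vote over neighbors' current labels
        #    (ties broken alphabetically so the result is deterministic).
        for n in nodes:
            if n in known_labels:
                continue
            votes = Counter(labels[m] for m in neighbors(n) if m in labels)
            if votes:
                best = min(votes, key=lambda lab: (-votes[lab], lab))
                if labels.get(n) != best:
                    labels[n] = best
                    changed = True

        # 3) Link prediction: same label and at least two shared neighbors.
        for u, v in combinations(nodes, 2):
            pair = frozenset((u, v))
            if pair in edges or pair in coref or labels.get(u) is None:
                continue
            if labels.get(u) == labels.get(v) and len(neighbors(u) & neighbors(v)) >= 2:
                edges.add(pair)
                changed = True

        if not changed:  # stop once a full pass changes no prediction
            break
    return labels, edges, coref
```

Consider a small example. The nodes are `a` through `e`, and the observed edges are (a, b), (a, c), and (d, e). Nodes `a` and `d` are named "Alice" and "alice ". Only `b` and `c` carry known labels. In the first pass, the sketch merges `a` with `d` and labels both from `b` and `c`. It then labels `e` through its edge to `d`. Without the coreference decision, `e` could not have received that label. The coupling is the design point: each task's current output becomes evidence for the other two on the next step.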