Learning relational Bayesian classifiers from RDF data

Authors:
Harris T. Lin;Neeraj Koul;Vasant Honavar
Affiliations:
Department of Computer Science, Iowa State University, Ames, IA;Department of Computer Science, Iowa State University, Ames, IA;Department of Computer Science, Iowa State University, Ames, IA
Venue:
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
Year:
2011

Abstract

The increasing availability of large RDF datasets offers an exciting opportunity to build predictive models from such data using machine learning algorithms. However, the massive size and distributed nature of RDF data call for approaches to learning from RDF data in a setting where the data can be accessed only through a query interface, e.g., the SPARQL endpoint of the RDF store. In applications where the data are subject to frequent updates, there is a need for algorithms that allow the predictive model to be incrementally updated in response to changes in the data. Furthermore, in some applications, the attributes that are relevant for specific prediction tasks are not known a priori and hence need to be discovered by the algorithm. We present an approach to learning Relational Bayesian Classifiers (RBCs) from RDF data that addresses such scenarios. Specifically, we show how to build RBCs from RDF data using statistical queries through the SPARQL endpoint of the RDF store. We compare the communication complexity of our algorithm with one that requires direct centralized access to the data and hence has to retrieve the entire RDF dataset from the remote location for processing. We establish the conditions under which the RBC models can be incrementally updated in response to the addition or deletion of RDF data. We show how our approach can be extended to the setting where the attributes that are relevant for prediction are not known a priori, by selectively crawling the RDF data for attributes of interest. We provide an open-source implementation and evaluate the proposed approach on several large RDF datasets.
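
To make the idea of learning from statistical queries concrete, the following is a minimal sketch and not the authors' implementation. It assumes, for simplicity, that each instance has a single value per attribute (the relational case involves sets of related values, which this sketch does not model), so the classifier reduces to a count-based naive Bayes model. The names `count_query` and `CountModel`, the example IRIs, and the Laplace smoothing parameter `alpha` are all hypothetical. The sketch illustrates two points from the abstract: the model needs only counts, which a SPARQL endpoint can return via aggregate queries, and counts can be adjusted in place when data are added or deleted.

```python
import math
from collections import defaultdict


def count_query(target_prop, class_term, attr_prop=None, attr_term=None):
    """Build a SPARQL 1.1 COUNT query string for one sufficient statistic.

    Property IRIs and RDF terms are hypothetical placeholders; terms must
    already be serialized (for example '"pos"' for a plain literal).
    """
    patterns = [f"?x <{target_prop}> {class_term} ."]
    if attr_prop is not None:
        patterns.append(f"?x <{attr_prop}> {attr_term} .")
    body = " ".join(patterns)
    return f"SELECT (COUNT(DISTINCT ?x) AS ?n) WHERE {{ {body} }}"


class CountModel:
    """Count-based classifier (assumption: one value per attribute)."""

    def __init__(self, attr_values, alpha=1.0):
        self.attr_values = attr_values        # attribute -> list of possible values
        self.alpha = alpha                    # Laplace smoothing constant
        self.class_counts = defaultdict(int)  # label -> count
        self.joint_counts = defaultdict(int)  # (label, attribute, value) -> count

    def update(self, label, attrs, sign=1):
        """Apply one added (sign=1) or deleted (sign=-1) instance to the counts."""
        self.class_counts[label] += sign
        for attr, value in attrs.items():
            self.joint_counts[(label, attr, value)] += sign

    def predict(self, attrs):
        """Return the label with the highest smoothed log-probability."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_class in self.class_counts.items():
            if n_class <= 0:
                continue  # label no longer present after deletions
            score = math.log(n_class / total)
            for attr, value in attrs.items():
                k = len(self.attr_values[attr])
                n = self.joint_counts.get((label, attr, value), 0)
                score += math.log((n + self.alpha) / (n_class + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In a remote setting, each entry of `class_counts` and `joint_counts` would be filled from the result of a query such as the one produced by `count_query`, so only counts, rather than the triples themselves, cross the network. This is the intuition behind comparing communication cost against retrieving the full dataset, although the sketch itself makes no claim about the paper's actual complexity results.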