Learning classifiers from distributed, ontology-extended data sources

Authors:
Doina Caragea;Jun Zhang;Jyotishman Pathak;Vasant Honavar
Affiliations:
AI Research Lab, Department of Computer Science, Iowa State University, Ames, IA;AI Research Lab, Department of Computer Science, Iowa State University, Ames, IA;AI Research Lab, Department of Computer Science, Iowa State University, Ames, IA;AI Research Lab, Department of Computer Science, Iowa State University, Ames, IA
Venue:
DaWaK'06 Proceedings of the 8th international conference on Data Warehousing and Knowledge Discovery
Year:
2006

Abstract

Several increasingly data-rich application domains urgently need sound approaches to the integrative and collaborative analysis of large, autonomous (and hence inevitably semantically heterogeneous) data sources. In this paper, we precisely formulate and solve the problem of learning classifiers from such data sources, in a setting where each data source has a hierarchical ontology associated with it and semantic correspondences between the data source ontologies and a user ontology are supplied. The proposed approach yields algorithms for learning a broad class of classifiers (including Bayesian networks and decision trees) from semantically heterogeneous distributed data, with strong performance guarantees relative to their centralized counterparts. We illustrate the approach by applying it to learning Naive Bayes classifiers from distributed, ontology-extended data sources.
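To make the setting concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that each data source can report class-conditional counts, that the semantic correspondences are simple value-to-term lookup tables, and that the user ontology is flat; the paper's ontologies are hierarchical. All names (`local_counts`, `learn_naive_bayes`, `predict`) and the toy data are hypothetical. The sketch pools per-source counts after translating each source's vocabulary into user-ontology terms, so the pooled counts match those that would be obtained from the same data gathered in one place.

```python
from collections import Counter
from math import log


def local_counts(records, mapping):
    """Count classes and (attribute, user term, class) triples at one source.

    `mapping` is an assumed lookup table that translates the source's own
    attribute values into user-ontology terms.
    """
    class_counts = Counter()
    joint_counts = Counter()
    for attrs, label in records:
        class_counts[label] += 1
        for attr, value in attrs.items():
            joint_counts[(attr, mapping[attr][value], label)] += 1
    return class_counts, joint_counts


def learn_naive_bayes(sources):
    """Pool the counts reported by every source; raw records are never moved."""
    class_counts, joint_counts = Counter(), Counter()
    for records, mapping in sources:
        c, j = local_counts(records, mapping)
        class_counts.update(c)  # Counter.update adds counts
        joint_counts.update(j)
    return class_counts, joint_counts


def predict(model, attrs, domains):
    """Return the most probable class, using Laplace-smoothed estimates."""
    class_counts, joint_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = log(n / total)
        for attr, value in attrs.items():
            count = joint_counts[(attr, value, label)]
            score += log((count + 1) / (n + len(domains[attr])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (made up): two sources describe the same attribute
# with different vocabularies.
source_a = [({"status": "grad"}, "yes"),
            ({"status": "undergrad"}, "no"),
            ({"status": "grad"}, "yes")]
map_a = {"status": {"grad": "Graduate", "undergrad": "Undergraduate"}}

source_b = [({"status": "PhD"}, "yes"),
            ({"status": "BS"}, "no"),
            ({"status": "MS"}, "no")]
map_b = {"status": {"PhD": "Graduate", "MS": "Graduate", "BS": "Undergraduate"}}

# Values each attribute can take in the (flat, assumed) user ontology.
domains = {"status": ["Graduate", "Undergraduate"]}

model = learn_naive_bayes([(source_a, map_a), (source_b, map_b)])
prediction = predict(model, {"status": "Graduate"}, domains)
```

Because only counts cross source boundaries, each source can keep its raw records and its own vocabulary. Handling a hierarchical ontology would additionally require deciding at which level of abstraction to count, which this sketch does not attempt.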