Filtering Multi-Instance Problems to Reduce Dimensionality in Relational Learning

Authors:
Érick Alphonse; Stan Matwin
Affiliations:
LRI–UMR 8623 CNRS, Bât 490, Université Paris-Sud, 91405 Orsay Cedex, France. alphonse@lri.fr; SITE, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada. stan@site.uottawa.ca
Venue:
Journal of Intelligent Information Systems
Year:
2004

Abstract

Attribute-value representations, standard in today's data mining systems, have limited expressiveness. Inductive Logic Programming (ILP) provides an interesting alternative, particularly for learning from structured examples whose parts, each with its own attributes, are related to one another by first-order predicates. Several subsets of first-order logic (FOL) with different expressive power have been proposed in ILP. The challenge is that the more expressive the subset of FOL the learner works with, the more critical the dimensionality of the learning task becomes. The Datalog language is expressive enough to represent realistic learning problems when data is given directly in a relational database, which makes it a suitable tool for data mining. It is therefore important to develop techniques that dynamically decrease the dimensionality of learning tasks expressed in Datalog, just as Feature Subset Selection (FSS) techniques do in attribute-value learning. Re-using these techniques in ILP immediately runs into a problem: ILP examples have variable size and do not share the same set of literals. We propose here the first paradigm that brings Feature Subset Selection to the level of ILP, in languages at least as expressive as Datalog. The main idea is to first perform a change of representation that approximates the original relational problem by a multi-instance problem. The resulting representation is suitable for FSS techniques, which we adapted from attribute-value learning by taking into account some characteristics of the data that stem from the change of representation. We present the simple FSS algorithm proposed for the task, the requisite change of representation, and the entire method combining these two algorithms. The method acts as a filter: it preprocesses the relational data prior to model building and outputs relational examples with empirically relevant literals.
We discuss experiments in which the method was successfully applied to two real-world domains.
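The abstract does not spell out the FSS criterion, so the following is only a minimal Python sketch of the filter pipeline it describes. It assumes the change of representation has already turned each relational example into a bag of instances, each instance being a boolean vector over a shared, fixed set of candidate literals. The scoring rule (difference in bag-level coverage between positive and negative examples) and all function names are hypothetical stand-ins, not the authors' algorithm. The sketch only illustrates why the multi-instance form makes attribute-value style filtering possible once every example shares the same literal columns.

```python
from typing import List, Sequence, Set

# Hypothetical multi-instance encoding (an assumption, not the paper's format):
# a bag is one relational example; each instance is a boolean vector saying
# which candidate literals hold for that instance.
Bag = List[List[bool]]


def bag_has_literal(bag: Bag, j: int) -> bool:
    """Literal j counts as present in a bag if at least one instance has it."""
    return any(instance[j] for instance in bag)


def score_literals(pos: Sequence[Bag], neg: Sequence[Bag], n_literals: int) -> List[float]:
    """Illustrative filter score per literal: fraction of positive bags that
    contain it minus fraction of negative bags that contain it.
    Assumes both pos and neg are non-empty."""
    scores = []
    for j in range(n_literals):
        p = sum(bag_has_literal(b, j) for b in pos) / len(pos)
        n = sum(bag_has_literal(b, j) for b in neg) / len(neg)
        scores.append(p - n)
    return scores


def select_literals(pos: Sequence[Bag], neg: Sequence[Bag], n_literals: int,
                    threshold: float = 0.5) -> Set[int]:
    """Keep literals whose score magnitude reaches the threshold; a strongly
    negative score is kept too, since the literal's absence is informative."""
    scores = score_literals(pos, neg, n_literals)
    return {j for j, s in enumerate(scores) if abs(s) >= threshold}


def filter_bag(bag: Bag, keep: Set[int]) -> Bag:
    """Project every instance onto the selected literals (the filtering step)."""
    cols = sorted(keep)
    return [[instance[j] for j in cols] for instance in bag]


# Toy data: 3 candidate literals, two positive and two negative bags.
pos_bags: List[Bag] = [[[True, False, True], [False, False, False]],
                       [[True, True, False]]]
neg_bags: List[Bag] = [[[False, True, False]],
                       [[False, False, False], [False, False, True]]]

keep = select_literals(pos_bags, neg_bags, n_literals=3)
print(keep)                           # {0}
print(filter_bag(pos_bags[0], keep))  # [[True], [False]]
```

Treating a literal as present in a bag when any of its instances has it mirrors the standard multi-instance assumption that a bag is positive if at least one of its instances is. A faithful implementation would replace the toy score with a criterion adapted to the data characteristics introduced by the change of representation, as the abstract notes.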