GAMoN: Discovering M-of-N{¬,∨} hypotheses for text classification by a lattice-based Genetic Algorithm

Authors:
Veronica L. Policicchio;Adriana Pietramala;Pasquale Rullo
Affiliations:
Dept. of Mathematics, University of Calabria, Italy;Dept. of Mathematics, University of Calabria, Italy;Dept. of Mathematics, University of Calabria, Italy
Venue:
Artificial Intelligence
Year:
2012

Abstract

Although rule-based text classifiers have a long history, to the best of our knowledge no M-of-N-based approach to text categorization has been proposed so far. In this paper we argue that M-of-N hypotheses are particularly well suited to modeling the text classification task because of the so-called "family resemblance" metaphor: "the members (i.e., documents) of a family (i.e., category) share some small number of features, yet there is no common feature among all of them. Nevertheless, they resemble each other". Starting from this conjecture, we provide a sound extension of the M-of-N approach with negation and disjunction, called M-of-N{¬,∨}, which makes it possible to best fit the true structure of the data. Based on a thorough theoretical study, we show that the M-of-N{¬,∨} hypothesis space has two partial orders that form complete lattices. GAMoN is a task-specific Genetic Algorithm (GA) that, by exploiting the lattice-based structure of the hypothesis space, efficiently induces accurate M-of-N{¬,∨} hypotheses. Benchmarking was performed on 13 real-world text data sets against four rule induction algorithms: two GAs, namely BioHEL and OlexGA, and two non-evolutionary algorithms, namely C4.5 and Ripper. We also included linear SVM in our study, as it is reported to be among the best methods for text categorization. Experimental results demonstrate that GAMoN delivers state-of-the-art classification performance, providing a good balance between accuracy and model complexity. They further show that GAMoN scales up to large, realistic real-world domains better than both C4.5 and Ripper.
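To make the "family resemblance" idea concrete, the following is a minimal illustrative sketch of how an M-of-N test with negation and disjunction could be evaluated on a document. It is not GAMoN itself and is not taken from the paper: the abstract does not spell out the exact semantics of M-of-N{¬,∨}, so the reading used here (at least m of the positive terms occur, no negated term occurs, and several such tests are combined by "or") is an assumption, as are the names `MOfN` and `any_matches` and the example terms.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Sequence


@dataclass(frozen=True)
class MOfN:
    """Hypothetical M-of-N test over the terms of a document.

    Assumed reading: the test fires when at least `m` of the terms in `pos`
    occur in the document and none of the terms in `neg` occur.
    """
    m: int
    pos: FrozenSet[str]
    neg: FrozenSet[str] = frozenset()

    def matches(self, doc_terms: Iterable[str]) -> bool:
        terms = set(doc_terms)
        if terms & self.neg:          # negation: any negated term vetoes the match
            return False
        return len(terms & self.pos) >= self.m  # threshold on shared positive terms


def any_matches(tests: Sequence[MOfN], doc_terms: Iterable[str]) -> bool:
    """Disjunction (assumed reading): the document is accepted if any test fires."""
    terms = set(doc_terms)
    return any(t.matches(terms) for t in tests)


# Made-up example for a hypothetical "grain" category.
grain = MOfN(m=2,
             pos=frozenset({"wheat", "grain", "harvest", "crop"}),
             neg=frozenset({"stock"}))
maize = MOfN(m=1, pos=frozenset({"maize", "corn"}))

print(grain.matches("wheat harvest delayed".split()))          # True: two positive terms
print(grain.matches("wheat prices".split()))                   # False: only one positive term
print(grain.matches("wheat grain stock rally".split()))        # False: negated term present
print(any_matches([grain, maize], "corn exports rise".split()))  # True: second test fires
```

Note that the two documents accepted by `grain` need not share any single term, which is the point of the family resemblance argument: membership depends on overlapping enough with a set of features, not on one common feature.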