A novel distance-based classifier built on pattern ranking

Authors:
Dipankar Bachar;Rosa Meo
Affiliations:
Università degli Studi di Torino, Italy;Università degli Studi di Torino, Italy
Venue:
Proceedings of the 2009 ACM symposium on Applied Computing
Year:
2009

Citing 11
Cited 1

Instance-Based Learning Algorithms

Machine Learning
Unifying instance-based and rule-based induction

Machine Learning
Extending naïve Bayes classifiers using long itemsets

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Reduction Techniques for Instance-Based Learning Algorithms

Machine Learning
Theory of dependence values

ACM Transactions on Database Systems (TODS)
The Power of Decision Tables

ECML '95 Proceedings of the 8th European Conference on Machine Learning
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Improvements to Platt's SMO Algorithm for SVM Classifier Design

Neural Computation
Fast Discovery and the Generalization of Strong Jumping Emerging Patterns for Building Compact and Accurate Classifiers

IEEE Transactions on Knowledge and Data Engineering
Using Kullback-Leibler distance for text categorization

ECIR'03 Proceedings of the 25th European conference on IR research

LODE: A distance-based classifier built on ensembles of positive and negative observations

Pattern Recognition

Abstract

Instance-based classifiers that compute similarity between instances suffer from the presence of noise in the training set and from over-fitting. In this paper we propose a new type of distance-based classifier that, instead of computing distances between instances, computes the distance between each test instance and the classes. Both are represented by patterns in the space of the frequent itemsets. We rank the itemsets by metrics of itemset significance and keep only the top portion of the ranking that leads the classifier to its maximum accuracy. We have experimented on a large collection of datasets from the UCI archive with different proximity measures and different itemset-ranking metrics. We show that our method has several benefits: it reduces the number of distance computations; it improves on the classification accuracy of state-of-the-art classifiers such as decision trees, SVM, k-NN, Naive Bayes, rule-based classifiers, and association rule-based ones; and it outperforms the competitors, especially on noisy data.
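
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: the brute-force itemset enumeration, the ranking score (the spread of an itemset's support across classes), the Euclidean distance, and the fixed top_k cut-off. The abstract says only that several ranking metrics and proximity measures were tried and that the cut-off was chosen to maximize accuracy. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def frequent_itemsets(rows, min_support=0.25, max_size=2):
    """Brute-force enumeration of itemsets with support >= min_support."""
    items = sorted(set().union(*rows))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(candidate <= row for row in rows) / len(rows)
            if support >= min_support:
                found.append(candidate)
    return found


def class_profiles(rows, labels, itemsets):
    """Represent each class by the support of every itemset inside that class."""
    profiles = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        profiles[cls] = [sum(s <= r for r in members) / len(members)
                         for s in itemsets]
    return profiles


def top_ranked(rows, labels, itemsets, top_k):
    """Keep the top_k itemsets by an assumed significance score:
    the spread of the itemset's support across classes."""
    profiles = class_profiles(rows, labels, itemsets)

    def spread(i):
        values = [profile[i] for profile in profiles.values()]
        return max(values) - min(values)

    order = sorted(range(len(itemsets)), key=spread, reverse=True)
    return [itemsets[i] for i in order[:top_k]]


def predict(instance, itemsets, profiles):
    """Assign the class whose profile is closest to the instance.
    Euclidean distance is an assumption of this sketch."""
    vector = [1.0 if s <= instance else 0.0 for s in itemsets]

    def distance(cls):
        return sqrt(sum((v - p) ** 2 for v, p in zip(vector, profiles[cls])))

    return min(profiles, key=distance)
```

Each instance is a frozenset of items (for example attribute=value strings). A test instance is compared with one profile per class rather than with every training instance, which is where the reduction in distance computations comes from. In practice top_k would be tuned on held-out data instead of being fixed.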