Maximum patterns in datasets

Authors:
T. O. Bonates;Peter L. Hammer;A. Kogan
Affiliations:
Rutgers Center for Operations Research-RUTCOR, Rutgers University, 640, Bartholomew Road, Piscataway, NJ 08854, USA;Rutgers Center for Operations Research-RUTCOR, Rutgers University, 640, Bartholomew Road, Piscataway, NJ 08854, USA;Rutgers Center for Operations Research-RUTCOR, Rutgers University, 640, Bartholomew Road, Piscataway, NJ 08854, USA and Accounting and Information Systems, Rutgers Business School, Rutgers Univers ...
Venue:
Discrete Applied Mathematics
Year:
2008

Citing 21
Cited 3

Cause-effect relationships and partially defined Boolean functions

Annals of Operations Research
C4.5: programs for machine learning
Overfitting and undercomputing in machine learning

ACM Computing Surveys (CSUR)
Boosting a weak learning algorithm by majority

Information and Computation
Bagging predictors

Machine Learning
Selection of relevant features and examples in machine learning

Artificial Intelligence - Special issue on relevance
A threshold of ln n for approximating set cover

Journal of the ACM (JACM)
Approximate statistical tests for comparing supervised classification learning algorithms

Neural Computation
Efficient mining of emerging patterns: discovering trends and differences

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Rough Sets: Theoretical Aspects of Reasoning about Data
An Implementation of Logical Analysis of Data

IEEE Transactions on Knowledge and Data Engineering
A New Version of Rough Set Exploration System

TSCTC '02 Proceedings of the Third International Conference on Rough Sets and Current Trends in Computing
Learning monotone DNF from a teacher that almost does not answer membership queries

The Journal of Machine Learning Research
Learning DNF in time 2^{Õ(n^{1/3})}

Journal of Computer and System Sciences - STOC 2001
On the Complexity of Finding Emerging Patterns

COMPSAC '04 Proceedings of the 28th Annual International Computer Software and Applications Conference - Workshops and Fast Abstracts - Volume 02
Pareto-optimal patterns in logical analysis of data

Discrete Applied Mathematics - Discrete mathematics & data mining (DM & DM)
Spanned patterns for the logical analysis of data

Discrete Applied Mathematics - Special issue: Discrete mathematics & data mining II (DM & DM II)
Accelerated algorithm for pattern detection in logical analysis of data

Discrete Applied Mathematics - Special issue: Discrete mathematics & data mining II (DM & DM II)
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Bagging, boosting, and C4.5

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
Subgroup discovery techniques and applications

PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining

Logical analysis of data --- the vision of Peter L. Hammer

Annals of Mathematics and Artificial Intelligence
Comparisons of classification methods in the original and pattern spaces

Expert Systems with Applications: An International Journal
Compact MILP models for optimal and Pareto-optimal LAD patterns

Discrete Applied Mathematics

Abstract

Given a binary dataset of positive and negative observations, a positive (negative) pattern is a subcube that has a nonempty intersection with the positive (negative) subset of the dataset and an empty intersection with the negative (positive) subset. Patterns are the key building blocks of Logical Analysis of Data (LAD) and are an essential tool for identifying the positive or negative nature of "new" observations that they cover. We develop exact and heuristic algorithms for constructing a pattern of maximum coverage that includes a given point. The heuristically constructed patterns can achieve 81-98% of the maximum possible coverage while requiring only a fraction of the computing time of the exact algorithm. Maximum patterns are shown to be useful for constructing highly accurate LAD classification models. In comparisons with commonly used machine learning algorithms implemented in the publicly available Weka software package, the implementation of LAD using maximum patterns proves to be a highly competitive classification method.
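
To make the definitions in the abstract concrete, the following Python sketch represents a pattern as a set of fixed bits (a subcube), checks that it covers no observation of the opposite class, and enlarges the pattern around a given point. It is an illustration only and not the authors' algorithms: the greedy literal-dropping rule and the brute-force search stand in for the heuristic and exact methods, and all function names and the toy dataset are assumptions made for this example.

```python
from itertools import combinations

def covers(pattern, point):
    """A pattern is a dict {attribute index: required bit} (a subcube).
    It covers a point that agrees with it on every fixed attribute."""
    return all(point[i] == v for i, v in pattern.items())

def coverage(pattern, points):
    """Number of observations in `points` covered by the pattern."""
    return sum(covers(pattern, p) for p in points)

def is_pure(pattern, opposite):
    """True if the pattern covers no observation of the opposite class."""
    return not any(covers(pattern, q) for q in opposite)

def greedy_max_pattern(seed, same, opposite):
    """Illustrative heuristic (an assumption, not the paper's method):
    start from the subcube containing only `seed`, then repeatedly drop
    the literal whose removal keeps the pattern pure and yields the
    largest coverage of `same`; stop when no literal can be dropped."""
    pattern = dict(enumerate(seed))
    if not is_pure(pattern, opposite):
        raise ValueError("seed also occurs in the opposite class")
    while True:
        best = None
        for i in pattern:
            cand = {j: v for j, v in pattern.items() if j != i}
            if is_pure(cand, opposite):
                c = coverage(cand, same)
                if best is None or c > best[0]:
                    best = (c, cand)
        if best is None:
            return pattern
        pattern = best[1]

def exact_max_pattern(seed, same, opposite):
    """Brute-force reference: try every subset of the seed's literals.
    Exponential in the number of attributes, so only for tiny examples."""
    n = len(seed)
    best = None
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            cand = {i: seed[i] for i in kept}
            if is_pure(cand, opposite):
                c = coverage(cand, same)
                if best is None or c > best[0]:
                    best = (c, cand)
    if best is None:
        raise ValueError("seed also occurs in the opposite class")
    return best[1]

# Hypothetical toy dataset with four binary attributes.
positives = [(1, 1, 0, 1), (1, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 1)]
negatives = [(0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0)]
seed = positives[0]

heuristic = greedy_max_pattern(seed, positives, negatives)
optimum = exact_max_pattern(seed, positives, negatives)
ratio = coverage(heuristic, positives) / coverage(optimum, positives)
```

Here `ratio` is the share of the maximum coverage reached by the greedy pattern on the toy data, the same kind of quantity as the 81-98% figure reported in the abstract. Because dropping a literal can only enlarge the subcube, coverage never decreases during the greedy loop; the greedy result is maximal with respect to inclusion but is not guaranteed to have maximum coverage.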