Mining fault-tolerant item sets using subset size occurrence distributions

Authors:
Christian Borgelt;Tobias Kötter
Affiliations:
European Centre for Soft Computing, Mieres, Asturias, Spain;Dept. of Computer Science, University of Konstanz, Konstanz, Germany
Venue:
IDA'11 Proceedings of the 10th international conference on Advances in intelligent data analysis X
Year:
2011

Citing 13
Cited 0

KDD-Cup 2000 organizers' report: peeling the onion

ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Efficient discovery of error-tolerant frequent itemsets in high dimensions

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Approximation of Frequency Queris by Means of Free-Sets

PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Dense itemsets

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Geometric and combinatorial tiles in 0-1 data

PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Efficient Mining of Frequent Patterns from Uncertain Data

ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Frequent pattern mining with uncertain data

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Supporting bi-cluster interpretation in 0/1 data by means of local patterns

Intelligent Data Analysis - Selected papers from IDA2005, Madrid, Spain
Mining frequent itemsets from uncertain data

PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Item set mining based on cover similarity

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Mining a new fault-tolerant pattern type as an alternative to formal concept discovery

ICCS'06 Proceedings of the 14th international conference on Conceptual Structures: inspiration and Application
Efficient pattern mining of uncertain data with sampling

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

Mining fault-tolerant (or approximate or fuzzy) item sets means to allow for errors in the underlying transaction data in the sense that actually present items may not be recorded due to noise or measurement errors. In order to cope with such missing items, transactions that do not contain all items of a given set are still allowed to support it. However, either the number of missing items must be limited, or the transaction's contribution to the item set's support is reduced in proportion to the number of missing items, or both. In this paper we present an algorithm that efficiently computes the subset size occurrence distribution of item sets, evaluates this distribution to find fault-tolerant item sets, and exploits intermediate data to remove pseudo (or spurious) item sets. We demonstrate the usefulness of our algorithm by applying it to a concept detection task on the 2008/2009 Wikipedia Selection for schools.