Proportional fault-tolerant data mining with applications to bioinformatics

Authors:
Guanling Lee;Sheng-Lung Peng;Yuh-Tzu Lin
Affiliations:
Department of Computer Science and Information Engineering, National Dong Hwa University, Taiwan, Republic of China;Department of Computer Science and Information Engineering, National Dong Hwa University, Taiwan, Republic of China;Department of Computer Science and Information Engineering, National Dong Hwa University, Taiwan, Republic of China
Venue:
Information Systems Frontiers
Year:
2009

Abstract

The mining of frequent patterns in databases has been studied for several years, but few reports have discussed fault-tolerant (FT) pattern mining. FT data mining is better suited to extracting interesting information from real-world data that may be polluted by noise. In particular, the growing volume of biological data calls for such a technique to mine important features, e.g., motifs. In this paper, we propose the concept of proportional FT mining of frequent patterns, in which the number of tolerable faults in a pattern is proportional to the length of the pattern. Two algorithms are designed to solve this problem. The first algorithm, named FT-BottomUp, applies an FT-Apriori heuristic and finds all FT patterns with any number of faults. The second algorithm, FT-LevelWise, divides all FT patterns into several groups according to the number of tolerable faults, and mines the content patterns of each group in turn. When our algorithms are applied to real data, two reported epitopes of spike proteins of SARS-CoV are found in the resulting itemsets, and proportional FT data mining performs better than fixed FT data mining for this application.
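
The idea of a fault budget that grows with pattern length can be illustrated with a minimal Python sketch. It is not the paper's FT-BottomUp or FT-LevelWise algorithm, and the abstract does not give the exact definitions, so everything here is an assumption made for illustration: the names `tolerable_faults`, `ft_contains`, and `ft_support`, the parameter `fault_ratio`, the rounding rule (floor), and the reading of a "fault" as one pattern item missing from a transaction.

```python
from math import floor


def tolerable_faults(pattern_length, fault_ratio):
    # Assumed rule: the fault budget is a fixed fraction of the pattern
    # length, rounded down. The paper's exact rule may differ.
    return floor(pattern_length * fault_ratio)


def ft_contains(transaction, pattern, fault_ratio):
    # Assumed meaning of a fault: an item of the pattern that is absent
    # from the transaction.
    pattern = set(pattern)
    missing = len(pattern - set(transaction))
    return missing <= tolerable_faults(len(pattern), fault_ratio)


def ft_support(transactions, pattern, fault_ratio):
    # Number of transactions that contain the pattern within its budget.
    return sum(ft_contains(t, pattern, fault_ratio) for t in transactions)


# Toy data, invented for this example only.
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"A", "C"},
    {"B", "C", "D"},
]

long_pattern = {"A", "B", "C", "D"}
short_pattern = {"A", "B"}

# With fault_ratio = 0.25 the 4-item pattern may miss one item,
# while the 2-item pattern may miss none.
print(ft_support(transactions, long_pattern, 0.25))
print(ft_support(transactions, short_pattern, 0.25))
print(ft_support(transactions, long_pattern, 0.0))
```

Under these assumptions a short pattern must match exactly while a longer one tolerates a missing item, which is the contrast with a fixed fault count that the abstract draws.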