Recently, mining frequent substructures from XML data has attracted considerable interest. Various methods have been proposed and examined for mining frequent patterns from XML documents efficiently and effectively. While many of the frequent XML patterns generated are useful and interesting, a large portion of them is commonly not interesting or significant for the application at hand. In this paper, we present a systematic approach for ascertaining whether the discovered XML patterns are significant rather than merely coincidental associations, and we provide a precise statistical approach to support this framework. The proposed strategy combines data mining and statistical measurement techniques to discard the non-significant patterns. We consider the "Prions" database, which describes the protein instances stored for Human Prions Protein. The proposed unified framework is applied to this dataset to demonstrate its effectiveness in assessing the interestingness of discovered XML patterns by statistical means. When the dataset is used for classification or prediction, the proposed approach discards non-significant XML patterns without reducing the accuracy of the pattern set as a whole.
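The idea of discarding non-significant patterns by statistical means can be illustrated with a minimal sketch. The abstract does not name the statistical measures used, so the Pearson chi-square test of independence below is an assumption chosen for illustration, not the paper's method. Patterns are simplified to flat item sets rather than XML subtrees, and the names `chi_square_2x2` and `significant_patterns` are hypothetical.

```python
from typing import FrozenSet, List, Set

# Critical value of the chi-square distribution for 1 degree of freedom
# at significance level 0.05.
CHI2_CRIT_DF1_ALPHA_05 = 3.841


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table (an empty row or column) gives no evidence
        # of association, so treat the statistic as zero.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def significant_patterns(
    records: List[Set[str]],
    labels: List[str],
    patterns: List[FrozenSet[str]],
    target: str,
    critical: float = CHI2_CRIT_DF1_ALPHA_05,
) -> List[FrozenSet[str]]:
    """Keep patterns whose presence is associated with the target class.

    Assumption: a pattern "occurs" in a record when it is a subset of the
    record's items. Real XML subtree matching is considerably more involved.
    """
    kept = []
    for pattern in patterns:
        a = b = c = d = 0
        for items, label in zip(records, labels):
            present = pattern <= items
            positive = label == target
            if present and positive:
                a += 1
            elif present:
                b += 1
            elif positive:
                c += 1
            else:
                d += 1
        if chi_square_2x2(a, b, c, d) >= critical:
            kept.append(pattern)
    return kept


# Toy data (hypothetical): "x" occurs only in the "pos" class, "y" everywhere.
records = [{"x", "y"}] * 10 + [{"y"}] * 10
labels = ["pos"] * 10 + ["neg"] * 10
patterns = [frozenset({"x"}), frozenset({"y"})]
kept = significant_patterns(records, labels, patterns, target="pos")
```

In this toy data the pattern containing `x` is retained, while the pattern containing `y`, which appears in every record and so cannot discriminate between classes, is discarded. A fixed critical value keeps the sketch dependency-free; a statistics library could supply exact p-values instead.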