Molecular feature mining in HIV data
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Mining Molecular Fragments: Finding Relevant Substructures of Molecules
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
gSpan: Graph-Based Substructure Pattern Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
In this paper, we study the problem of mining high-confidence fragment-based classification rules from imbalanced HIV data, whose class distribution is extremely skewed. We propose an efficient approach to mining frequent fragments in different classes of compounds; these fragments characterize each class and can be used to build associative classification rules. We adopt the pattern-growth paradigm and define an efficient fragment enumeration scheme. Moreover, we introduce an improved instance-centric rule-generation strategy to mine high-confidence fragment-based classification rules, which are useful for differentiating one class from the others. Experiments show that our algorithm discovers more interesting rules than the previous method and can facilitate the detection of new compounds with the desired anti-HIV activity.
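To make the notions of rule confidence and instance-centric selection concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each compound has already been reduced to a set of fragment labels (the graph-based fragment enumeration is omitted), restricts rules to a single fragment, and uses the basic instance-centric idea of keeping, for every training compound, the highest-confidence rule that covers it and predicts its class. The toy data, the fragment labels, and the names `rule_stats` and `instance_centric_rules` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (set of fragment labels, class label) per compound.
# The "active" class is the minority, mimicking a skewed class distribution.
COMPOUNDS = [
    ({"N-C", "C=O", "S-C"}, "active"),
    ({"N-C", "C=O"}, "inactive"),
    ({"C=O", "C-C"}, "inactive"),
    ({"C-C", "N-C"}, "inactive"),
    ({"S-C", "C-C"}, "active"),
    ({"C=O"}, "inactive"),
]


def rule_stats(compounds):
    """Map each rule (fragment, label) to (support count, confidence).

    Confidence is the fraction of compounds containing the fragment
    that belong to the given class.
    """
    frag_count = defaultdict(int)
    rule_count = defaultdict(int)
    for frags, label in compounds:
        for frag in frags:
            frag_count[frag] += 1
            rule_count[(frag, label)] += 1
    return {
        (frag, label): (n, n / frag_count[frag])
        for (frag, label), n in rule_count.items()
    }


def instance_centric_rules(compounds, min_support=1, min_conf=0.0):
    """For every compound, keep the best rule that covers it.

    "Best" means highest confidence, then highest support, then
    fragment label (an arbitrary but deterministic tie-break).
    """
    stats = rule_stats(compounds)
    selected = {}
    for frags, label in compounds:
        candidates = []
        for frag in frags:
            support, conf = stats[(frag, label)]
            if support >= min_support and conf >= min_conf:
                candidates.append((conf, support, frag))
        if candidates:
            conf, support, frag = max(candidates)
            selected[(frag, label)] = (support, conf)
    return selected


if __name__ == "__main__":
    rules = instance_centric_rules(COMPOUNDS)
    for (frag, label), (support, conf) in sorted(rules.items()):
        print(f"{frag} -> {label}: support={support}, confidence={conf:.2f}")
```

On this toy data, both minority-class compounds are covered by the rule for fragment `S-C`, which has confidence 1.0 although it is supported by only two compounds. This illustrates why per-instance selection can retain rules for a rare class that a single global ranking dominated by the majority class might push aside.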