A motif is a collection of transcription factor binding sites (TFBSs) located in the promoters of genes. Discovering motifs is critical to understanding the mechanism of gene regulation, and computational approaches to this challenging problem have shown good potential. However, existing motif search approaches cope poorly with the marked under-representation of binding sites in biological datasets, which leads to a considerably high false-positive rate in prediction. We cast the task as an imbalanced biological data classification problem. Our technical contributions in this paper are: (i) a novel similarity metric for comparing DNA subsequences, based on the overlap range of nucleotides in DNA sequences; and (ii) a new sampling method that combines over- and under-sampling techniques. The effectiveness of the proposed similarity metric and sampling approach is demonstrated on two benchmark datasets with three classification techniques: Neural Networks (NN), Support Vector Machine (SVM), and Learning Vector Quantization (LVQ1). Empirical studies show that the LVQ1 classifier integrated with the proposed similarity metric performs slightly better than the other approaches on the test datasets.
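To make the idea of combining over- and under-sampling concrete, here is a minimal Python sketch. It is not the sampling method proposed in the paper, whose details the abstract does not give. It assumes plain random duplication of the minority class (binding sites) and random removal from the majority class (background subsequences). The function names `combined_resample` and `_resize`, the `target_per_class` parameter, and the toy sequences are all hypothetical.

```python
import random


def _resize(items, n, rng):
    """Return a list of exactly n items drawn from `items`.

    Assumption: simple random sampling. If there are too many items,
    sample without replacement (under-sampling); if too few, keep every
    original and pad with random duplicates (over-sampling).
    """
    items = list(items)
    if not items:
        raise ValueError("cannot resample an empty class")
    if len(items) >= n:
        return rng.sample(items, n)
    extra = [rng.choice(items) for _ in range(n - len(items))]
    return items + extra


def combined_resample(minority, majority, target_per_class, seed=0):
    """Balance two classes to `target_per_class` examples each.

    Hypothetical illustration only: the minority class is over-sampled
    and the majority class is under-sampled toward a common size.
    """
    rng = random.Random(seed)
    return (_resize(minority, target_per_class, rng),
            _resize(majority, target_per_class, rng))


if __name__ == "__main__":
    # Toy data: 2 "binding sites" against 8 "background" subsequences.
    sites = ["ACGT", "ACGA"]
    background = ["TTTT", "GGGG", "CCCC", "TGCA",
                  "GATC", "CATG", "AATT", "GGCC"]
    pos, neg = combined_resample(sites, background, target_per_class=4)
    print(len(pos), len(neg))
```

Choosing a target size between the two class sizes means neither step has to do all the work: fewer minority duplicates are needed than with pure over-sampling, and fewer majority examples are discarded than with pure under-sampling.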