Theoretical Computer Science
C4.5: programs for machine learning
On learning multiple concepts in parallel
COLT '93 Proceedings of the sixth annual conference on Computational learning theory
Experience with a learning personal assistant
Communications of the ACM
Inclusion problems in parallel learning and games (extended abstract)
COLT '94 Proceedings of the seventh annual conference on Computational learning theory
A machine discovery from amino acid sequences by decision trees over regular patterns
Selected papers of international conference on Fifth generation computer systems 92
Artificial intelligence: a modern approach
The nature of statistical learning theory
Machine learning, neural and statistical classification
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Robust learning aided by context
Journal of Computer and System Sciences - Eleventh annual conference on computational learning theory/Twelfth Annual IEEE conference on computational complexity
Bioinformatics: the machine learning approach
Machine Learning
The Divide-and-Conquer Manifesto
ALT '00 Proceedings of the 11th International Conference on Algorithmic Learning Theory
Genomic strings are not of fixed length. They provide one-dimensional spatial data that do not divide, for conquering by machine learning, into manageable fixed-size chunks obeying Dietterich's independent and identically distributed assumption. We nonetheless need to divide genomic strings for conquering by machine learning, in this case for genomic prediction.

Orthologs are genomic strings derived from a common ancestor and having the same biological function. Ortholog detection is biologically interesting since it informs us about protein divergence through evolution and, in the present context, also has important agricultural applications. This paper indicates a means of obtaining an associated fixed-size attribute vector for genomic string data, and of dividing and conquering the machine learning problem of ortholog detection, seen here as an analogy problem. The attributes are based both on the typical string similarity measures of bioinformatics and on a large number of differential metrics, many new to bioinformatics. Many of the differential metrics are based on evolutionary considerations, both theoretical and empirically observed, in some cases observed by the authors.

C5.0 with AdaBoosting activated was employed. The preliminary results reported here, on complete cDNA strings, are very encouraging for eventually and usefully employing the described techniques for ortholog detection on the more readily available EST (incomplete) genomic data.
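
To make the idea of a fixed-size attribute vector concrete, the following minimal Python sketch maps a pair of variable-length sequences to a vector of constant length. The three attributes used here (k-mer cosine similarity, relative length difference, and GC-content difference) are illustrative assumptions only; they are not the similarity measures or differential metrics of the paper, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a sequence (empty if len(seq) < k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gc_fraction(seq: str) -> float:
    """Fraction of G and C characters in a sequence."""
    return sum(ch in "GC" for ch in seq) / len(seq) if seq else 0.0


def pair_attributes(x: str, y: str, k: int = 3) -> list[float]:
    """Map two sequences of any length to a 3-element attribute vector.

    The attributes are illustrative stand-ins, not the paper's metrics.
    """
    x, y = x.upper(), y.upper()
    longer = max(len(x), len(y))
    return [
        cosine(kmer_counts(x, k), kmer_counts(y, k)),   # composition similarity
        abs(len(x) - len(y)) / longer if longer else 0.0,  # relative length difference
        abs(gc_fraction(x) - gc_fraction(y)),           # GC-content difference
    ]


if __name__ == "__main__":
    print(pair_attributes("ATGCGTACGT", "ATGCGTTTACGTAA"))
```

Whatever the lengths of the two inputs, the output always has three entries, so each candidate pair becomes one row of a table that a decision-tree learner, with or without boosting, could be trained on.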