On Complexity Measures for Biological Sequences
CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Diverging patterns: discovering significant frequency change dissimilarities in large databases
Proceedings of the 18th ACM conference on Information and knowledge management
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Signal detection in genome sequences using complexity based features
Proceedings of the 12th International Workshop on Data Mining in Bioinformatics
Analyzing large amounts of data is one of the most challenging problems in modern molecular biology. In this work, several complexity measures and methods are applied to identify signals in the whole genomes of three prokaryotic organisms. In addition to previously proposed complexity measures, new measures are introduced for representing Open Reading Frames (ORFs). Several classification algorithms are then applied to determine which complexity measures give better predictive performance in discriminating genes from pseudo-genes among ORFs. We also investigate whether the positions and lengths of windows within ORFs have a significant impact on distinguishing between genes and pseudo-genes.
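The abstract does not name the specific complexity measures, so the sketch below is illustrative only. It assumes two widely used sequence complexity measures, Shannon entropy over k-mers and one common formulation of linguistic complexity (distinct substrings observed divided by the maximum possible). These are stand-ins, not necessarily the measures used in this work. It also shows how a window of a given position and length might be cut from an ORF before the features are computed. The helpers `window` and `complexity_features`, and the parameters `start_frac` and `length`, are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    # sum of p * log2(1/p); written this way so a single repeated k-mer gives 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def linguistic_complexity(seq: str, alphabet_size: int = 4) -> float:
    """Distinct substrings observed divided by the maximum possible count.

    For each length k, at most min(alphabet_size**k, n - k + 1) distinct
    substrings can occur. The loop is quadratic in len(seq), which is
    acceptable for short windows.
    """
    n = len(seq)
    observed = 0
    possible = 0
    for k in range(1, n + 1):
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible if possible else 0.0


def window(orf: str, start_frac: float, length: int) -> str:
    """Hypothetical helper: `length` nucleotides starting at fraction start_frac of the ORF."""
    start = int(start_frac * len(orf))
    return orf[start:start + length]


def complexity_features(orf: str, start_frac: float = 0.0, length: int = 60) -> dict:
    """Hypothetical feature vector for one ORF window, e.g. as classifier input."""
    w = window(orf, start_frac, length)
    return {
        "entropy_k1": shannon_entropy(w, 1),
        "entropy_k3": shannon_entropy(w, 3),
        "linguistic": linguistic_complexity(w),
    }


if __name__ == "__main__":
    orf = "ATG" + "ACGT" * 30 + "TAA"
    # Sweep window positions to see how the features change along the ORF
    for frac in (0.0, 0.25, 0.5):
        print(frac, complexity_features(orf, start_frac=frac, length=60))
```

A low-complexity window such as `"AAAA"` gets an entropy of 0.0 and a linguistic complexity of 0.4, while `"ACGT"` scores 2.0 and 1.0. Feature dictionaries computed at different `start_frac` and `length` settings could then be fed to any standard classifier to compare how window placement affects gene versus pseudo-gene discrimination.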