Evaluation of different complexity measures for signal detection in genome sequences

Authors:
Mehdi Kargar;Aijun An
Affiliations:
York University, Toronto, Canada;York University, Toronto, Canada
Venue:
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
Year:
2010

Abstract

Analyzing large amounts of data is one of the most challenging problems in modern molecular biology. In this work, we apply different complexity measures and methods to identify signals in the whole genomes of three prokaryotic organisms. In addition to previously proposed complexity measures, we introduce new measures for representing Open Reading Frames (ORFs). We apply classification algorithms to determine which complexity measures lead to better predictive performance in discriminating genes from pseudo-genes among ORFs. We also investigate whether the positions and lengths of windows within ORFs have a significant impact on distinguishing genes from pseudo-genes. Several classification algorithms are applied to classify ORFs as genes or pseudo-genes.
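The abstract does not name the complexity measures it evaluates, so the following is only an illustrative sketch of the general idea: computing one complexity value per window of an ORF and using the resulting values as features for a classifier. Shannon entropy of the k-mer distribution is used here as a stand-in measure (an assumption, not necessarily one of the paper's measures), and the function names, window parameters, and example sequence are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the k-mer distribution in seq.

    Assumption: entropy stands in for a generic complexity measure;
    the paper's own measures are not specified in the abstract.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_features(orf: str, window: int, step: int, k: int = 1) -> list:
    """Return one entropy value per sliding window, ordered by start position.

    Windows that would run past the end of the ORF are skipped.
    """
    return [
        shannon_entropy(orf[start:start + window], k)
        for start in range(0, len(orf) - window + 1, step)
    ]


# Hypothetical ORF, made up for illustration (not real genome data).
orf = "ATGAAAAAAAAAGCGTACGTTAGCTGA"
features = window_features(orf, window=9, step=9)
print(features)
```

Varying `window` and `step` changes the length and position of the regions being summarized, which is the kind of parameter whose effect the abstract says is investigated. The resulting feature vectors could then be passed to any standard classifier.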