Data mining: concepts and techniques
Data mining: concepts and techniques
Color Set Size Problem with Application to String Matching
CPM '92 Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching
Rapid identification of repeated patterns in strings, trees and arrays
STOC '72 Proceedings of the fourth annual ACM symposium on Theory of computing
Encoding of primary structures of biological macromolecules within a data mining perspective
Journal of Computer Science and Technology - Special issue on bioinformatics
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Comparing graph-based representations of protein for mining purposes
Proceedings of the KDD-09 Workshop on Statistical and Relational Learning in Bioinformatics
Hi-index | 0.00 |
The classification of biological sequences is one of the significant challenges in bioinformatics as well for protein as for nucleic sequences. The presence of these data in huge masses, their ambiguity and especially the high costs of the in vitro analysis in terms of time and money, make the use of data mining rather a necessity than a rational choice. However, the data mining techniques, which often process data under the relational format, are confronted with the inappropriate format of the biological sequences. Hence, an inevitable step of pre-processing must be established. This work presents the biological sequences encoding as a preparation step before their classification. We present three existing encoding methods based on the motifs extraction. We also propose to improve one of these methods and we carry out a comparative study which takes into account, of course, the effect of each method on the classification accuracy but also the number of generated attributes and the CPU time.