New voting strategies designed for the classification of nucleic sequences

Authors:
Mourad Elloumi;Mondher Maddouri
Affiliations:
Department of Computer Science, Faculty of Economic Sciences and Management of Tunis, El Manar, 2092, Tunis, Tunisia;Computer Science Department, National Institute of Applied Sciences and Technology, El Manar, 2092, Tunis, Tunisia
Venue:
Knowledge and Information Systems
Year:
2005

Abstract

Biological macromolecules, i.e. DNA, RNA and proteins, are encoded as strings called primary structures. Over the last decades, the number and complexity of known primary structures have grown exponentially. Analyzing this huge volume of data to extract pertinent knowledge is a challenging task, and data mining approaches can help. In this paper, we present a new data mining approach, called Disclass, that uses voting strategies to classify primary structures. Let f1, f2, ..., fn be families that are samples of n sets S1, S2, ..., Sn of primary structures, respectively, and let w be a new primary structure assumed to belong to one of these sets. Disclass assigns w to one of the sets S1, S2, ..., Sn in three steps. (i) In the first step, for each family fi, 1≤i≤n, we construct the ambiguously discriminant and minimal substrings (ADMS) associated with this family. Because fi is a sample of the set Si, the ADMS obtained are also considered to be associated with the whole set Si. During classification, the ADMS associated with Si that are approximate substrings of w cast weighted votes for Si. (ii) In the second step, we compute the vote weights of the ADMS constructed during the first step, according to a voting strategy. (iii) In the last step, we assign w to the set that receives the maximum total vote weight.
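
The voting stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the ADMS are constructed, how approximate substring matching is defined, or how a vote strategy computes the weights. The sketch therefore assumes that the substrings and their weights are already given, and it takes "approximate substring" to mean a match within a bounded number of mismatches (Hamming distance). The names `occurs_approximately`, `classify`, `weighted_adms` and `max_mismatches` are hypothetical.

```python
from typing import Dict


def occurs_approximately(pattern: str, text: str, max_mismatches: int = 1) -> bool:
    """Return True if some window of `text` differs from `pattern` in at most
    `max_mismatches` positions.

    Assumption of this sketch: approximation is measured by Hamming distance.
    The abstract does not specify the matching criterion.
    """
    m = len(pattern)
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        mismatches = sum(a != b for a, b in zip(pattern, window))
        if mismatches <= max_mismatches:
            return True
    return False


def classify(w: str,
             weighted_adms: Dict[str, Dict[str, float]],
             max_mismatches: int = 1) -> str:
    """Assign `w` to the set whose matching substrings carry the largest total weight.

    `weighted_adms` maps each set name to {substring: weight}. Both the
    substrings and the weights are assumed to be precomputed (steps i and ii).
    """
    totals = {}
    for set_name, substrings in weighted_adms.items():
        # Only substrings that occur approximately in w are allowed to vote.
        totals[set_name] = sum(
            weight
            for substring, weight in substrings.items()
            if occurs_approximately(substring, w, max_mismatches)
        )
    # Step (iii): the set with the maximum total weight wins.
    return max(totals, key=totals.get)


# Toy data, invented for illustration only.
example_adms = {
    "S1": {"ACGT": 2.0, "TTAG": 1.0},
    "S2": {"GGCC": 1.5, "CATG": 1.5},
}
predicted = classify("TTACGAGGTT", example_adms, max_mismatches=1)
```

With this toy data, both substrings of S1 match `w` with one mismatch and neither substring of S2 does, so `predicted` is `"S1"`. Ties are broken by dictionary order here, which is an arbitrary choice of the sketch rather than anything stated in the abstract.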