An efficient pattern matching algorithm for comparative Genome sequence analysis

Authors:
Muneer Ahmad;Hassan Mathkour
Affiliations:
Department of Computer Science, College of Computer & Information Sciences, King Saud University, Saudi Arabia;Department of Computer Science, College of Computer & Information Sciences, King Saud University, Saudi Arabia
Venue:
ACC'08 Proceedings of the WSEAS International Conference on Applied Computing Conference
Year:
2008

Abstract

Sequences, understood as logical units of meaningful term successions, can be considered the backbone of data. Consider, for instance, genetic sequences, where the terms are genetic symbols, or plain natural language sentences, formed by words. To name just a few examples of sequence use, consider how sentences describe the real world modeled in a database and the role they play in composing documents. Searching sequence repositories often requires going beyond exact matching to determine which sequences are similar or close to a given query sentence (approximate matching). The similarity involved in this process can be based either on the semantics of the sequence or just on its syntax. The former considers the meaning of the terms in the sequences, so its results are almost impossible to elaborate without proper extraction and analysis beforehand, while the latter is sufficiently comprehensive at the implementation level. The syntactic approach finds the number of approximate matches between sequences to obtain optimal results.
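
The abstract does not spell out the matching algorithm itself, so the following is only a minimal sketch of what syntax-based approximate matching can look like, assuming a plain Levenshtein (edit) distance over terms and a fixed distance threshold. The names `edit_distance`, `approximate_matches`, and `max_distance` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two term sequences.

    Works on strings (terms are characters, e.g. nucleotides) and on
    lists of words (terms are words in a sentence).
    """
    # prev[j] holds the distance between the first i-1 terms of a
    # and the first j terms of b.
    prev = list(range(len(b) + 1))
    for i, term_a in enumerate(a, start=1):
        curr = [i]
        for j, term_b in enumerate(b, start=1):
            cost = 0 if term_a == term_b else 1
            curr.append(min(prev[j] + 1,          # delete term_a
                            curr[j - 1] + 1,      # insert term_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approximate_matches(query: Sequence[str],
                        repository: List[Sequence[str]],
                        max_distance: int) -> List[Sequence[str]]:
    """Return the repository sequences within max_distance of the query.

    The threshold is an assumption of this sketch, not a value from the paper.
    """
    return [seq for seq in repository
            if edit_distance(query, seq) <= max_distance]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    repository = ["GATTACA", "GACTACA", "GATTA", "CCCCCCC"]
    matches = approximate_matches("GATTACA", repository, max_distance=2)
    print(len(matches), matches)
```

Because the distance is computed term by term, the same functions apply to natural language sentences once they are split into words, for example `edit_distance("the cat sat".split(), "the cat sat down".split())`.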