Data mining for motifs in DNA sequences

Authors:
D. A. Bell; J. W. Guan
Affiliations:
School of Computer Science, The Queen's University of Belfast, Northern Ireland, UK; School of Computer Science, The Queen's University of Belfast, Northern Ireland, UK and College of Computer Science and Technology, Jilin University, Changchun, P.R. China
Venue:
RSFDGrC'03 Proceedings of the 9th international conference on Rough sets, fuzzy sets, data mining, and granular computing
Year:
2003

Abstract

The large collections of genomic information accumulated in recent years potentially contain significant knowledge for exploitation in medicine and in the pharmaceutical industry. One interesting approach to distilling such knowledge is to detect strings in DNA sequences that are highly repetitive within a given sequence (e.g., for a particular patient) or across sequences (e.g., from different patients who have been classified in some way, such as sharing a particular medical diagnosis). Motifs are strings that occur relatively frequently. In this paper we present basic theory and algorithms for finding such frequent and common strings. We are particularly interested in strings that are maximally frequent, and, having discovered very frequent motifs, we show how to mine association rules using an existing rough-sets-based technique. Further work and applications are in progress.
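The abstract does not spell out the algorithms, so the following is only a minimal brute-force sketch of the general idea of finding frequent and common strings across sequences, not the authors' method. It assumes that the frequency of a string is its support (the number of sequences containing it, counting each sequence once), that a motif must have at least `min_len` characters and a support of at least `min_support`, and that "maximal" means a frequent string not contained in any longer frequent string, the usual convention in frequent-pattern mining; the paper may define these terms differently. All function names are hypothetical, and the rough-sets association-rule step is not illustrated.

```python
from collections import Counter
from typing import Dict, List


def substring_support(seqs: List[str], min_len: int = 2) -> Counter:
    """Count, for every substring of length >= min_len, how many sequences contain it.

    A substring occurring several times in one sequence counts once for that
    sequence (support = number of sequences, not number of occurrences).
    """
    support: Counter = Counter()
    for s in seqs:
        # Collect the distinct substrings of this sequence.
        seen = {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}
        support.update(seen)
    return support


def frequent_motifs(seqs: List[str], min_support: int, min_len: int = 2) -> Dict[str, int]:
    """Keep substrings contained in at least min_support sequences (assumed threshold)."""
    return {m: c for m, c in substring_support(seqs, min_len).items() if c >= min_support}


def maximal_motifs(frequent: Dict[str, int]) -> Dict[str, int]:
    """Keep frequent strings that are not a proper substring of another frequent string."""
    return {
        m: c
        for m, c in frequent.items()
        if not any(m != other and m in other for other in frequent)
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    seqs = ["ACGTACGT", "TTACGTAA", "ACGTTT"]
    freq = frequent_motifs(seqs, min_support=3)
    print(sorted(maximal_motifs(freq).items()))  # [('ACGT', 3)]
```

Here every substring of `ACGT` of length at least 2 (`AC`, `CG`, `GT`, `ACG`, `CGT`, `ACGT`) occurs in all three sequences, but only `ACGT` survives the maximality filter. Enumerating all substrings is quadratic in sequence length, so for real genomic data an index such as a suffix tree would be the more practical choice; counting occurrences instead of sequences would capture repetitiveness within a single sequence.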