Incremental maintenance of biological databases using association rule mining

Authors:
Kai-Tak Lam;Judice L. Y. Koh;Bharadwaj Veeravalli;Vladimir Brusic
Affiliations:
Department of Electrical & Computer Engineering, National University of Singapore, Singapore;Institute for Infocomm Research, Singapore;Department of Electrical & Computer Engineering, National University of Singapore, Singapore;Australian Centre for Plant Functional Genomics, School of Land and Food Sciences, and the Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
Venue:
PRIB'06 Proceedings of the 2006 international conference on Pattern Recognition in Bioinformatics
Year:
2006

Abstract

Biological research frequently requires specialist databases to support in-depth analysis of specific subjects. With the rapid growth of biological sequences in public-domain data sources, it is difficult to keep these databases current with the sources. Simple queries formulated to retrieve relevant sequences typically return a large number of false matches and thus demand manual filtering. In this paper, we propose a novel methodology that can support automatic incremental updating of specialist databases. Complex queries for incremental updating of relevant sequences are learned using Association Rule Mining (ARM), resulting in a significant reduction in false positive matches. This is the first time ARM has been used to formulate descriptive queries for the incremental maintenance of specialised biological databases. We have implemented and tested our methodology on two real-world databases. Our experiments show that the methodology achieves an F-score of up to 80% in detecting new sequences for these two databases.
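
As a rough illustration of the idea described in the abstract, the Python sketch below mines simple association rules of the form "set of annotation terms implies relevant record" from labelled examples, and computes an F-score for a set of retrieved records. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `mine_rules` and `f_score`, the toy records, the brute-force enumeration of term sets, and the support and confidence thresholds are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mine_rules(records, min_support=0.3, min_confidence=0.8, max_size=2):
    """Find term sets whose presence predicts a relevant record.

    records: list of (set_of_terms, is_relevant) pairs (hypothetical format).
    Returns a list of (term_tuple, support, confidence) for rules
    "term set -> relevant" that pass both thresholds.
    """
    n = len(records)
    all_terms = sorted({term for terms, _ in records for term in terms})
    rules = []
    # Brute-force enumeration of small term sets; fine for a toy example only.
    for size in range(1, max_size + 1):
        for term_tuple in combinations(all_terms, size):
            term_set = set(term_tuple)
            labels = [rel for terms, rel in records if term_set <= terms]
            if not labels:
                continue
            hits = sum(1 for rel in labels if rel)
            support = hits / n               # fraction of all records matching the rule
            confidence = hits / len(labels)  # fraction of matches that are relevant
            if support >= min_support and confidence >= min_confidence:
                rules.append((term_tuple, support, confidence))
    return rules


def f_score(predicted, actual):
    """Harmonic mean of precision and recall over two sets of record IDs."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up training records: annotation terms plus a relevance label.
records = [
    ({"snake", "neurotoxin", "venom"}, True),
    ({"snake", "neurotoxin"}, True),
    ({"neurotoxin", "scorpion"}, False),
    ({"snake", "kinase"}, False),
    ({"snake", "neurotoxin", "precursor"}, True),
]

rules = mine_rules(records)
for terms, support, confidence in rules:
    print(" AND ".join(terms), f"support={support:.2f}", f"confidence={confidence:.2f}")
```

In this toy data, neither "snake" nor "neurotoxin" alone passes the confidence threshold, but the conjunction of the two does, which mirrors the abstract's point that a learned complex query can produce fewer false matches than a simple one.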