BIO-AJAX: an extensible framework for biological data cleaning

Authors:
Katherine G. Herbert;Narain H. Gehani;William H. Piel;Jason T. L. Wang;Cathy H. Wu
Affiliations:
University Heights, Newark, NJ;University Heights, Newark, NJ;State University of New York at Buffalo, Buffalo, NY;University Heights, Newark, NJ;Georgetown University Medical Center, NW, Washington
Venue:
ACM SIGMOD Record
Year:
2004


Abstract

As databases become more pervasive throughout the biological sciences, data quality issues involving data legacy, data uniformity, and data duplication arise. Because of the nature of biological data, each of these problems is non-trivial. Correcting and standardizing biological data requires new methods and frameworks. This paper proposes one such framework, BIO-AJAX, which applies principles from data cleaning to improve data quality in biological information systems, specifically in TreeBASE.