Automated comparative auditing of NCIT genomic roles using NCBI

Authors:
Barry Cohen; Marc Oren; Hua Min; Yehoshua Perl; Michael Halper
Affiliations:
Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA; Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA; Fox Chase Cancer Center, Philadelphia, PA 19111, USA; Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA; Computer Science Department, Kean University, Union, NJ 07083, USA
Venue:
Journal of Biomedical Informatics
Year:
2008

Abstract

Biomedical research has identified many human genes and accumulated a wealth of knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Because this field advances rapidly, the NCIT's Gene hierarchy can be expected to contain role errors. We present a comparative methodology for auditing the Gene hierarchy against the National Center for Biotechnology Information's (NCBI's) Entrez Gene database. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each source, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. Regarding chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a common structural origin. Regarding biological processes, difficulties arise because genes frequently play roles in multiple processes, and a process may have many designations (such as synonymous terms). Our algorithms use the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors, and corrections for them, in a terminological genomic knowledge repository, thus facilitating its overall maintenance.
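The abstract does not spell out the algorithms, so the following is only a minimal Python sketch, under stated assumptions, of the kind of comparison it describes; it is not the authors' implementation. The Web crawlers are omitted: records are assumed to be already fetched into dictionaries keyed by gene symbol. All names (GeneRecord, Discrepancy, parse_band, location_consistent, canonical, audit) are hypothetical, locations are assumed to be cytogenetic band strings such as "17q21.31", and process names are matched through a flat synonym table, a simplification standing in for the NCIT Biological Process hierarchy the authors actually use.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


# Hypothetical record shape; real NCIT and Entrez Gene data would be
# gathered by Web crawlers, which this sketch does not implement.
@dataclass
class GeneRecord:
    symbol: str
    location: str  # assumed cytogenetic band string, e.g. "17q21.31"
    processes: set[str] = field(default_factory=set)


@dataclass
class Discrepancy:
    symbol: str
    role: str        # "location" or "process"
    ncit_value: str
    ncbi_value: str
    suggestion: str


# Chromosome (1-22, X, Y), then an optional arm (p/q) followed by band digits.
BAND_RE = re.compile(r"^(\d{1,2}|x|y)(?:([pq])([\d.]*))?$")


def parse_band(loc: str):
    """Split '17q21.31' into ('17', 'q', '21.31'); return None if unparsable."""
    m = BAND_RE.match(loc.strip().lower().replace(" ", ""))
    return m.groups() if m else None


def location_consistent(ncit_loc: str, ncbi_loc: str) -> bool:
    """True if the locations agree, allowing one to be coarser than the other."""
    a, b = parse_band(ncit_loc), parse_band(ncbi_loc)
    if a is None or b is None or a[0] != b[0]:
        return False          # unparsable, or different chromosomes
    if a[1] is None or b[1] is None:
        return True           # one side names only the chromosome
    if a[1] != b[1]:
        return False          # different arms
    # Same arm: a coarser band such as "21" contains a finer one such as "21.31".
    return a[2].startswith(b[2]) or b[2].startswith(a[2])


def canonical(term: str, synonyms: dict[str, str]) -> str:
    """Map a process name to a preferred name via a hypothetical synonym table."""
    key = term.strip().lower()
    return synonyms.get(key, key)


def audit(ncit: dict[str, GeneRecord], ncbi: dict[str, GeneRecord],
          synonyms: dict[str, str]) -> list[Discrepancy]:
    """Compare genes present in both sources and report probable NCIT role errors."""
    found: list[Discrepancy] = []
    for symbol, ncit_rec in sorted(ncit.items()):
        ncbi_rec = ncbi.get(symbol)
        if ncbi_rec is None:
            continue  # gene absent from NCBI data; nothing to compare
        if not location_consistent(ncit_rec.location, ncbi_rec.location):
            found.append(Discrepancy(symbol, "location", ncit_rec.location,
                                     ncbi_rec.location,
                                     f"replace location with {ncbi_rec.location}"))
        ncit_procs = {canonical(p, synonyms) for p in ncit_rec.processes}
        ncbi_procs = {canonical(p, synonyms) for p in ncbi_rec.processes}
        for proc in sorted(ncbi_procs - ncit_procs):
            found.append(Discrepancy(symbol, "process", "", proc,
                                     f"consider adding a role to '{proc}'"))
        for proc in sorted(ncit_procs - ncbi_procs):
            found.append(Discrepancy(symbol, "process", proc, "",
                                     "review: not supported by NCBI"))
    return found


if __name__ == "__main__":
    # Made-up records for illustration only; not real NCIT or NCBI data.
    ncit = {"GENE_A": GeneRecord("GENE_A", "17q21", {"DNA Repair"}),
            "GENE_B": GeneRecord("GENE_B", "3p14", {"Apoptosis"})}
    ncbi = {"GENE_A": GeneRecord("GENE_A", "17q21.31", {"DNA repair"}),
            "GENE_B": GeneRecord("GENE_B", "13p14", {"programmed cell death"})}
    for d in audit(ncit, ncbi, {"programmed cell death": "apoptosis"}):
        print(d)
```

With these made-up records, only GENE_B's location is flagged: "17q21" is treated as a coarser form of "17q21.31", and the synonym table reconciles "programmed cell death" with "apoptosis". Treating an NCIT process absent from NCBI as "review" rather than as an outright error reflects the abstract's point that genes frequently play roles in multiple processes.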