CONANN: an online biomedical concept annotator

Authors:
Lawrence H. Reeve; Hyoil Han
Affiliations:
College of Information Science and Technology, Drexel University, Philadelphia, PA; College of Information Science and Technology, Drexel University, Philadelphia, PA
Venue:
DILS'07 Proceedings of the 4th international conference on Data integration in the life sciences
Year:
2007

Abstract

We describe CONANN, a biomedical concept annotator designed for online environments. CONANN takes a biomedical source phrase and finds the best-matching biomedical concept from a domain resource. Domain concepts are defined in resources such as the United States National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus. CONANN uses an incremental filtering approach to narrow down a list of candidate phrases before deciding on a best match. We show that this approach improves annotation speed over an existing state-of-the-art concept annotator, facilitating the use of concept annotation in online environments. Our main contributions are 1) the design of a phrase-unit concept annotator more readily usable in online environments than existing systems, 2) the introduction of a model, called Inverse Phrase Frequency, which uses semantically focused words in a given ontology (e.g., UMLS) to measure coverage, and 3) the use of two different filters to measure coverage and coherence between a source phrase and a domain-specific candidate phrase. An intrinsic evaluation comparing CONANN's concept output to that of a state-of-the-art concept annotator shows that our system has an annotation precision ranging from 90% for exact concept matching to 95% for relaxed concept matching, while average phrase annotation is eighteen times faster. In addition, an extrinsic evaluation using the generated concepts in a text summarization task shows no significant degradation when using CONANN.
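The incremental filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not CONANN's implementation: the abstract does not define Inverse Phrase Frequency or the two filters, so everything here is an assumption made for illustration. The IPF weight is modeled by analogy with inverse document frequency, coverage as an IPF-weighted word overlap, and coherence as a word-order score based on the longest common subsequence. The function names, the `min_coverage` threshold, and the toy candidate list are hypothetical.

```python
import math
from collections import Counter


def inverse_phrase_frequency(phrases):
    """Assumed IDF-style weight: words that occur in fewer phrases weigh more."""
    n = len(phrases)
    counts = Counter(w for p in phrases for w in set(p.lower().split()))
    return {w: math.log(n / c) + 1.0 for w, c in counts.items()}


def coverage(source, candidate, ipf):
    """Share of the candidate's IPF weight matched by words in the source phrase."""
    src = set(source.lower().split())
    cand = set(candidate.lower().split())
    total = sum(ipf.get(w, 1.0) for w in cand)
    matched = sum(ipf.get(w, 1.0) for w in cand & src)
    return matched / total if total else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def coherence(source, candidate):
    """Assumed word-order score: LCS length divided by the longer phrase length."""
    s, c = source.lower().split(), candidate.lower().split()
    return lcs_length(s, c) / max(len(s), len(c)) if s and c else 0.0


def annotate(source, candidates, ipf, min_coverage=0.5):
    """Filter candidates by coverage first, then rank the survivors by coherence."""
    survivors = [c for c in candidates if coverage(source, c, ipf) >= min_coverage]
    if not survivors:
        return None
    return max(survivors, key=lambda c: (coherence(source, c), coverage(source, c, ipf)))


# Toy candidate phrases standing in for concept names from a domain resource.
candidates = ["heart attack", "heart failure", "panic attack", "myocardial infarction"]
ipf = inverse_phrase_frequency(candidates)
best = annotate("acute heart attack", candidates, ipf)
```

Running the cheap coverage filter first means the more expensive coherence score is computed only for candidates that survive it, which is one plausible way a staged design could reduce annotation time.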