A hybrid knowledge-based and data-driven approach to identifying semantically similar concepts

Authors:
Rimma Pivovarov;Noémie Elhadad
Affiliations:
Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, VC-5, New York, NY 10032, USA;Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, VC-5, New York, NY 10032, USA
Venue:
Journal of Biomedical Informatics
Year:
2012

Abstract

An open research question when leveraging ontological knowledge is when to treat different concepts separately and when to aggregate them. For instance, concepts for the terms "paroxysmal cough" and "nocturnal cough" might be aggregated in a kidney disease study, but should be left separate in a pneumonia study. Determining whether two concepts are similar enough to be aggregated can help build better datasets for data mining purposes and avoid signal dilution. Quantifying the similarity among concepts is a difficult task, however, in part because such similarity is context-dependent. We propose a comprehensive method that computes a similarity score for a concept pair by combining data-driven and ontology-driven knowledge. We demonstrate our method on concepts from SNOMED-CT and on a corpus of clinical notes of patients with chronic kidney disease. By combining information from usage patterns in clinical notes and from ontological structure, the method can separate concepts that are merely related from those that are semantically similar. When evaluated against a list of concept pairs annotated for similarity, our method reaches an AUC (area under the curve) of 92%.
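
The abstract does not state how the data-driven and ontology-driven signals are combined, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a path-based similarity over a toy is-a hierarchy, a cosine similarity over co-occurrence counts, and a weighted average of the two; the function names, the weight `alpha`, and all data are hypothetical. The `auc` helper shows one standard way to compute the area under the ROC curve from scored, annotated pairs.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ancestors(concept: str, parent: dict) -> list:
    """Path from a concept up to its root, inclusive."""
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path

def path_similarity(c1: str, c2: str, parent: dict) -> float:
    """1 / (1 + number of is-a edges on the shortest path via a common ancestor)."""
    p1, p2 = ancestors(c1, parent), ancestors(c2, parent)
    depth2 = {c: i for i, c in enumerate(p2)}
    for i, c in enumerate(p1):
        if c in depth2:
            return 1.0 / (1 + i + depth2[c])
    return 0.0  # no common ancestor

def combined_similarity(c1, c2, parent, contexts, alpha=0.5):
    """Assumed combination: weighted average of ontology and corpus scores."""
    onto = path_similarity(c1, c2, parent)
    data = cosine(contexts[c1], contexts[c2])
    return alpha * onto + (1 - alpha) * data

def auc(scores, labels):
    """Probability that a similar pair outscores a non-similar pair (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy hierarchy and context counts (not taken from SNOMED-CT or the paper).
parent = {
    "paroxysmal cough": "cough",
    "nocturnal cough": "cough",
    "cough": "finding",
    "fever": "finding",
}
contexts = {
    "paroxysmal cough": Counter({"dry": 3, "night": 1}),
    "nocturnal cough": Counter({"dry": 2, "night": 4}),
    "fever": Counter({"chills": 5}),
}

similar = combined_similarity("paroxysmal cough", "nocturnal cough", parent, contexts)
related = combined_similarity("paroxysmal cough", "fever", parent, contexts)
print(similar > related)
```

In this toy setting the two cough concepts share a parent and overlapping contexts, so they score higher than the cough and fever pair. In practice the weighting, the context definition, and the ontology measure would all need to be chosen and validated against annotated concept pairs.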