Measuring Relatedness Between Scientific Entities in Annotation Datasets

Authors:
Guillermo Palma;Maria-Esther Vidal;Eric Haag;Louiqa Raschid;Andreas Thor
Affiliations:
Universidad Simón Bolívar, Caracas, Venezuela;Universidad Simón Bolívar, Caracas, Venezuela;University of Maryland, College Park, USA;University of Maryland, College Park, USA;University of Leipzig, Germany
Venue:
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Year:
2013

Abstract

Linked Open Data has made available a diversity of scientific collections in which scientists have annotated entities with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge, which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities thus becomes a building block for graph pattern mining; e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) that are associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., i) string-similarity metrics; ii) path-similarity metrics; and iii) topological-similarity metrics; all of them measure relatedness in a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric, AnnSim, that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximum weight bipartite matching, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can provide an explanation of potential novel patterns.
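
The idea of scoring two entities through a 1-to-1 weighted bipartite matching of their annotation terms can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: the normalization 2 * (matched weight) / (|A| + |B|), the function names `ann_sim` and `toy_term_sim`, and the toy similarity scores are all hypothetical. The sketch also finds the best matching by exhaustive search, which is only practical for very small annotation sets; it stands in for the existing solvers mentioned above and is not the authors' implementation.

```python
from itertools import permutations


def ann_sim(ann_a, ann_b, term_sim):
    """Score two annotation sets via a 1-to-1 maximum weight bipartite matching.

    ann_a, ann_b: lists of distinct annotation terms.
    term_sim: symmetric, non-negative similarity function on pairs of terms.
    The normalization below is an assumption made for this sketch.
    """
    if not ann_a or not ann_b:
        return 0.0
    # Let `small` be the shorter list; each of its terms gets a distinct
    # partner from `large`, so the assignment is 1-to-1.
    small, large = (ann_a, ann_b) if len(ann_a) <= len(ann_b) else (ann_b, ann_a)
    best = 0.0
    # Exhaustive search over all 1-to-1 assignments (factorial cost);
    # a polynomial-time assignment solver would replace this loop in practice.
    for partners in permutations(large, len(small)):
        total = sum(term_sim(s, p) for s, p in zip(small, partners))
        best = max(best, total)
    return 2.0 * best / (len(ann_a) + len(ann_b))


# Hypothetical pairwise scores between distinct terms; unlisted pairs score 0.
TOY_SCORES = {frozenset(("bronchitis", "pneumonia")): 0.5}


def toy_term_sim(t1, t2):
    """Toy term similarity: 1.0 for identical terms, table lookup otherwise."""
    if t1 == t2:
        return 1.0
    return TOY_SCORES.get(frozenset((t1, t2)), 0.0)


# Two hypothetical drugs annotated with the conditions they are associated with.
drug_a = ["asthma", "bronchitis"]
drug_b = ["asthma", "pneumonia", "migraine"]

score = ann_sim(drug_a, drug_b, toy_term_sim)
```

In this toy run the best matching pairs "asthma" with "asthma" (1.0) and "bronchitis" with "pneumonia" (0.5), so the score is 2 * 1.5 / 5 = 0.6. The unmatched term "migraine" lowers the score through the denominator, which is the reason for normalizing by the total number of annotations in this sketch.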