Learning string similarity measures for gene/protein name dictionary look-up using logistic regression

Authors:
Yoshimasa Tsuruoka; John McNaught; Jun'ichi Tsujii; Sophia Ananiadou
Venue:
Bioinformatics
Year:
2007

Cited by (8):

Kleio: a knowledge-enriched information retrieval system for biology

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Measuring prediction capacity of individual verbs for the identification of protein interactions

Journal of Biomedical Informatics
A Fast Hybrid Algorithm for Large-Scale l1-Regularized Logistic Regression

The Journal of Machine Learning Research
Extracting and normalizing gene/protein mentions with the flexible and trainable moara java library

ISMB/ECCB'09 Proceedings of the 2009 workshop of the BioLink Special Interest Group, international conference on Linking Literature, Information, and Knowledge for Biology
String similarity measures and joins with synonyms

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
"Mining events from the literature for bioinformatics applications" by S. Ananiadou, P. Thompson, and R. Nawaz; with Martin Vesely as coordinator

ACM SIGWEB Newsletter
Towards a Protein-Protein Interaction information extraction system: Recognizing named entities

Knowledge-Based Systems
ProNormz - An integrated approach for human proteins and protein kinases normalization

Journal of Biomedical Informatics


Abstract

Motivation: One of the bottlenecks of biomedical data integration is term variation. Exact string matching often fails to associate a name with its biological concept (i.e. its ID or accession number in the database) because of seemingly small differences between names. Soft string matching can potentially find the relevant ID by considering the similarity between names. However, the accuracy of soft matching depends heavily on the similarity measure employed.

Results: We used logistic regression to learn a string similarity measure from a dictionary. Experiments using several large-scale gene/protein name dictionaries showed that the logistic regression-based similarity measure outperforms existing similarity measures in dictionary look-up tasks.

Availability: A dictionary look-up system using the similarity measures described in this article is available at http://text0.mib.man.ac.uk/software/mldic/

Contact: yoshimasa.tsuruoka@manchester.ac.uk
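The idea of learning a similarity measure from a dictionary can be illustrated with a small sketch. It assumes, for illustration only, that training pairs come from the dictionary itself (two names sharing an ID form a positive example, names with different IDs a negative one), that each pair is described by four hypothetical features (character bigram and trigram Jaccard overlap, relative length gap, and exact match after lowercasing and stripping punctuation), and that the model is fitted by plain batch gradient descent with a small L2 penalty. The predicted probability serves as the similarity score, and look-up returns the ID of the highest-scoring dictionary name. The feature set, the toy dictionary, and the IDs GENE_A to GENE_C are placeholders, not the features or data used in the article.

```python
import math
import re
from itertools import combinations

# Toy dictionary (hypothetical IDs): concept ID -> list of synonymous names.
TOY_DICTIONARY = {
    "GENE_A": ["interleukin 2", "IL-2", "IL2"],
    "GENE_B": ["tumor necrosis factor", "TNF-alpha", "TNF"],
    "GENE_C": ["tumor protein p53", "p53", "TP53"],
}


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumeric characters (assumed normalization)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def ngrams(s: str, n: int) -> set:
    """Character n-grams of s, padded with '#' boundary markers."""
    s = f"#{s}#"
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(x: set, y: set) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def features(a: str, b: str) -> list:
    """Hypothetical pair features: n-gram overlap, length gap, exact match."""
    na, nb = normalize(a), normalize(b)
    longest = max(len(na), len(nb), 1)
    return [
        jaccard(ngrams(na, 2), ngrams(nb, 2)),
        jaccard(ngrams(na, 3), ngrams(nb, 3)),
        abs(len(na) - len(nb)) / longest,
        1.0 if na == nb else 0.0,
    ]


def make_pairs(dictionary: dict) -> list:
    """Label every name pair 1 if both names share an ID (synonyms), else 0."""
    entries = [(name, cid) for cid, names in dictionary.items() for name in names]
    return [(a, b, int(ca == cb)) for (a, ca), (b, cb) in combinations(entries, 2)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs: list, epochs: int = 500, lr: float = 0.5, l2: float = 1e-3):
    """Fit logistic regression weights and bias by batch gradient descent."""
    X = [features(a, b) for a, b, _ in pairs]
    y = [label for _, _, label in pairs]
    n, w, bias = len(X), [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + bias) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        # L2 penalty on the weights only; the bias is left unregularized.
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        bias -= lr * grad_b / n
    return w, bias


def similarity(model, a: str, b: str) -> float:
    """Learned similarity = predicted probability that a and b are synonyms."""
    w, bias = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(a, b))) + bias)


def look_up(model, query: str, dictionary: dict):
    """Return (ID, matched name, score) of the most similar dictionary name."""
    score, cid, name = max(
        (similarity(model, query, name), cid, name)
        for cid, names in dictionary.items()
        for name in names
    )
    return cid, name, score


if __name__ == "__main__":
    model = train(make_pairs(TOY_DICTIONARY))
    print(look_up(model, "Interleukin-2", TOY_DICTIONARY))
```

Unlike a fixed measure such as plain Jaccard overlap, the weights in this sketch are fitted to the synonym pairs of the dictionary at hand. On a large dictionary, enumerating every name pair is infeasible, so a practical version would likely sample negative pairs and retrieve a short list of candidates before scoring them.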