Finding related sentence pairs in MEDLINE

Authors:
Larry H. Smith; W. John Wilbur
Affiliations:
Computational Biology Branch, National Center for Biotechnology Information, Bethesda, USA 20894 (both authors)
Venue:
Information Retrieval
Year:
2010

Abstract

We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compare traditional vector space models with machine learning methods for detecting relatedness, and find that machine learning is superior. The Huber method, a variant of Support Vector Machines that minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.
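
To make two terms in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows the standard definition of the modified Huber loss for a label y in {-1, +1} and a real-valued classifier score, and one way precision at a score cutoff could be computed. The function names `modified_huber_loss` and `precision_at_cutoff` and the toy scores are assumptions made up for illustration; the abstract does not describe the features, training procedure, or evaluation code.

```python
def modified_huber_loss(y, score):
    """Modified Huber loss for a label y in {-1, +1} and a real-valued score.

    With margin z = y * score, the loss is max(0, 1 - z)^2 when z >= -1
    and -4 * z otherwise: quadratic near the decision boundary, linear
    for badly misclassified examples.
    """
    z = y * score
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2
    return -4.0 * z


def precision_at_cutoff(scores, labels, cutoff):
    """Fraction of sentence pairs scoring at or above `cutoff` that are related.

    `labels` uses +1 for related pairs and -1 for unrelated pairs
    (an assumed encoding). Returns None when no pair reaches the cutoff.
    """
    selected = [label for s, label in zip(scores, labels) if s >= cutoff]
    if not selected:
        return None
    return sum(1 for label in selected if label == 1) / len(selected)


if __name__ == "__main__":
    # Toy, made-up scores for four candidate sentence pairs.
    toy_scores = [0.9, 0.8, 0.2, 0.1]
    toy_labels = [1, -1, 1, -1]
    print(modified_huber_loss(1, 0.0))
    print(precision_at_cutoff(toy_scores, toy_labels, 0.5))
```

Raising the cutoff keeps fewer candidate pairs, which is the trade-off the abstract refers to when it reports precision at a cutoff yielding about one related sentence per abstract.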