We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detecting relatedness and found that the machine learning methods performed better. The Huber method, a variant of Support Vector Machines that minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences that this automatic procedure finds in other abstracts.
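The abstract does not specify the features, training procedure, or exact cutoff, so the following is only a minimal sketch under stated assumptions. It illustrates three ideas named above: a bag-of-words cosine score standing in for the vector space baseline; the modified Huber loss of the margin z = y * f(x), which is max(0, 1 - z)^2 for z >= -1 and -4z otherwise (zero for margins of at least 1, as with the SVM hinge loss, quadratic between -1 and 1, linear below -1); and precision among sentence pairs scoring at or above a cutoff. The linear scorer, the plain stochastic gradient descent loop, the function names, the hyperparameters (`epochs`, `lr`, `reg`), and the two-feature toy vectors are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def modified_huber_loss(z):
    """Modified Huber loss of the margin z = y * f(x), with y in {-1, +1}."""
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2  # squared hinge near the decision boundary
    return -4.0 * z  # linear tail for badly misclassified pairs


def modified_huber_grad(z):
    """Derivative of modified_huber_loss with respect to the margin z."""
    if z >= 1.0:
        return 0.0
    if z >= -1.0:
        return -2.0 * (1.0 - z)
    return -4.0


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_sgd(X, y, epochs=50, lr=0.05, reg=1e-4, seed=0):
    """Fit a linear scorer f(x) = w.x + b by SGD on the modified Huber loss
    plus an L2 penalty (reg / 2) * ||w||^2 (hypothetical training setup).
    X holds hypothetical feature vectors for sentence pairs; y is in {-1, +1}."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = y[i] * (dot(w, X[i]) + b)
            g = modified_huber_grad(z)  # dLoss/dz; dz/dw = y_i * x_i, dz/db = y_i
            w = [wj - lr * (g * y[i] * xj + reg * wj) for wj, xj in zip(w, X[i])]
            b -= lr * g * y[i]  # bias is not regularized
    return w, b


def cosine(a, b):
    """Vector space baseline: cosine of raw term-count vectors of two sentences."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    num = sum(ca[t] * cb[t] for t in ca)
    den = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return num / den if den else 0.0


def precision_at_cutoff(scores, labels, cutoff):
    """Fraction of pairs scoring >= cutoff that are truly related (label +1)."""
    kept = [lab for s, lab in zip(scores, labels) if s >= cutoff]
    return sum(1 for lab in kept if lab > 0) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Made-up toy data: two features per sentence pair (not from the paper).
    X = [[1.0, 0.9], [0.9, 0.8], [0.1, 0.2], [0.2, 0.0]]
    y = [1, 1, -1, -1]
    w, b = train_sgd(X, y)
    scores = [dot(w, x) + b for x in X]
    print(precision_at_cutoff(scores, y, cutoff=min(scores[:2])))
    print(cosine("aspirin reduces fever", "aspirin lowers fever"))
```

In the paper the cutoff was chosen to yield about one related sentence per abstract on average; here it is simply a parameter. The linear tail below -1 keeps the gradient bounded, which limits the influence of badly misclassified (for example, mislabeled) pairs. For practical work, scikit-learn's `SGDClassifier(loss="modified_huber")` implements the same loss.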