MEDLINE MeSH indexing: lessons learned from machine learning and future directions

Authors:
Antonio Jimeno Yepes;James G. Mork;Bartłomiej Wilkowski;Dina Demner-Fushman;Alan R. Aronson
Affiliations:
National Library of Medicine, Bethesda, MD, USA;National Library of Medicine, Bethesda, MD, USA;Technical University of Denmark, Lyngby, Denmark;National Library of Medicine, Bethesda, MD, USA;National Library of Medicine, Bethesda, MD, USA
Venue:
Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
Year:
2012

Abstract

Because MEDLINE grows substantially every year, MeSH indexing is becoming increasingly difficult for the relatively small group of highly qualified indexers at the US National Library of Medicine (NLM). The Medical Text Indexer (MTI) is a support tool that assists indexers; it relies on MetaMap and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI using machine learning. Typical machine learning approaches treat this indexing task as text categorization. In this work, we study several Medical Subject Headings (MeSH) recommended by MTI and analyze the issues that arise when applying standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that low-variance machine learning methods achieve better performance, and that each MeSH heading behaves differently. In addition, several factors make this task difficult (e.g., limited access to the full text of the citations), and these provide direction for future work.