Unsupervised mining of frequent tags for clinical eligibility text indexing

Authors:
Riccardo Miotto; Chunhua Weng
Affiliations:
Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA and The Irving Institute for Clinical and Translational Research, Columbia University, New York, NY 10032, USA
Venue:
Journal of Biomedical Informatics
Year:
2013


Abstract

Clinical text, such as clinical trial eligibility criteria, is largely underused by state-of-the-art medical search engines because it is difficult to parse accurately. This paper proposes a novel methodology for deriving a semantic index of clinical eligibility documents from a controlled vocabulary of frequent tags that are mined automatically from the text. We applied this method to the eligibility criteria on ClinicalTrials.gov and report that frequent tags (1) define an effective and efficient index of clinical trials and (2) are unlikely to grow radically as the repository grows. We further proposed using the semantic index to filter clinical trial search results and concluded that frequent tags reduce the result space more efficiently than an uncontrolled set of UMLS concepts. Overall, unsupervised mining of frequent tags from clinical text yields an effective semantic index for clinical eligibility documents and promotes their computational reuse.
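The abstract does not specify how tags are extracted, how "frequent" is defined, or how filtering is carried out, so the Python sketch below illustrates only the general idea (mine a controlled vocabulary of recurring tags, index documents with it, then narrow a result set by selected tags) under stated assumptions. It is not the authors' pipeline. Assumptions: candidate tags are word n-grams of up to three words that neither begin nor end with a stopword or a number; a tag is "frequent" if it occurs in at least `min_df` documents; and filtering keeps only results annotated with every selected tag. The stopword list, the `min_df` threshold, the function names, and the trial identifiers and criteria text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "above", "an", "and", "at", "for", "in", "no",
             "of", "on", "or", "the", "to", "with", "within"}


def _ok_boundary(token):
    """A tag may not start or end with a stopword or a bare number."""
    return token not in STOPWORDS and not token.isdigit()


def candidate_tags(text, max_n=3):
    """Return the set of word n-grams (1..max_n) with acceptable boundary words.

    Uses a naive lowercase alphanumeric tokenizer that ignores punctuation.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tags = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if _ok_boundary(gram[0]) and _ok_boundary(gram[-1]):
                tags.add(" ".join(gram))
    return tags


def mine_frequent_tags(docs, min_df=2, max_n=3):
    """Keep candidate tags whose document frequency is at least min_df."""
    df = Counter()
    for text in docs.values():
        df.update(candidate_tags(text, max_n))  # each doc counts a tag once
    return {tag for tag, count in df.items() if count >= min_df}


def build_index(docs, vocabulary, max_n=3):
    """Inverted index restricted to the mined vocabulary: tag -> doc ids."""
    index = {tag: set() for tag in vocabulary}
    for doc_id, text in docs.items():
        for tag in candidate_tags(text, max_n) & vocabulary:
            index[tag].add(doc_id)
    return index


def filter_results(results, index, selected_tags):
    """Keep results annotated with every selected tag (unknown tags match nothing)."""
    kept = set(results)
    for tag in selected_tags:
        kept &= index.get(tag, set())
    return kept


# Hypothetical eligibility criteria keyed by made-up trial ids.
trials = {
    "NCT-A": "Adults with type 2 diabetes; no history of myocardial infarction",
    "NCT-B": "Type 2 diabetes patients with HbA1c above 7%",
    "NCT-C": "Pregnant women with gestational diabetes",
    "NCT-D": "History of myocardial infarction within 6 months",
}

vocabulary = mine_frequent_tags(trials, min_df=2)
index = build_index(trials, vocabulary)

# Pretend a keyword search returned these trials, then the user picks a tag.
hits = {"NCT-A", "NCT-B", "NCT-C"}
print(sorted(filter_results(hits, index, ["myocardial infarction"])))  # ['NCT-A']
```

In this sketch, tags that occur in only one document (for example "pregnant women") never enter the vocabulary, so the index stays small and every tag in it is shared across trials, which is what makes it usable as a filter. Raising `min_df` trades coverage for a more compact index.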