Inferring appropriate eligibility criteria in clinical trial protocols without labeled data

Authors:
Angelo Restificar; Sophia Ananiadou
Affiliations:
The University of Manchester, Manchester, United Kingdom; The University of Manchester, Manchester, United Kingdom
Venue:
Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics
Year:
2012

Citing 11

Latent Dirichlet allocation. The Journal of Machine Learning Research.
Pattern Classification (2nd Edition).
GXP: An Interactive Shell for the Grid Environment. IWIA '04 Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems.
Answer extraction, semantic clustering, and extractive summarization for clinical question answering. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Computational Linguistics.
Trust Region Newton Method for Logistic Regression. The Journal of Machine Learning Research.
LIBLINEAR: A Library for Large Linear Classification. The Journal of Machine Learning Research.
The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter.
Clinical information retrieval using document and PICO structure. HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
A practical method for transforming free-text eligibility criteria into computable criteria. Journal of Biomedical Informatics.
Text mining for efficient search and assisted creation of clinical trials. Proceedings of the ACM fifth international workshop on Data and text mining in biomedical informatics.

Cited 1

DTMBIO 2012: international workshop on data and text mining in biomedical informatics. Proceedings of the 21st ACM international conference on Information and knowledge management.

Abstract

We consider the user task of designing clinical trial protocols and propose a method that outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol, which itself contains a set of eligibility criteria. Given a small set of sample documents D', |D'| ≪ |D|, that a user has initially identified as relevant, e.g., via a user query interface, our scoring method automatically suggests eligibility criteria from D by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. We view a document as a mixture of latent topics, and our method exploits this by applying a three-step procedure. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA) [3]. Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value, the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value or score. Results from our experiments indicate that our proposed method is 8 and 9 times better (for inclusion and exclusion criteria, respectively) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments, we were able to automatically construct eligibility criteria that are on average 75% and 70% (for inclusion and exclusion criteria, respectively) similar to the correct eligibility criteria.
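
The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the LDA topic proportions for each sample document and the per-topic probabilities for each candidate criterion (the outputs of the first two steps) are already available as plain lists of numbers. The function names, the example criteria, the numeric values, and the choice to average topic proportions across the sample documents are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def average_topic_proportions(doc_topics: Sequence[Sequence[float]]) -> List[float]:
    """Average per-document topic proportions over the sample set D'.

    Assumption: averaging is one plausible way to summarize the samples;
    the abstract does not state how the proportions are aggregated.
    """
    if not doc_topics:
        raise ValueError("need at least one sample document")
    n_topics = len(doc_topics[0])
    totals = [0.0] * n_topics
    for proportions in doc_topics:
        if len(proportions) != n_topics:
            raise ValueError("all documents must use the same number of topics")
        for k, p in enumerate(proportions):
            totals[k] += p
    return [t / len(doc_topics) for t in totals]


def expected_score(topic_probs: Sequence[float],
                   sample_proportions: Sequence[float]) -> float:
    """Probability-weighted sum of topic proportions for one criterion."""
    if len(topic_probs) != len(sample_proportions):
        raise ValueError("topic vectors must have the same length")
    return sum(p * w for p, w in zip(topic_probs, sample_proportions))


def rank_criteria(candidates: Dict[str, Sequence[float]],
                  sample_proportions: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (criterion, score) pairs sorted from highest to lowest score."""
    scored = [(text, expected_score(probs, sample_proportions))
              for text, probs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: topic proportions of two sample protocols (step 1)
# and per-topic probabilities for three candidate criteria (step 2).
sample_doc_topics = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]
candidates = {
    "Age 18 years or older": [0.9, 0.05, 0.05],
    "Pregnant or breastfeeding": [0.1, 0.2, 0.7],
    "Histologically confirmed diagnosis": [0.3, 0.6, 0.1],
}

ranked = rank_criteria(candidates, average_topic_proportions(sample_doc_topics))
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

In this toy setup, a criterion that is likely to belong to the topic dominating the sample documents receives the highest score, which mirrors the intuition stated in the abstract.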