The Journal of Machine Learning Research
Pattern Classification (2nd Edition)
GXP: An Interactive Shell for the Grid Environment
IWIA '04 Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems
Answer extraction, semantic clustering, and extractive summarization for clinical question answering
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Answering Clinical Questions with Knowledge-Based and Statistical Techniques
Computational Linguistics
Trust Region Newton Method for Logistic Regression
The Journal of Machine Learning Research
LIBLINEAR: A Library for Large Linear Classification
The Journal of Machine Learning Research
The WEKA data mining software: an update
ACM SIGKDD Explorations Newsletter
Clinical information retrieval using document and PICO structure
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
A practical method for transforming free-text eligibility criteria into computable criteria
Journal of Biomedical Informatics
Text mining for efficient search and assisted creation of clinical trials
Proceedings of the ACM fifth international workshop on Data and text mining in biomedical informatics
DTMBIO 2012: international workshop on data and text mining in biomedical informatics
Proceedings of the 21st ACM international conference on Information and knowledge management
We consider the user task of designing clinical trial protocols and propose a method that outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol, which itself contains a set of eligibility criteria. Given a small set of sample documents D', with |D'| ≪ |D|, that a user has initially identified as relevant (e.g., via a user query interface), our scoring method automatically suggests eligibility criteria from D by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. We view a document as a mixture of latent topics, and our method exploits this view in a three-step procedure. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA) [3]. Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value: the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value, or score. Results from our experiments indicate that our proposed method is 8 and 9 times better (for inclusion and exclusion criteria, respectively) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments, we were able to automatically construct eligibility criteria that are on average 75% and 70% (for inclusion and exclusion criteria, respectively) similar to the correct eligibility criteria.
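The scoring step described above can be illustrated with a minimal sketch. It assumes that the topic proportions of the sample documents (from LDA) and the per-topic membership probabilities of each candidate criterion (from the logistic regression models) have already been computed; the numbers, the criterion strings, and the function names `expected_value_score` and `rank_criteria` are hypothetical and serve only to show the probability-weighted sum, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple


def expected_value_score(topic_probs: Sequence[float],
                         topic_proportions: Sequence[float]) -> float:
    """Probability-weighted sum of topic proportions for one criterion.

    topic_probs[k]       -- assumed P(criterion belongs to topic k)
    topic_proportions[k] -- assumed proportion of topic k in the samples
    """
    if len(topic_probs) != len(topic_proportions):
        raise ValueError("both inputs must cover the same topics")
    return sum(p * w for p, w in zip(topic_probs, topic_proportions))


def rank_criteria(candidates: Dict[str, Sequence[float]],
                  topic_proportions: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (criterion, score) pairs sorted from highest to lowest score."""
    scored = [(text, expected_value_score(probs, topic_proportions))
              for text, probs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic proportions inferred from the sample documents D'.
sample_topic_proportions = [0.7, 0.2, 0.1]

# Hypothetical per-topic probabilities for three candidate criteria.
candidate_probs = {
    "age >= 18 years": [0.9, 0.05, 0.05],
    "not pregnant": [0.1, 0.8, 0.1],
    "no prior chemotherapy": [0.1, 0.1, 0.8],
}

ranking = rank_criteria(candidate_probs, sample_topic_proportions)
for text, score in ranking:
    print(f"{score:.3f}  {text}")
```

In this toy input, the criterion that most likely belongs to the dominant topic (proportion 0.7) receives the highest score, which matches the intuition stated in the abstract.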