Analysis of eligibility criteria representation in industry-standard clinical trial protocols

Authors:
Sanmitra Bhattacharya; Michael N. Cantor
Affiliations:
Department of Computer Science, The University of Iowa, 14 MacLean Hall, Iowa City, IA 52242, United States; Pfizer Inc., 235 E 42nd Street, New York, NY 10017, United States
Venue:
Journal of Biomedical Informatics
Year:
2013

Abstract

Previous research on the standardization of eligibility criteria and its feasibility has traditionally been conducted on clinical trial protocols from ClinicalTrials.gov (CT). The portability and use of such standardization for full-text, industry-standard protocols have not been studied in depth. To this end, we first compare the representation characteristics and textual complexity of a set of Pfizer's internal full-text protocols with those of their corresponding entries in CT. Next, we identify clusters of similar criteria sentences from both full-text and CT protocols and outline methods for the standardized representation of eligibility criteria. We also study the distribution of eligibility criteria in full-text and CT protocols with respect to pre-defined semantic classes used for eligibility criteria classification. We find that, compared with full-text protocols, CT protocols are not only more condensed but also convey less information. We also find no correlation between the variations in word counts of the CT and full-text protocols. We identify 65 inclusion and 103 exclusion criteria clusters from the full-text protocols, whereas our methods find only 36 and 63 corresponding clusters, respectively, from the CT protocols. For both full-text and CT protocols, we are able to identify 'templates' for standardized representations, with full-text standardization being the more challenging of the two. In exploring the semantic class distributions, we find that the majority of inclusion criteria from both full-text and CT protocols belong to the semantic class "Diagnostic and Lab Results", while "Disease, Sign or Symptom" forms the majority for exclusion criteria. Overall, we show that developing a template set of eligibility criteria for clinical trials, specifically in their full-text form, is feasible and could lead to more efficient clinical trial protocol design.
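The abstract reports no correlation between the word counts of the CT and full-text versions of the same protocols but does not state which statistic was used. The following is a minimal sketch of how such a comparison could be computed, assuming a Pearson correlation over per-protocol word counts of paired CT and full-text criteria sections and a naive whitespace tokenizer. The function names and the toy input are hypothetical illustrations, not the authors' implementation or data.

```python
import math
from typing import List, Sequence, Tuple


def word_count(text: str) -> int:
    """Count whitespace-delimited tokens (assumed tokenizer, deliberately naive)."""
    return len(text.split())


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all unnormalized: the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def word_count_correlation(pairs: List[Tuple[str, str]]) -> float:
    """Hypothetical helper: each pair is (CT criteria text, full-text criteria text)
    for the same trial; returns Pearson r between the two word-count series."""
    ct_counts = [word_count(ct) for ct, _ in pairs]
    ft_counts = [word_count(ft) for _, ft in pairs]
    return pearson(ct_counts, ft_counts)


if __name__ == "__main__":
    # Toy paired criteria texts, invented for illustration only.
    pairs = [
        ("Age 18 or older",
         "Male or female subjects aged 18 years or older at screening"),
        ("HbA1c 7-10%",
         "HbA1c between 7.0% and 10.0% inclusive at the screening visit"),
        ("No prior insulin use",
         "Subjects must not have received insulin within 3 months prior to screening"),
    ]
    print(f"Pearson r = {word_count_correlation(pairs):.3f}")
```

On real paired protocols, a coefficient near 0 would be consistent with the reported lack of correlation; a significance test, which this sketch omits, would normally accompany such a claim.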