Objectives: To investigate whether (1) machine learning classifiers can help identify nonrandomized studies eligible for full-text screening by systematic reviewers; (2) classifier performance varies with optimization; and (3) the number of citations to screen can be reduced. Methods: We used an open-source data-mining suite to process and classify biomedical citations that point to mostly nonrandomized studies from 2 systematic reviews. We built training and test sets for citation portions and compared classifier performance by considering the value of indexing, various feature sets, and optimization. We conducted our experiments in 2 phases. The design of phase I, with no optimization, was 4 classifiers × 3 feature sets × 3 citation portions. Classifiers included k-nearest neighbor, naive Bayes, complement naive Bayes, and evolutionary support vector machine. Feature sets included bag of words and 2- and 3-term n-grams. Citation portions included titles, titles and abstracts, and full citations with metadata. Phase II, with optimization, involved a subset of the classifiers, as well as features extracted from full citations and from full citations with overweighted titles. We optimized features and classifier parameters by manually setting information gain thresholds outside of a process for iterative grid optimization with 10-fold cross-validation. We independently tested models on data reserved for that purpose and statistically compared classifier performance on 2 types of feature sets. We estimated the number of citations that reviewers would need to screen during a second pass through a reduced set of citations. Results: In phase I, the evolutionary support vector machine returned the best recall for bag of words extracted from full citations; the best classifier with respect to overall performance was k-nearest neighbor. No classifier attained good enough recall for this task without optimization.
In phase II, we boosted performance with optimization for evolutionary support vector machine and complement naive Bayes classifiers. Generalization performance was better for the latter in the independent tests. For evolutionary support vector machine and complement naive Bayes classifiers, the initial retrieval set was reduced by 46% and 35%, respectively. Conclusions: Machine learning classifiers can help identify nonrandomized studies eligible for full-text screening by systematic reviewers. Optimization can markedly improve classifier performance. However, generalizability varies with the classifier. The number of citations to screen during a second independent pass through the citations can be substantially reduced.
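
The abstract mentions selecting features by setting information gain thresholds. As a rough illustration of that idea only, the following minimal Python sketch scores each term by how much knowing its presence or absence reduces the entropy of the include/exclude label, then keeps terms at or above a threshold. Everything here is an assumption made for illustration: the function names, the binary term-presence representation, the toy citations, and the threshold value are hypothetical and are not taken from the study or from the data-mining suite the authors used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Entropy reduction from knowing whether `term` occurs in a document.

    Each document is a set of terms (binary presence, an assumption
    of this sketch rather than the representation used in the study).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_features(docs, labels, threshold):
    """Keep terms whose information gain is at least `threshold`."""
    vocabulary = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocabulary}
    return {t: s for t, s in scores.items() if s >= threshold}


# Toy citations (hypothetical), already tokenized into term sets.
docs = [
    {"cohort", "study", "patients"},
    {"cohort", "outcomes", "patients"},
    {"review", "study", "patients"},
    {"editorial", "review", "patients"},
]
labels = ["include", "include", "exclude", "exclude"]

# The threshold is set by hand here, mirroring the manual setting
# described in the abstract; 0.5 is an arbitrary illustrative value.
selected = select_features(docs, labels, threshold=0.5)
print(sorted(selected))
```

In this toy data, a term that appears in every citation carries no information about the label, so it falls below any positive threshold, while terms that separate the two classes perfectly score highest. In practice the threshold would be tuned against held-out performance rather than fixed in advance.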