C4.5: programs for machine learning
C4.5: programs for machine learning
Machine Learning
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual Europe an conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Fast training of support vector machines using sequential minimal optimization
Advances in kernel methods
A vector space model for automatic indexing
Communications of the ACM
Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Extraction, layout analysis and classification of diagrams in PDF documents
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Improvements to Platt's SMO Algorithm for SVM Classifier Design
Neural Computation
Integrating image data into biomedical text categorization
Bioinformatics
Substring selection for biomedical document classification
Bioinformatics
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
A review of feature selection techniques in bioinformatics
Bioinformatics
Exploring text and image features to classify images in bioscience literature
BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Guest editorial: Data mining for the study of disease genes and proteins
Artificial Intelligence in Medicine
Hi-index | 0.00 |
Objective: As more full-text biomedical papers are becoming available in digitized form online, there is a need for tools to mine information from all parts of such papers. Because the figures and legends/captions in biomedical papers provide important information about research outcomes, mining techniques targeting them have attracted a great deal of attention. In this study, we focused on pathway figures that illustrate signaling or metabolic pathways, because many of these are important in understanding disease mechanism(s). We developed a figure classification system based on textual information contained in biomedical papers to provide an automated acquisition system for such pathway figures. Materials and methods: We used full-text journal articles available on PubMed Central as our data set. We used several supervised machine learning methods, such as decision tree and a support vector machine, to classify figures in the data set. We compared the classification performance among the cases using only figure legends, using only sentences referring to the figure in the main text of the article, and combining figure legends with sentences referring to the figure in the main text of the article. Results: Compared with previous related work, a sufficiently high performance was achieved with the figure legends alone. The performance with the sentences referring to the figure in the main text was actually lower than that with the figure legends alone, indicating that focusing on the main text alone is inadequate. The combination of legend and main text clearly had an effect, but including the prior and following sentences in addition to the sentence referring to the figure dramatically improved the performance. Conclusions: We developed an automatic pathway figure classification system based on both figure legends and the main text that has quite a high degree of accuracy. To our knowledge, this is the first attempt to address a figure classification task using legends and the main text, and it may provide a first stage for achieving efficient figure mining.