Efficient classification method for complex biological literature using text and data mining combination

Authors:
Yun Jeong Choi;Seung Soo Park
Affiliations:
Department of Computer Science & Engineering, Ewha Womans University, Seoul, Korea;Department of Computer Science & Engineering, Ewha Womans University, Seoul, Korea
Venue:
IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
Year:
2006

Abstract

As the volume of genetic knowledge grows ever faster, its automated analysis and systematization into high-throughput databases has become a pressing issue. In bioinformatics, one of the essential tasks is to recognize and identify genomic entities and to discover their relations from various sources. Biological literature that contains ambiguous entities generally lies near decision boundaries. The purpose of this paper is to design and implement a classification system that improves performance on entity identification problems. The system is based on reinforcement training and a post-processing method, and is supplemented by data mining algorithms to enhance its performance. In the experiments, we intentionally add noise to the training data to test robustness and stability. The results show significantly improved stability with respect to training errors.
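
The robustness experiment mentioned above can be illustrated with a minimal sketch. The abstract does not say what kind of noise was added or how much, so the example below assumes label noise (a fixed fraction of training labels flipped to a different class). The function name `inject_label_noise`, the `noise_rate` parameter, and the class names are hypothetical and are not taken from the paper.

```python
import random


def inject_label_noise(labels, noise_rate, classes, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` flipped.

    Assumption: noise means replacing a training label with a different
    class chosen uniformly at random. The paper may use another scheme.
    """
    rng = random.Random(seed)
    n_flip = int(round(noise_rate * len(labels)))
    # Pick distinct positions so exactly n_flip labels are corrupted.
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


# Hypothetical two-class training set of 100 documents.
labels = ["gene"] * 50 + ["protein"] * 50
noisy_labels = inject_label_noise(labels, 0.2, ["gene", "protein"])
n_changed = sum(a != b for a, b in zip(labels, noisy_labels))
```

Training the same classifier on `labels` and on `noisy_labels`, then comparing accuracy on a clean held-out set, gives one simple way to measure stability under training errors.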