The nature of statistical learning theory
The nature of statistical learning theory
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Feature Engineering for Text Classification
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
XRules: an effective structural classifier for XML data
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Term identification in the biomedical literature
Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine
Journal of Biomedical Informatics
Bioinformatics
The role of roles in classifying annotated biomedical text
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Annotating attributions and private states
CorpusAnno '05 Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky
Ontological analysis of taxonomic relationships
ER'00 Proceedings of the 19th international conference on Conceptual modeling
An order-sorted quantified modal logic for meta-ontology
TABLEAUX'05 Proceedings of the 14th international conference on Automated Reasoning with Analytic Tableaux and Related Methods
Guest Editorial: Current issues in biomedical text mining and natural language processing
Journal of Biomedical Informatics
Classifying Vietnamese disease outbreak reports with important sentences and rich features
Proceedings of the Third Symposium on Information and Communication Technology
The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions
Journal of Biomedical Informatics
Hi-index | 0.00 |
This paper explores the role of named entities (NEs) in the classification of disease outbreak report. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology and classified into types and roles. Types are specified as NE classes and roles are integrated into NEs as attributes such as a chemical and whether it is being used as a therapy for some infectious disease. We focus on the roles of NEs and explore different ways to extract, combine and use them as features in a text classifier. In addition, we investigate the combination of roles with semantic categories of disease-related nouns and verbs. Experimental results using naive Bayes and Support Vector Machine (SVM) algorithms show that: (1) roles in combination with NEs improve performance in text classification, (2) roles in combination with semantic categories of noun and verb features contribute substantially to the improvement of text classification. Both these results were statistically significant compared to the baseline ''raw text'' representation. We discuss in detail the effects of roles on each NE and on semantic categories of noun and verb features in terms of accuracy, precision/recall and F-score measures for the text classification task.