Automatic scientific text classification using local patterns: KDD CUP 2002 (task 1)

Authors:
Moustafa M. Ghanem;Yike Guo;Huma Lodhi;Yong Zhang
Affiliations:
Imperial College of Science Technology & Medicine, London, UK;Imperial College of Science Technology & Medicine, London, UK;Imperial College of Science Technology & Medicine, London, UK;Imperial College of Science Technology & Medicine, London, UK
Venue:
ACM SIGKDD Explorations Newsletter
Year:
2002

Citing 0
Cited 7

Classifying biological articles using web resources
Proceedings of the 2004 ACM symposium on Applied computing

Effect of word density on measuring words association
COMPUTE '08 Proceedings of the 1st Bangalore Annual Compute Conference

Rule-Based Protein Term Identification with Help from Automatic Species Tagging
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing

Feature generation and representations for protein-protein interaction classification
Journal of Biomedical Informatics

Analysing scientific workflows with Computational Tree Logic
Cluster Computing

BioDR: Semantic indexing networks for biomedical document retrieval
Expert Systems with Applications: An International Journal

Text classification with the support of pruned dependency patterns
Pattern Recognition Letters

Abstract

In this paper, we describe our approach to Task 1 of the KDD CUP 2002 competition. The approach combines an improved automatic feature selection method with traditional classifiers. The feature selection method captures keyword combinations (or motifs) that occur frequently within short segments of a document's text, and it has proved to produce more accurate classification results than approaches that rely solely on keyword-based features.
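
The motif idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the segment length, the motif size, or the frequency threshold, so the window of tokens, the keyword pairs, and the `min_docs` cutoff below are assumptions, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def window_motifs(tokens, keywords, window=5, size=2):
    """Return keyword combinations that co-occur inside any sliding window.

    Assumption: a "short segment" is a fixed-length window of tokens.
    """
    found = set()
    # max(1, ...) ensures documents shorter than the window are still scanned.
    for start in range(max(1, len(tokens) - window + 1)):
        segment = tokens[start:start + window]
        present = sorted({t for t in segment if t in keywords})
        found.update(combinations(present, size))
    return found


def frequent_motifs(documents, keywords, window=5, size=2, min_docs=2):
    """Keep motifs found in at least min_docs documents (assumed threshold)."""
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()
        counts.update(window_motifs(tokens, keywords, window, size))
    return {motif: n for motif, n in counts.items() if n >= min_docs}


def motif_features(text, motifs, keywords, window=5, size=2):
    """Binary vector over the sorted motifs: 1 if the motif occurs in text."""
    present = window_motifs(text.lower().split(), keywords, window, size)
    return [1 if motif in present else 0 for motif in sorted(motifs)]
```

The resulting binary vectors could then be passed to any traditional classifier alongside, or instead of, single-keyword features. Which classifier and which feature mix the authors used is not stated in the abstract.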