Annotating genes and their products with Gene Ontology (GO) codes is an important area of research. One approach is to use the information available about these genes in the biomedical literature. Our goal, based on this approach, is to develop automatic annotation methods that could supplement the expensive manual annotation processes currently in place. Using a set of Support Vector Machine (SVM) classifiers, we achieved F-scores of 0.48, 0.4, and 0.32 for codes of the molecular function, cellular component, and biological process GO hierarchies, respectively. We explore thresholding of SVM scores and the relationship of performance to hierarchy level and to the number of positives in the training sets. We find that hierarchy level is important, especially for the molecular function and biological process hierarchies. We also find that the cellular component hierarchy stands apart from the other two in many respects, which may be due to fundamental differences in link semantics. This research also exploits the hierarchical structures by defining and testing a relaxed criterion for classification correctness.
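
As a rough illustration of what thresholding SVM scores can look like, the sketch below picks, for a single GO code, the score cut-off that maximizes the F-score on held-out data. It is a generic procedure written under stated assumptions, not the method used in this work: the function names, the candidate thresholds, and the toy scores and labels are all hypothetical.

```python
def f_score(true_labels, predicted_labels):
    """Balanced F-score (F1) for binary labels; True means the code applies."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, true_labels):
    """Try each observed score as a cut-off and keep the one with the highest F1.

    A document is predicted positive when its score is >= the cut-off.
    Ties are resolved in favor of the lowest cut-off.
    """
    best_cut, best_f1 = None, -1.0
    for candidate in sorted(set(scores)):
        predicted = [s >= candidate for s in scores]
        f1 = f_score(true_labels, predicted)
        if f1 > best_f1:
            best_cut, best_f1 = candidate, f1
    return best_cut, best_f1


# Hypothetical SVM decision values and gold labels for one GO code.
scores = [-1.2, -0.4, -0.1, 0.3, 0.9]
labels = [False, True, False, True, True]

threshold, f1 = best_threshold(scores, labels)
default_f1 = f_score(labels, [s >= 0.0 for s in scores])
print(threshold, round(f1, 3), round(default_f1, 3))
```

On this toy data the tuned cut-off falls below the default SVM decision boundary of zero, trading one false positive for a recovered true positive. In practice the cut-off would be chosen on a validation set separate from the test data.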