Biomedical text categorization with concept graph representations using a controlled vocabulary
Proceedings of the 11th International Workshop on Data Mining in Bioinformatics
Text Categorization of Biomedical Data Sets Using Graph Kernels and a Controlled Vocabulary
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Text categorization is used to organize and manage biomedical text databases, which are growing at an exponential rate. The feature representation chosen for documents is a crucial factor in text categorization performance. Most successful existing techniques use a vector representation based on key entities extracted from the text. In this paper, we investigate a new direction in which a document is represented as a graph. In this representation, we identify high-level concepts and build a rich graph structure that contains additional concepts and relationships. We then use graph kernel techniques to perform text categorization. The results show a significant improvement in accuracy compared with categorization based only on the extracted concepts.
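The abstract does not name the graph kernel or the classifier that were used. For illustration only, the sketch below swaps in a Weisfeiler-Lehman subtree kernel over small labeled concept graphs. It assigns a query document the category of its most similar training graph under the normalized kernel. The concept labels, graphs, category names, and function names (`wl_features`, `wl_kernel`, `normalized_kernel`, `classify`) are all hypothetical. A kernel SVM would be the more usual classifier; nearest-neighbor is used here only to keep the example self-contained.

```python
import math
from collections import Counter

# A concept graph is (labels, edges): labels maps node id -> concept name and
# edges lists undirected (node, node) pairs. The concept names used below are
# hypothetical stand-ins for terms from a controlled vocabulary.


def wl_features(labels, edges, iterations=2):
    """Weisfeiler-Lehman subtree features of one labeled graph.

    Each round relabels every node with (its old label, sorted tuple of its
    neighbors' old labels). Labels from all rounds are counted in a histogram.
    """
    neighbors = {node: [] for node in labels}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # The comprehension reads the previous round's labels throughout.
        current = {
            node: (current[node], tuple(sorted(current[n] for n in neighbors[node])))
            for node in current
        }
        features.update(current.values())
    return features


def wl_kernel(f1, f2):
    """Linear kernel on WL label histograms (dot product of counts)."""
    return sum(count * f2[label] for label, count in f1.items())


def normalized_kernel(f1, f2):
    """Cosine-normalized kernel, so every graph has self-similarity 1."""
    denom = math.sqrt(wl_kernel(f1, f1) * wl_kernel(f2, f2))
    return wl_kernel(f1, f2) / denom if denom else 0.0


def classify(query, training):
    """Return the label of the most similar training graph (hypothetical
    nearest-neighbor stand-in for a kernel classifier)."""
    q = wl_features(*query)
    best_label, _ = max(
        ((label, normalized_kernel(q, wl_features(*graph))) for graph, label in training),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy training set: each entry is ((labels, edges), category).
training = [
    (({"n1": "Heart", "n2": "Myocardial Infarction", "n3": "Aspirin"},
      [("n1", "n2"), ("n2", "n3")]), "cardiology"),
    (({"n1": "Neoplasms", "n2": "Chemotherapy", "n3": "Lung"},
      [("n1", "n2"), ("n1", "n3")]), "oncology"),
]
query = ({"x": "Heart", "y": "Myocardial Infarction"}, [("x", "y")])
print(classify(query, training))  # prints "cardiology"
```

In a fuller pipeline, the matrix of normalized kernel values between all document graphs could be passed to a classifier that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Richer graphs with added concepts and relationships would contribute more shared subtree labels to the kernel.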