Biomedical text categorization with concept graph representations using a controlled vocabulary

Authors:
Meenakshi Mishra;Jun Huan;Said Bleik;Min Song
Affiliations:
University of Kansas;University of Kansas;New Jersey Institute of Technology;Yonsei University
Venue:
Proceedings of the 11th International Workshop on Data Mining in Bioinformatics
Year:
2012

Abstract

Recent work using graph representations for text categorization has shown promising performance compared with the conventional bag-of-words representation of text documents. In this paper, we investigate a graph representation of texts for the task of text categorization. In our representation, we identify high-level concepts extracted from a database of controlled biomedical terms and build a rich graph structure that contains important concepts and relationships. This procedure ensures that graphs are described with a regular vocabulary, which makes them easier to compare. We then classify document graphs by applying a set-based graph kernel that is intuitively sensible and can handle the disconnectedness of the constructed concept graphs. We compare this approach to standard approaches that use non-graph, text-based features. We also compare several kernels to see which performs best.
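
The idea of comparing document graphs as sets can be illustrated with a minimal Python sketch. It is not the kernel from the paper, whose definition the abstract does not give. The names `ConceptGraph`, `build_graph`, and `set_kernel`, the co-occurrence rule for edges, and the `window` parameter are all hypothetical choices made for illustration. The sketch keeps only tokens found in a controlled vocabulary, links concepts that appear near each other, and scores two graphs by counting shared nodes and shared edges. Because it treats each graph as a set of nodes and a set of edges, it does not require the graphs to be connected.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple


class ConceptGraph(NamedTuple):
    """Hypothetical container: a graph stored as a node set and an edge set."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Tuple[str, str]]


def build_graph(tokens: Iterable[str], vocabulary: Set[str], window: int = 2) -> ConceptGraph:
    """Keep tokens that are in the controlled vocabulary, then link each
    concept to the next `window` concepts in the filtered sequence.
    The edge rule is an assumption made for this sketch."""
    concepts = [t for t in tokens if t in vocabulary]
    edges = set()
    for i, a in enumerate(concepts):
        for b in concepts[i + 1 : i + 1 + window]:
            if a != b:
                # Sort the pair so that edges are undirected.
                edges.add(tuple(sorted((a, b))))
    return ConceptGraph(frozenset(concepts), frozenset(edges))


def set_kernel(g: ConceptGraph, h: ConceptGraph) -> int:
    """Count shared nodes plus shared edges. Each term is an inner product
    of indicator vectors, so the sum is a valid (positive semidefinite) kernel."""
    return len(g.nodes & h.nodes) + len(g.edges & h.edges)


if __name__ == "__main__":
    vocab = {"insulin", "glucose", "liver", "kidney"}
    g1 = build_graph("insulin regulates glucose in liver".split(), vocab)
    g2 = build_graph("glucose and insulin in kidney".split(), vocab)
    print(set_kernel(g1, g2))
```

A kernel of this form can be evaluated for every pair of documents to produce a Gram matrix, which a kernel classifier such as a support vector machine can then consume directly.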