Text Categorization of Biomedical Data Sets Using Graph Kernels and a Controlled Vocabulary
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Recent work using graph representations for text categorization has shown promising performance over the conventional bag-of-words representation of text documents. In this paper we investigate a graph representation of texts for the task of text categorization. In our representation we identify high-level concepts drawn from a database of controlled biomedical terms and build a rich graph structure that captures important concepts and the relationships among them. Because every graph is described with the same regular vocabulary, graphs become easier to compare. We then classify document graphs with a set-based graph kernel that is intuitively sensible and can handle the disconnectedness of the constructed concept graphs. We compare this approach to standard approaches that use non-graph, text-based features, and we also compare several candidate kernels to see which performs best.
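To make the idea of a set-based graph kernel concrete, the following is a minimal sketch and not the kernel defined in the paper. It assumes each document graph is reduced to a set of concept nodes and a set of undirected concept-to-concept edges, and it scores two graphs by counting shared nodes and shared edges (a set-intersection kernel). The names `ConceptGraph`, `make_graph`, `set_kernel`, `normalized_set_kernel`, the `edge_weight` parameter, and the concept labels are all hypothetical. Because the score depends only on set overlap and never on walks or paths, isolated nodes and disconnected components need no special handling.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Edge = Tuple[str, str]


class ConceptGraph(NamedTuple):
    """A document graph stored as plain sets (assumed representation)."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]


def make_graph(nodes: Iterable[str], edges: Iterable[Edge]) -> ConceptGraph:
    # Treat edges as undirected by storing each endpoint pair in sorted order.
    edge_set = frozenset(tuple(sorted(e)) for e in edges)
    # Every edge endpoint is also a node, even if it was not listed explicitly.
    node_set = frozenset(nodes) | {n for e in edge_set for n in e}
    return ConceptGraph(node_set, edge_set)


def set_kernel(g: ConceptGraph, h: ConceptGraph, edge_weight: float = 1.0) -> float:
    # Intersection kernel on sets: shared concepts plus weighted shared relations.
    # edge_weight is a hypothetical knob and should be non-negative.
    return len(g.nodes & h.nodes) + edge_weight * len(g.edges & h.edges)


def normalized_set_kernel(g: ConceptGraph, h: ConceptGraph) -> float:
    # Cosine-style normalization so that large documents do not dominate.
    denom = (set_kernel(g, g) * set_kernel(h, h)) ** 0.5
    return set_kernel(g, h) / denom if denom else 0.0


# Hypothetical concept labels; "fever" is an isolated node in doc_a.
doc_a = make_graph(["aspirin", "headache", "fever"], [("aspirin", "headache")])
doc_b = make_graph(["aspirin", "headache", "nausea"],
                   [("headache", "aspirin"), ("nausea", "aspirin")])
similarity = set_kernel(doc_a, doc_b)
```

A matrix of such pairwise scores could then be passed to any kernel classifier that accepts a precomputed kernel. Swapping the body of `set_kernel` is one way to compare alternative kernels on the same graphs.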