We describe work on automatically assigning classification labels to books using the Library of Congress Classification scheme. This task is non-trivial due to the volume and variety of books that exist. We explore the utility of Information Extraction (IE) techniques within this text categorisation (TC) task, automatically extracting structured information from the full text of books. We evaluate performance experimentally on a corpus of books from Project Gutenberg. Results indicate that a classifier which combines methods and tools from IE and TC significantly improves over a state-of-the-art text classifier, achieving a classification performance of F_{β=1} = 0.8099.
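For readers unfamiliar with the reported metric, the F-measure with β = 1 is the harmonic mean of precision and recall. The sketch below shows the standard definition only. The function name `f_beta` and the example precision and recall values are illustrative assumptions and are not taken from the paper. The abstract also does not state how scores were averaged across classes, so no averaging scheme is implied here.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-measure: weighted harmonic mean of precision and recall.

    With beta = 1, precision and recall are weighted equally.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0.0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1.0 + b2) * precision * recall / denom


# Hypothetical values, chosen only to illustrate the computation.
example_score = f_beta(precision=0.5, recall=1.0)
print(f"F(beta=1) = {example_score:.4f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a high F_{β=1} score requires both precision and recall to be high.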