Annotating text segments using a web-based categorization approach

Authors:
Hsin-Chen Chiao; Hsiao-Tieh Pu; Lee-Feng Chien
Affiliations:
Institute of Information Science, Academia Sinica, Taipei, Taiwan; Graduate Institute of Library & Information Studies, National Taiwan Normal University, Taipei, Taiwan; Institute of Information Science, Academia Sinica, Taipei, Taiwan
Venue:
ICADL'05 Proceedings of the 8th international conference on Asian Digital Libraries: implementing strategies and sharing experiences
Year:
2005




Abstract

Conventional automatic text annotation tools mostly extract named entities from texts and annotate them with entity types such as person, location, and date. This kind of entity-type information, however, is insufficient for machines to understand the context or facts contained in the texts. This paper presents a general text categorization approach that assigns text segments to broader subject categories, for example classifying a text string as a paper title in Mathematics or as a conference name in Computer Science. Experimental results confirm that the approach is widely applicable to digital library applications.