Conventional tools for automatic metadata creation mostly extract named entities or text segments from texts and annotate them with entity types such as persons, locations, and dates. However, entity type information alone is often insufficient for machines to understand the facts contained in the texts, which rules out more advanced, intelligent applications such as concept-based search. In this work, we aim to generate more refined thematic metadata that is inherent in the texts. Using Web resource mining, our approach acquires the training corpora needed to describe both the thematic categories and the metadata extracted from the texts. It then identifies the relationships between the two through categorization, and from these relationships it generates thematic metadata for the textual data. Experimental results indicate the potential and broad adaptability of our approach.
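The abstract does not say which categorization method links extracted metadata to thematic categories. The sketch below is therefore only an illustration under stated assumptions. It uses a centroid-based TF-IDF cosine classifier, a swapped-in standard technique and not necessarily the authors' method. The small `corpora` dictionary is hypothetical toy data standing in for Web-mined training corpora. The function names (`train_centroids`, `assign_theme`) are likewise hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, for illustration only.
    return text.lower().split()

def build_idf(docs):
    # Smoothed inverse document frequency over all training documents.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf(text, idf):
    # Raw term frequency times IDF; terms unseen in training are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    # Cosine similarity between two sparse term-weight dictionaries.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train_centroids(corpora):
    # Hypothetical helper. corpora maps each thematic category to a list of
    # training documents (in the paper's setting, mined from Web resources).
    all_docs = [d for docs in corpora.values() for d in docs]
    idf = build_idf(all_docs)
    centroids = {}
    for cat, docs in corpora.items():
        centroid = Counter()
        for d in docs:
            centroid.update(tfidf(d, idf))  # sum TF-IDF vectors per category
        centroids[cat] = dict(centroid)
    return idf, centroids

def assign_theme(segment, idf, centroids):
    # Hypothetical helper. Returns the best-matching thematic category and
    # its cosine score for an extracted text segment.
    vec = tfidf(segment, idf)
    scores = {cat: cosine(vec, c) for cat, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy, hypothetical training corpora standing in for Web-mined data.
corpora = {
    "sports": ["the team won the football match",
               "players scored goals in the league game"],
    "finance": ["the stock market fell as investors sold shares",
                "the bank raised interest rates on loans"],
}
idf, centroids = train_centroids(corpora)
print(assign_theme("investors bought shares after the market rally", idf, centroids))
```

A centroid classifier keeps the sketch short and dependency-free. A production system would more likely use a stronger learner and richer features. The point here is only the shape of the pipeline: per-category training text produces category profiles, and each extracted segment is mapped to the closest profile as its thematic metadata.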