Mining Knowledge from Text Collections Using Automatically Generated Metadata

Authors:
John M. Pierre
Venue:
PAKM '02 Proceedings of the 4th International Conference on Practical Aspects of Knowledge Management
Year:
2002

Abstract

Data mining is typically applied to large databases of highly structured information in order to discover new knowledge. In businesses and institutions, the amount of information held in repositories of text documents usually rivals or surpasses the amount found in relational databases. Although document collections can contain a great deal of potentially valuable knowledge, they are often difficult to analyze. It is therefore important to develop methods that efficiently discover the knowledge embedded in these document repositories. In this paper we describe an approach for mining knowledge from text collections by applying data mining techniques to metadata records generated via automated text categorization. By controlling the set of metadata fields as well as the set of assigned categories, we can customize the knowledge discovery task to address specific questions. As an example, we apply the approach to a large collection of product reviews and evaluate the performance of the knowledge discovery.
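
The abstract does not say which data mining technique is applied to the metadata records, so the following is only a minimal sketch under stated assumptions. It treats each document's metadata record as a set of (field, category) pairs, such as a categorizer might assign, and mines simple one-to-one association rules using support and confidence. The field names, category values, thresholds, and the function name `mine_pair_rules` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(records, min_support, min_confidence):
    """Mine one-to-one association rules from metadata records.

    Each record is a dict mapping a metadata field to its assigned category.
    Returns tuples of (antecedent, consequent, support, confidence).
    This is an illustrative sketch, not the method used in the paper.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        # Represent the record as a sorted list of (field, category) items.
        items = sorted(set(record.items()))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    # Highest confidence first, then highest support, then alphabetical.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical metadata records for five product reviews.
records = [
    {"product": "camera", "sentiment": "positive", "topic": "image quality"},
    {"product": "camera", "sentiment": "positive", "topic": "image quality"},
    {"product": "camera", "sentiment": "negative", "topic": "battery"},
    {"product": "phone", "sentiment": "negative", "topic": "battery"},
    {"product": "phone", "sentiment": "positive", "topic": "price"},
]

rules = mine_pair_rules(records, min_support=0.4, min_confidence=0.8)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}  "
          f"support={support:.2f} confidence={confidence:.2f}")
```

Restricting `records` to a chosen subset of fields, or coarsening the category values before mining, is one way to picture how controlling the metadata fields and assigned categories could tailor the discovery task to a specific question.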