The information age is characterized by rapid growth in the amount of information available in electronic media, and traditional data handling methods are inadequate for coping with this flood of information. Knowledge discovery in databases (KDD) is a new paradigm that focuses on the automatic or semiautomatic exploration of large amounts of data and the discovery of relevant, interesting patterns within them. While most work on KDD addresses structured databases, the same paradigm is needed for the huge amount of information that is available only as unstructured text. To apply KDD to texts, we must impose on the data a structure rich enough to support interesting KDD operations. At the same time, the severe limitations of current text-processing technology require structures simple enough to be extracted from texts largely automatically and at reasonable cost. One option is to use a text categorization/term extraction paradigm to annotate articles with meaningful concepts organized in a hierarchy. This relatively simple annotation is rich enough to provide the basis for a novel KDD framework, enabling data summarization, exploration of interesting patterns, and trend analysis.
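The abstract does not spell out the operations, so the following is only a minimal sketch of how hierarchical concept annotations could support summarization, pattern exploration, and trend analysis. It assumes that each article has already been annotated with a set of concept labels and that the hierarchy is a plain child-to-parent map. The hierarchy, the toy corpus, and the functions `expand`, `distribution`, and `shift` are all hypothetical illustrations, not the authors' method.

```python
from collections import Counter

# Hypothetical concept hierarchy: child -> parent (None marks a root).
PARENT = {
    "countries": None, "europe": "countries", "asia": "countries",
    "france": "europe", "germany": "europe", "japan": "asia",
    "topics": None, "trade": "topics", "energy": "topics",
}

# Hypothetical toy corpus: each article is the set of concepts it was tagged with.
DOCS = [{"france", "trade"}, {"germany", "energy"}, {"japan", "trade"}, {"france", "energy"}]


def ancestors(concept):
    """Yield the concept itself and every ancestor up to its root."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]


def expand(labels):
    """Close a document's labels upward, so tagging 'france' also implies 'europe'."""
    return {a for c in labels for a in ancestors(c)}


def children(node):
    return sorted(c for c, p in PARENT.items() if p == node)


def distribution(docs, node, given=None):
    """Summarization: fraction of documents mentioning each child of `node`.

    If `given` is set, only documents annotated with that concept are counted,
    which lets one explore patterns such as "which regions co-occur with trade".
    Fractions need not sum to 1, since a document may mention several children.
    """
    expanded = [expand(d) for d in docs]
    if given is not None:
        expanded = [d for d in expanded if given in d]
    if not expanded:
        return {}
    counts = Counter(c for d in expanded for c in children(node) if c in d)
    return {c: counts[c] / len(expanded) for c in children(node)}


def shift(old_docs, new_docs, node, given=None):
    """Crude trend measure: change in each child's share between two periods."""
    before = distribution(old_docs, node, given)
    after = distribution(new_docs, node, given)
    return {c: after.get(c, 0.0) - before.get(c, 0.0) for c in children(node)}
```

For example, `distribution(DOCS, "europe")` reports what share of the toy corpus mentions France versus Germany, and `shift(DOCS[:2], DOCS[2:], "europe")` treats the first and second halves of the corpus as two time periods. Propagating labels to ancestors before counting is what lets the same query be asked at any level of the hierarchy.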