Applying Data Mining Techniques for Descriptive Phrase Extraction in Digital Document Collections

Authors:
Helena Ahonen; Oskari Heinonen; Mika Klemettinen; A. Inkeri Verkamo
Venue:
ADL '98 Proceedings of the Advances in Digital Libraries Conference
Year:
1998

Citing 0
Cited 14

Web mining research: a survey
ACM SIGKDD Explorations Newsletter

The SOMLib Digital Library System
ECDL '99 Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries

LitLinker: capturing connections across the biomedical literature
Proceedings of the 2nd international conference on Knowledge capture

Automatic Pattern-Taxonomy Extraction for Web Mining
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence

Automated ontology construction for unstructured text documents
Data & Knowledge Engineering

Mining soft-matching rules from textual data
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2

Mining "Hidden phrase" definitions from the web
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications

An agile process for the creation of conceptual models from content descriptions
ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems

Mining positive and negative patterns for relevance feature discovery
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Rough sets based reasoning and pattern mining for a two-stage information filtering system
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Pattern mining for a two-stage information filtering system
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I

A two-stage decision model for information filtering
Decision Support Systems

A multi-level framework for the analysis of sequential data
Data Mining

Sequential pattern mining -- approaches and algorithms
ACM Computing Surveys (CSUR)

Abstract

Traditionally, texts have been analysed using various information retrieval methods, such as full-text analysis, and natural language processing. However, only a few examples of data mining in text, particularly in full text, are available. In this paper we show that general data mining methods are applicable to text analysis tasks such as descriptive phrase extraction. Moreover, we present a general framework for text mining. The framework follows the general knowledge discovery process, and thus contains steps from preprocessing to the utilization of the results. The data mining method that we apply is based on generalized episodes and episode rules. We give concrete examples of how to preprocess texts based on the intended use of the discovered results, and we introduce a weighting scheme that helps in pruning out redundant or non-descriptive phrases. We also present results from real-life data experiments.
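
The abstract does not spell out the episode-mining algorithm, so the following is only a rough illustration of the underlying idea, not the authors' method. It assumes that a candidate phrase can be approximated by an ordered word pair occurring within a small window, and that a pair is worth keeping when enough documents contain it. The function name `frequent_word_pairs` and the parameters `window` and `min_support` are hypothetical, and the paper's weighting scheme is not reproduced here.

```python
from collections import Counter


def frequent_word_pairs(documents, window=3, min_support=2):
    """Return ordered word pairs that co-occur within `window` tokens.

    Illustrative assumption: support is the number of documents that
    contain the pair at least once. This is a simplification and not
    the generalized-episode algorithm described in the paper.
    """
    support = Counter()
    for text in documents:
        tokens = text.lower().split()
        seen = set()  # pairs found in this document, counted once each
        for i, first in enumerate(tokens):
            # Pair the token with the next `window - 1` tokens, keeping order.
            for second in tokens[i + 1 : i + window]:
                seen.add((first, second))
        support.update(seen)
    # Keep only pairs that reach the minimum document support.
    return {pair: n for pair, n in support.items() if n >= min_support}


docs = [
    "data mining methods apply to text",
    "text data mining finds phrases",
    "mining of text data",
]
pairs = frequent_word_pairs(docs, window=3, min_support=2)
```

With these three toy documents, only the pairs ("data", "mining") and ("text", "data") reach the support threshold. A real system would add preprocessing, such as stemming or stop-word handling, and a weighting step to prune non-descriptive pairs.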