A pattern discovery model for effective text mining

Authors:
Luepol Pipanmaekaporn;Yuefeng Li
Affiliations:
School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Australia;School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Australia
Venue:
MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2012

Abstract

The quality of extracted features is a key issue in text mining because of the large number of terms, phrases, and noise. Most existing text mining methods are term-based: they extract terms from a training set to describe relevant information. However, the quality of the extracted terms may be low because text documents contain a lot of noise. For many years, researchers have used phrases, which carry more semantics than single words, to improve relevance, but many experiments do not support the effective use of phrases, since phrases occur with low frequency and include many redundant and noisy phrases. In this paper, we propose a novel pattern discovery approach for text mining. The approach first discovers closed sequential patterns in text documents to identify the most informative content of the documents, and then uses the identified content to extract useful features for text mining. We develop a novel fusion method based on Dempster-Shafer evidential reasoning, which allows the pieces of documents to be combined to discover knowledge (features). To evaluate the proposed approach, we apply the feature extraction method to information filtering (IF). Experimental results on Reuters Corpus Volume 1 and TREC topics confirm that the proposed approach can achieve excellent performance.
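The abstract does not spell out the fusion method, so the following is only a minimal sketch of the standard Dempster rule of combination that Dempster-Shafer evidential reasoning builds on, not the authors' method. The function name `combine`, the two-hypothesis frame (`relevant`, `irrelevant`), and the mass values are all hypothetical and chosen purely for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses
    to a mass in [0, 1]; the masses in each dict sum to 1.
    """
    combined = {}
    conflict = 0.0  # total mass assigned to contradictory pairs (K)
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalise by 1 - K so the combined masses sum to 1 again.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


# Hypothetical evidence from two pieces of a document (illustrative values only).
R = frozenset({"relevant"})
I = frozenset({"irrelevant"})
EITHER = R | I  # mass left on the whole frame expresses ignorance

piece_1 = {R: 0.6, EITHER: 0.4}
piece_2 = {R: 0.5, I: 0.3, EITHER: 0.2}

fused = combine(piece_1, piece_2)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

In this sketch, mass placed on the whole frame lets a source stay partly uncommitted, and the normalisation step redistributes the conflicting mass across the remaining hypotheses.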