The quality of extracted features is the key issue in text mining because of the large number of terms, phrases, and noise. Most existing text mining methods are term-based: they extract terms from a training set to describe relevant information. However, the quality of the extracted terms may be low because text documents contain a lot of noise. For many years, researchers have used phrases, which carry more semantics than single words, to improve relevance, but many experiments do not support the effective use of phrases, since phrases occur with low frequency and include many redundant and noisy instances. In this paper, we propose a novel pattern discovery approach for text mining. The approach first discovers closed sequential patterns in text documents to identify the most informative contents of the documents, and then uses the identified contents to extract useful features for text mining. We develop a novel fusion method based on Dempster-Shafer's evidential reasoning, which makes it possible to combine pieces of documents to discover the knowledge (features). To evaluate the proposed approach, we apply the feature extraction method to information filtering (IF). Experimental results on Reuters Corpus Volume 1 and TREC topics confirm that the proposed approach can achieve excellent performance.
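For readers unfamiliar with evidential reasoning, the following is a minimal sketch of the standard Dempster rule of combination, which underlies Dempster-Shafer fusion in general. It is not the fusion method proposed in the paper, whose details the abstract does not give. The two-hypothesis frame (`relevant`, `irrelevant`), the function name `combine`, and the mass values are all illustrative assumptions.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    norm = 1.0 - conflict
    # Renormalise so the combined masses sum to 1 again.
    return {h: w / norm for h, w in combined.items()}


# Hypothetical frame of discernment for a filtering decision.
REL = frozenset({"relevant"})
IRR = frozenset({"irrelevant"})
ANY = REL | IRR  # mass on ANY expresses ignorance

# Made-up masses from two pieces of evidence about one document.
m_a = {REL: 0.6, ANY: 0.4}
m_b = {REL: 0.5, IRR: 0.3, ANY: 0.2}

fused = combine(m_a, m_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

In this toy setting, the product of masses assigned to disjoint sets (here `relevant` against `irrelevant`) is treated as conflict and removed by renormalisation, so two sources that both lean towards relevance yield a combined belief in relevance stronger than either source alone.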