Mining positive and negative patterns for relevance feature discovery

Authors:
Yuefeng Li; Abdulmohsen Algarni; Ning Zhong
Affiliations:
Queensland University of Technology, Brisbane, Australia; Queensland University of Technology, Brisbane, Australia; Maebashi Institute of Technology, Maebashi, Japan
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Abstract

Guaranteeing the quality of relevance features discovered in text documents to describe user preferences is a big challenge because of the large number of terms, patterns, and noise. Most popular text mining and classification methods have adopted term-based approaches, but these all suffer from the problems of polysemy and synonymy. It has long been hypothesized that pattern-based methods should describe user preferences better than term-based ones, yet many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough on this difficulty. It discovers both positive and negative patterns in text documents as higher-level features and uses them to accurately weight low-level features (terms) based on their specificity and their distributions in the higher-level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio, or Support Vector Machine, and pattern-based methods, on precision, recall, and F measures.
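
The idea of using patterns as higher-level features to weight terms can be illustrated with a small Python sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes a simplified scheme in which a pattern is any set of one or two terms that co-occur in at least `min_support` documents, each pattern's support is shared equally among its terms, and weight gained from negative (non-relevant) documents is subtracted from weight gained from positive (relevant) ones. The function names, parameters, and toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(docs, min_support=2, max_len=2):
    """Return term sets of up to max_len terms found in at least min_support docs.

    Assumption: a "pattern" here is a plain co-occurring term set,
    a stand-in for whatever pattern type a real miner would produce.
    """
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))  # count each term set once per document
        for k in range(1, max_len + 1):
            counts.update(combinations(terms, k))
    return {p: c for p, c in counts.items() if c >= min_support}


def deploy(patterns):
    """Spread each pattern's support evenly over the terms it contains."""
    weights = Counter()
    for pattern, support in patterns.items():
        for term in pattern:
            weights[term] += support / len(pattern)
    return weights


def term_weights(pos_docs, neg_docs, min_support=2):
    """Weight terms by positive-pattern evidence minus negative-pattern evidence."""
    pos = deploy(frequent_patterns(pos_docs, min_support))
    neg = deploy(frequent_patterns(neg_docs, min_support))
    return {t: pos[t] - neg[t] for t in set(pos) | set(neg)}


def score(doc, weights):
    """Rank a document by the summed weights of its distinct terms."""
    return sum(weights.get(t, 0.0) for t in set(doc))


# Toy feedback: three relevant and two non-relevant documents (hypothetical data).
pos_docs = [["data", "mining", "pattern"], ["data", "mining"], ["pattern", "mining"]]
neg_docs = [["gold", "mining"], ["coal", "mining"]]

weights = term_weights(pos_docs, neg_docs)
relevant_score = score(["data", "mining", "pattern"], weights)
offtopic_score = score(["gold", "mining"], weights)
```

In this toy data the term "mining" is frequent in both the relevant and the non-relevant documents, so part of its weight is cancelled, while "data" and "pattern" keep their full weight because they occur only in relevant patterns. That is one simple way to read the abstract's notion of term specificity. A realistic system would replace the brute-force term-set counting with a proper pattern miner.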