Two-Stage Model for Information Filtering

Authors:
Xujuan Zhou;Yuefeng Li;Peter Bruza;Yue Xu;Raymond Y. K. Lau
Venue:
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
Year:
2008

Abstract

This thesis presents a novel two-stage model that integrates theories and techniques from information retrieval/filtering (IR/IF) with those from machine learning and data mining to provide more precise document filtering and retrieval. The first stage is topic filtering. It is intended to minimize information mismatch by using term-based profiles to filter out the documents most likely to be irrelevant, so that only a relatively small number of potentially highly relevant documents remain for document ranking. The second stage uses a pattern mining approach and aims to solve the problem of information overload. It is precision oriented: the documents most likely to be relevant are assigned higher ranks by exploiting patterns in the pattern taxonomy. Because relatively few documents are involved at this stage, computational cost is markedly reduced while results improve significantly. The new two-stage information filtering model has been evaluated through extensive experiments. The tests were based on well-known IR benchmarking processes, using the latest version of the Reuters dataset, namely Reuters Corpus Volume 1 (RCV1). The performance of the new model was compared with both term-based and data mining-based IF models. The results show that combining the strengths of information filtering and data mining methods achieves more effective and efficient information access.
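
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring functions, the threshold, the function names (`term_score`, `pattern_score`, `two_stage_filter`), and the toy data are all assumptions made for illustration. The abstract does not specify how profiles are weighted or how the pattern taxonomy is built, so patterns are modelled here simply as weighted term sets.

```python
from typing import Dict, FrozenSet, List, Tuple


def term_score(doc_terms: List[str], profile: Dict[str, float]) -> float:
    """Stage 1 score (assumed form): sum of profile weights over the
    distinct terms that occur in the document."""
    return sum(profile.get(term, 0.0) for term in set(doc_terms))


def pattern_score(doc_terms: List[str],
                  patterns: Dict[FrozenSet[str], float]) -> float:
    """Stage 2 score (assumed form): sum of the weights of every pattern
    whose terms all occur in the document."""
    terms = set(doc_terms)
    return sum(weight for pattern, weight in patterns.items()
               if pattern <= terms)


def two_stage_filter(docs: Dict[str, List[str]],
                     profile: Dict[str, float],
                     patterns: Dict[FrozenSet[str], float],
                     threshold: float) -> List[Tuple[str, float]]:
    """Filter documents with the term-based profile, then rank the
    survivors by pattern score (highest first, ties broken by id)."""
    # Stage 1: topic filtering drops documents that are likely irrelevant.
    survivors = {doc_id: terms for doc_id, terms in docs.items()
                 if term_score(terms, profile) >= threshold}
    # Stage 2: only the (smaller) surviving set is scored against patterns.
    scored = [(doc_id, pattern_score(terms, patterns))
              for doc_id, terms in survivors.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data, for illustration only.
docs = {
    "d1": ["oil", "price", "opec", "market"],
    "d2": ["oil", "painting", "gallery"],
    "d3": ["football", "match"],
    "d4": ["opec", "oil", "output"],
}
profile = {"oil": 1.0, "opec": 2.0, "price": 1.0}
patterns = {
    frozenset({"oil", "price"}): 3.0,
    frozenset({"oil", "opec"}): 2.0,
}

ranking = two_stage_filter(docs, profile, patterns, threshold=2.0)
print(ranking)
```

In this toy run, `d2` and `d3` fall below the stage-one threshold and are never scored against the patterns, which mirrors the abstract's point that the costlier pattern-based step only sees a small candidate set.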