A two-stage text mining model for information filtering
Proceedings of the 17th ACM conference on Information and knowledge management
As the information available over computer networks grows exponentially, searching for useful information becomes increasingly difficult. Accordingly, developing an effective information filtering mechanism is becoming very important to alleviate the problem of information overload. Information filtering systems often employ user profiles to represent users' information needs so as to determine the relevance of documents from an incoming data stream. This paper presents a novel two-stage information filtering model which combines the merits of term-based and pattern-based approaches to effectively filter the sheer volume of information. In particular, the first filtering stage is supported by a novel rough analysis model which efficiently removes a large number of irrelevant documents, thereby addressing the overload problem. The second filtering stage is empowered by a semantically rich pattern taxonomy mining model which effectively fetches incoming documents according to the specific information needs of a user, thereby addressing the mismatch problem. The experimental results based on the RCV1 corpus show that the proposed two-stage filtering model significantly outperforms both the term-based and pattern-based information filtering models.
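The two-stage flow described above can be illustrated with a minimal Python sketch. It is not the paper's method: the rough analysis model and the pattern taxonomy mining model are not specified in this abstract, so the sketch substitutes a simple weighted-term threshold for the first stage and a weighted term-set (pattern) match for the second. The names `term_score`, `pattern_score`, `two_stage_filter`, the weights, and the threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def term_score(doc_terms: Set[str], term_weights: Dict[str, float]) -> float:
    """Stage-1 stand-in: sum the weights of profile terms found in the document."""
    return sum(w for term, w in term_weights.items() if term in doc_terms)


def pattern_score(doc_terms: Set[str],
                  pattern_weights: Dict[FrozenSet[str], float]) -> float:
    """Stage-2 stand-in: sum the weights of patterns (term sets) fully contained
    in the document."""
    return sum(w for pattern, w in pattern_weights.items() if pattern <= doc_terms)


def two_stage_filter(docs: List[Tuple[str, str]],
                     term_weights: Dict[str, float],
                     pattern_weights: Dict[FrozenSet[str], float],
                     threshold: float) -> List[Tuple[str, float]]:
    """Drop documents below the term threshold, then rank the rest by pattern score."""
    # Naive whitespace tokenisation; a real system would use proper preprocessing.
    tokenised = [(doc_id, set(text.lower().split())) for doc_id, text in docs]

    # Stage 1: cheap term-based pass that removes clearly irrelevant documents.
    survivors = [(doc_id, terms) for doc_id, terms in tokenised
                 if term_score(terms, term_weights) >= threshold]

    # Stage 2: pattern-based pass that ranks the surviving documents.
    scored = [(doc_id, pattern_score(terms, pattern_weights))
              for doc_id, terms in survivors]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical user profile and incoming documents.
    docs = [
        ("d1", "oil price rises as crude supply falls"),
        ("d2", "football match ends in draw"),
        ("d3", "crude oil exports grow"),
    ]
    term_weights = {"oil": 1.0, "crude": 0.8, "price": 0.5}
    pattern_weights = {
        frozenset({"oil", "price"}): 2.0,
        frozenset({"crude", "oil"}): 1.5,
    }
    print(two_stage_filter(docs, term_weights, pattern_weights, threshold=1.0))
```

In this toy run, the first stage discards `d2` because it contains no profile terms, and the second stage ranks `d1` above `d3` because `d1` matches both patterns. The design point the sketch tries to capture is only the ordering of the stages: an inexpensive term-based pass reduces the stream before the more specific pattern-based pass is applied.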