Intelligent document filter for the internet

Authors:
Deepani B. Guruge; Russel J. Stonier
Affiliations:
Faculty of Informatics and Communication, Central Queensland University, Rockhampton, QLD, Australia (both authors)
Venue:
Data Mining
Year:
2006

References:
A Validity Measure for Fuzzy Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Genetic algorithms + data structures = evolution programs (2nd, extended ed.).
Using linear algebra for intelligent information retrieval. SIAM Review.
A course in fuzzy systems and control.
Learning human-like knowledge by singular value decomposition: a progress report. NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10.
A fuzzy set-based accuracy assessment of soft classification. Pattern Recognition Letters.
Query-sensitive similarity measures for the calculation of interdocument relationships. Proceedings of the tenth international conference on Information and knowledge management.
Evaluating strategies for similarity search on the web. Proceedings of the 11th international conference on World Wide Web.
Information Retrieval.
Evaluating contents-link coupled web page clustering for web search results. Proceedings of the eleventh international conference on Information and knowledge management.
A document retrieval system for assisting creative research. ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1).

Abstract

Current major web search engines retrieve too many documents, of which only a small fraction are relevant to the user query. We propose a new intelligent document filtering algorithm that removes documents irrelevant to the user query from the output of internet search engines. The algorithm takes the output of the Google search engine as its basic input and processes it to retain the documents most relevant to the query. The clustering algorithm is based on fuzzy c-means, with modifications to the membership function formulation and to cluster prototype initialisation. It classifies the input documents into three predefined clusters. Finally, the clustered, context-based ranked URLs are presented to the user. The effectiveness of the algorithm was tested using data provided by the eighth Text REtrieval Conference (TREC-8) [25] and also with on-line data. Experimental results were evaluated using the error matrix method, precision, recall and clustering validity measures.
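For readers unfamiliar with the underlying technique, the following is a minimal sketch of textbook fuzzy c-means, assuming Euclidean distance and a fuzzifier m = 2. It does not reproduce the modified membership function or the prototype initialisation described in the abstract, whose details are not given here. The function names, the toy two-dimensional "document vectors" and the hand-picked initial centres are all hypothetical and serve only as illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update (not the paper's modified version).

    u[i][k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centres:
            # split full membership among those centres.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]
        )
    return memberships


def fcm_centers(points, memberships, centers, m=2.0):
    """Recompute each centre as the membership-weighted mean of the points."""
    dim = len(points[0])
    new_centers = []
    for k, old in enumerate(centers):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        if total == 0.0:
            # No point has any membership in this cluster: keep the old centre.
            new_centers.append(list(old))
            continue
        new_centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return new_centers


def fuzzy_c_means(points, init_centers, m=2.0, iters=50):
    """Alternate membership and centre updates for a fixed number of iterations."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, centers, m)
    return centers, fcm_memberships(points, centers, m)


# Toy data: six hypothetical document vectors forming three loose groups.
points = [(0.0, 0.1), (0.1, 0.0), (1.0, 1.1), (1.1, 1.0), (2.0, 2.1), (2.1, 2.0)]
centers, u = fuzzy_c_means(points, [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
# Hard label per document: the cluster with the highest membership degree.
labels = [row.index(max(row)) for row in u]
```

Each row of `u` holds one document's degrees of membership in the three clusters and sums to 1, so a document can belong partly to several clusters before a final label is chosen. Three clusters are used here only because the abstract mentions three predefined clusters.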