Intelligent document filter for the internet

  • Authors:
  • Deepani B. Guruge;Russel J. Stonier

  • Affiliations:
  • Faculty of Informatics and Communication, Central Queensland University, Rockhampton, QLD, Australia;Faculty of Informatics and Communication, Central Queensland University, Rockhampton, QLD, Australia

  • Venue:
  • Data Mining
  • Year:
  • 2006


Abstract

Major web search engines currently retrieve too many documents, of which only a small fraction are relevant to the user's query. We propose a new intelligent document-filtering algorithm that removes documents irrelevant to the user's query from the output of internet search engines. The algorithm takes the output of the Google search engine as its input and processes it to select the documents most relevant to the query. The clustering algorithm is based on fuzzy c-means, with modifications to the membership function formulation and to the initialisation of the cluster prototypes. It classifies the input documents into three predefined clusters. Finally, the clustered URLs, ranked by context, are presented to the user. The effectiveness of the algorithm was tested using data provided by the eighth Text REtrieval Conference (TREC-8) [25] as well as online data. Experimental results were evaluated using the error matrix method, precision, recall and cluster validity measures.
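The abstract does not specify the modified membership function or the prototype initialisation. The sketch below therefore shows only standard fuzzy c-means (Bezdek's formulation) with c = 3 clusters, together with the usual precision and recall definitions, as a point of reference. It assumes that each document has already been converted to a numeric term-weight vector. The names `fuzzy_c_means` and `precision_recall`, the fuzzifier m = 2, and the random initialisation are illustrative assumptions, not the authors' method.

```python
import random
from typing import Hashable, Iterable, List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two document vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(data: List[Vector], c: int = 3, m: float = 2.0,
                  max_iter: int = 300, tol: float = 1e-5,
                  seed: int = 0) -> Tuple[List[Vector], List[List[float]]]:
    """Standard FCM (not the paper's modified variant, which the abstract omits).

    Returns (prototypes, u), where u[i][k] is the membership of document k
    in cluster i and each column of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Assumed initialisation: random memberships normalised per document.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    centers: List[Vector] = [[0.0] * dim for _ in range(c)]
    for _ in range(max_iter):
        # Prototype update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centers[i] = [sum(w[k] * data[k][j] for k in range(n)) / total
                          for j in range(dim)]
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        # d holds squared distances, so the exponent becomes 1/(m-1).
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d = [sq_dist(data[k], centers[i]) for i in range(c)]
            zero = [i for i in range(c) if d[i] == 0.0]
            if zero:
                # Document sits exactly on a prototype: split membership there.
                for i in zero:
                    new_u[i][k] = 1.0 / len(zero)
                continue
            for i in range(c):
                new_u[i][k] = 1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0))
                                        for j in range(c))
        change = max(abs(new_u[i][k] - u[i][k])
                     for i in range(c) for k in range(n))
        u = new_u
        if change < tol:
            break
    return centers, u


def precision_recall(retrieved: Iterable[Hashable],
                     relevant: Iterable[Hashable]) -> Tuple[float, float]:
    """Precision = |retrieved & relevant| / |retrieved|;
    recall = |retrieved & relevant| / |relevant|."""
    got, rel = set(retrieved), set(relevant)
    hits = len(got & rel)
    precision = hits / len(got) if got else 0.0
    recall = hits / len(rel) if rel else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy two-term "document" vectors (hypothetical data).
    docs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.0, 0.1]]
    protos, u = fuzzy_c_means(docs, c=3)
    labels = [max(range(3), key=lambda i: u[i][k]) for k in range(len(docs))]
    print("hard labels:", labels)
    print("precision, recall:", precision_recall([1, 2, 3, 4], [2, 4, 5]))
```

A crisp cluster label for each document can be taken as the cluster with the highest membership, as the demo does. Keeping the full membership matrix also allows documents to be ranked within a cluster by membership degree. Whether the paper ranks documents this way is not stated in the abstract.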