Implementation of two-tier link extractor in optimized search engine filtering system

  • Authors: S. Madhan Kumar; P. Revathy; K. Vijayalakshmi
  • Affiliations: IT Dept., ACE Engg. College, Hyderabad; IT Dept., CVR College of Engg., Hyderabad; JNT University, Hyderabad
  • Venue: IMSAA'09 Proceedings of the 3rd IEEE international conference on Internet multimedia services architecture and applications
  • Year: 2009

Abstract

The Internet is now familiar to nearly everyone, and search engines are an efficient tool for retrieving documents related to user queries. However, the retrieved documents are often large in number, and most of them are unrelated to the query. The present-day problem is to minimize these unrelated documents. This paper proposes a new filtering system that reduces the number of unrelated documents returned by the search engine. The optimization is performed in several steps, each of which includes several modules. One of these modules is the Link Extractor, and this research addresses its architectural design. After the results are retrieved from the Web, the filtering system displays them to the user through a Re-ranker, which assigns a value to each result link retrieved by the search engine. After re-ranking, the most challenging task is to find duplicate URLs. The Tier I Link Extractor scans the content of every URL using an extraction technique. Once the links are extracted, duplicate URLs can be eliminated in two cases: when the URLs are the same, and when the anchor-text information is the same. Elimination in the first case is easy, but the second case, which requires checking every link's content, is far more complicated. The Tier II Link Extractor implements both checks. It also takes part in the elimination process through document comparison methods, with the help of some filters. By performing all these steps, the filtering system can reduce the access time of users.
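
The two tiers described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the extraction technique, the URL comparison rules, or the document comparison filters. The names `LinkExtractor`, `extract_links`, `normalize_url`, and `eliminate_duplicates` are hypothetical, the URL normalization rules are an assumption, and comparing anchor text case-insensitively stands in for the more involved content comparison the paper mentions.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class LinkExtractor(HTMLParser):
    """Tier I (assumed): collect (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, anchor_text) pairs
        self._href = None    # href of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def extract_links(html):
    """Run the Tier I extractor over one HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def normalize_url(url):
    """Assumed normalization: lowercase scheme and host, drop the fragment
    and any trailing slash on the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def eliminate_duplicates(links):
    """Tier II (assumed): keep the first link for each URL and each anchor text."""
    seen_urls, seen_anchors, kept = set(), set(), []
    for href, anchor in links:
        url_key = normalize_url(href)
        anchor_key = anchor.casefold()
        if url_key in seen_urls:
            continue  # case 1: same URL
        if anchor_key and anchor_key in seen_anchors:
            continue  # case 2: same anchor text
        seen_urls.add(url_key)
        if anchor_key:
            seen_anchors.add(anchor_key)
        kept.append((href, anchor))
    return kept
```

Calling `eliminate_duplicates(extract_links(page_html))` on a result page keeps the first occurrence of each link. Treating equal anchor text as a duplicate is deliberately aggressive; a fuller system would confirm such candidates by comparing the linked documents, as the abstract suggests.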