Implementation of two-tier link extractor in optimized search engine filtering system

Authors:
S. Madhan Kumar; P. Revathy; K. Vijayalakshmi
Affiliations:
IT Dept., ACE Engg. College, Hyderabad; IT Dept., CVR College of Engg., Hyderabad; JNT University, Hyderabad
Venue:
IMSAA'09 Proceedings of the 3rd IEEE international conference on Internet multimedia services architecture and applications
Year:
2009

Abstract

In the present world, the Internet has become familiar to nearly everyone, and a search engine is an efficient tool for retrieving documents related to user queries. However, the documents retrieved are often numerous, and most of them are unrelated to the query. The present-day problem is to minimize these unrelated documents. This paper proposes a new filtering system that reduces the number of unrelated documents returned by the search engine. The optimization is performed in several steps, each of which includes several modules; one of these modules is the Link Extractor, and this research addresses the Link Extractor's architectural design. After the results are retrieved from the Web, the filtering system presents them to the user through a Re-ranker, which assigns a value to each result link returned by the search engine. After re-ranking, the most challenging task is to find duplicate URLs. The Tier I Link Extractor scans the content of every URL using an extraction technique. Once the links have been extracted, duplicate URLs can be eliminated in two cases: when the URLs themselves are identical, and when their anchor-text information is identical. Elimination in the first case is easy, but the second case, which requires checking the content of every link, is considerably more complicated. The Tier II Link Extractor implements both cases and also takes part in elimination by applying document comparison methods with the help of filters. By performing all these steps, the filtering system can reduce users' access time.
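The two duplicate-elimination cases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Link` record, the `eliminate_duplicates` function, and the 0.8 similarity threshold are hypothetical, and word-shingle Jaccard similarity is swapped in as one possible document comparison filter, since the abstract does not say which comparison method Tier II uses. The sketch also assumes one reading of the second case: links that share the same anchor text are treated only as candidates, and are dropped when their extracted page content is sufficiently similar to an earlier link's content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical record for one search-result link after Tier I extraction."""
    url: str
    anchor_text: str
    content: str  # page text, assumed to be already fetched and extracted


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lower-cased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts collapse to a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def eliminate_duplicates(links, threshold=0.8):
    """Drop duplicate links, keeping the first (highest-ranked) occurrence.

    Assumes `links` is already in re-ranked order.
    Case 1: an identical URL is dropped with a cheap set lookup.
    Case 2: a link whose anchor text matches an earlier kept link is dropped
    only if its content is at least `threshold` similar (hypothetical filter).
    """
    kept = []
    seen_urls = set()
    shingles_by_anchor = {}  # normalized anchor text -> shingle sets of kept links
    for link in links:
        if link.url in seen_urls:  # Case 1: same URL
            continue
        # Normalize anchor text: lower-case and collapse whitespace.
        anchor = " ".join(link.anchor_text.lower().split())
        current = shingles(link.content)
        earlier = shingles_by_anchor.get(anchor, [])
        if any(jaccard(current, other) >= threshold for other in earlier):
            continue  # Case 2: same anchor text and near-identical content
        seen_urls.add(link.url)
        shingles_by_anchor.setdefault(anchor, []).append(current)
        kept.append(link)
    return kept


if __name__ == "__main__":
    results = [
        Link("http://a.example/x", "Search tips", "how to write better search queries today"),
        Link("http://a.example/x", "Search tips", "how to write better search queries today"),
        Link("http://b.example/y", "search  tips", "how to write better search queries today"),
        Link("http://c.example/z", "Search tips", "a completely different page about cooking pasta"),
    ]
    print([link.url for link in eliminate_duplicates(results)])
    # prints ['http://a.example/x', 'http://c.example/z']
```

Checking the exact URL first keeps the easy case cheap; the more expensive content comparison runs only against earlier links that share the same anchor text, which mirrors the abstract's observation that the second case is the harder one.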