Architecture of a grid-enabled Web search engine

Authors:
B. Barla Cambazoglu; Evren Karaca; Tayfun Kucukyilmaz; Ata Turk; Cevdet Aykanat
Affiliations:
Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey (all authors)
Venue:
Information Processing and Management: an International Journal
Year:
2007

Abstract

Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine that runs on a grid infrastructure. It offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to address the page freshness problem by searching the original pages residing on the Web instead of the previously fetched copies that traditional search engines rely on. SE4SEE also aims to achieve high download rates in Web crawling by exploiting the geographically distributed nature of the grid. In this work, we present the architectural design issues and implementation details of this search engine. We report experiments that illustrate its performance on a grid infrastructure and justify the search strategy employed in SE4SEE.