SESQ: a novel system for building domain specific web search engines

Authors:
Qi Guo;Lizhu Zhou;Hang Guo;Jun Zhang
Affiliations:
Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China
Venue:
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
Year:
2006

Abstract

Today the Web is a huge, heterogeneous data source. The rapid growth of data volume and the dynamic nature of the Web make it difficult for users to find relevant information for a specific domain. To address this difficulty, we have designed and implemented a novel system, called SESQ, for building domain-specific search engines. Using SESQ, the user first specifies the data schema of the domain and provides seeds for the data of the schema, then writes extraction rules that indicate how to obtain instance data of the schema from relevant web pages. The system extracts the instance data for the schema from the web pages and finds new web sites and web pages relevant to the schema by crawling. SESQ provides a highly efficient data storage and index structure for the collected data, and an interactive query interface through which end users can pose structured queries on the data. In addition, the data can be further analyzed with analytical tools such as OLAP.
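
The workflow described in the abstract (schema, seeds, extraction rules, crawling) can be illustrated with a small sketch. This is not SESQ's actual interface or rule language, which the abstract does not specify. The names `Schema`, `ExtractionRule`, `extract_instance`, and `crawl` are hypothetical, the rules are assumed to be plain regular expressions, the Web is stood in for by an in-memory dictionary of pages, and "relevance" is assumed to mean simply that at least one rule matched.

```python
import re
from dataclasses import dataclass


@dataclass
class Schema:
    """Hypothetical domain schema: a name plus a list of attribute names."""
    name: str
    attributes: list


@dataclass
class ExtractionRule:
    """Hypothetical rule: a regex whose first group captures one attribute."""
    attribute: str
    pattern: str


def extract_instance(schema, rules, html):
    """Apply each rule to a page and return one record for the schema."""
    record = {attr: None for attr in schema.attributes}
    for rule in rules:
        if rule.attribute not in record:
            continue  # ignore rules for attributes outside the schema
        match = re.search(rule.pattern, html, re.S)
        if match:
            record[rule.attribute] = match.group(1).strip()
    return record


def crawl(seeds, pages, schema, rules):
    """Breadth-first crawl from the seeds over `pages` (url -> html).

    Assumption: a page is relevant if any rule matched, and only links
    found on relevant pages are followed.
    """
    frontier = list(seeds)
    seen = set(seeds)
    results = []
    while frontier:
        url = frontier.pop(0)
        html = pages.get(url)
        if html is None:
            continue  # unknown or unreachable page
        record = extract_instance(schema, rules, html)
        if all(value is None for value in record.values()):
            continue  # no instance data: treat the page as irrelevant
        results.append((url, record))
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return results
```

A real system would replace the dictionary with an HTTP fetcher and an HTML parser, and would store the extracted records in an indexed structure so that structured queries can be answered; the sketch only shows how schema, seeds, and rules interact during crawling.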