SESQ: a novel system for building domain specific web search engines

  • Authors:
  • Qi Guo; Lizhu Zhou; Hang Guo; Jun Zhang

  • Affiliations:
  • Tsinghua University, Beijing, China; Tsinghua University, Beijing, China; Tsinghua University, Beijing, China; Tsinghua University, Beijing, China

  • Venue:
  • APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
  • Year:
  • 2006

Abstract

Nowadays the Web represents a huge heterogeneous data source. The rapid growth of data volume and the dynamic nature of the Web make it difficult for users to find relevant information for a specific domain. To meet this need, we have designed and implemented a novel system, called SESQ, for building domain-specific search engines. Using SESQ, the user first specifies the data schema of the domain and provides the seed for the data of the schema, then writes extraction rules that indicate how to obtain instance data of the schema from relevant web pages. The system extracts the instance data for the schema from the web pages and finds new web sites and web pages relevant to the schema by crawling. SESQ provides a highly efficient data storage and index structure for the collected data, and an interactive query interface through which end users can express structured queries on the data. In addition, the data can be further analyzed with analytical tools such as OLAP.
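
The workflow described in the abstract (schema, seed, extraction rules, crawling, structured query) can be illustrated with a minimal Python sketch. This is not SESQ's actual interface, which the abstract does not specify. The names `Schema`, `RULES`, `extract`, `crawl`, and `query`, the regular-expression rule format, and the in-memory `PAGES` dictionary standing in for the Web are all assumptions made for illustration. The sketch follows every link it finds rather than judging relevance to the schema, and it leaves out the storage, indexing, and OLAP components.

```python
import re
from collections import deque
from dataclasses import dataclass

# Hypothetical in-memory "web": URL -> page text (a real system would fetch over HTTP).
PAGES = {
    "http://a.example/p1": "Title: Data Cleaning; Year: 2004; link: http://a.example/p2",
    "http://a.example/p2": "Title: Web Crawling; Year: 2005; link: http://b.example/home",
    "http://b.example/home": "Welcome to our site. link: http://a.example/p1",
}

@dataclass
class Schema:
    name: str
    fields: tuple  # attribute names of the domain schema

# Hypothetical extraction rules: one regular expression per schema field.
RULES = {
    "title": re.compile(r"Title: ([^;]+);"),
    "year": re.compile(r"Year: (\d{4});"),
}
LINK = re.compile(r"link: (\S+)")

def extract(schema, rules, text):
    """Apply each field's rule; return a record only if every field matches."""
    record = {}
    for field in schema.fields:
        match = rules[field].search(text)
        if match is None:
            return None
        record[field] = match.group(1).strip()
    return record

def crawl(schema, rules, seeds, pages):
    """Breadth-first crawl from the seed URLs, collecting schema instances."""
    queue, seen, records = deque(seeds), set(seeds), []
    while queue:
        url = queue.popleft()
        text = pages.get(url)
        if text is None:
            continue
        record = extract(schema, rules, text)
        if record is not None:
            records.append(record)
        # Enqueue newly discovered pages (no relevance filtering in this sketch).
        for link in LINK.findall(text):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return records

def query(records, **conditions):
    """Structured query: keep records whose fields equal the given values."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]

paper = Schema("paper", ("title", "year"))
records = crawl(paper, RULES, ["http://a.example/p1"], PAGES)
print(query(records, year="2005"))
```

Starting from a single seed URL, the crawl reaches all three pages, extracts a record from the two that match both rules, and the final query returns only the record whose `year` field is `"2005"`.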