The Web is a huge, heterogeneous data source. Its rapidly growing data volume and dynamic nature make it difficult for users to find information relevant to a specific domain. To address this problem, we have designed and implemented a novel system, called SESQ, for building domain-specific search engines. With SESQ, the user first specifies the data schema of the domain and provides seeds for the schema's data, then writes extraction rules that indicate how to obtain instance data of the schema from relevant web pages. The system extracts the instance data for the schema from those pages and, by crawling, discovers new web sites and web pages relevant to the schema. SESQ provides an efficient storage and index structure for the collected data, and an interactive query interface through which end users can pose structured queries over the data. In addition, the data can be further analyzed with analytical tools such as OLAP.
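The workflow described above (schema, seeds, extraction rules, crawling) can be illustrated with a minimal sketch. This is not SESQ's implementation: the `Schema` class, the regex-based rules, the `extract` and `crawl` functions, and the in-memory `PAGES` dictionary standing in for the Web are all hypothetical names and simplifications chosen for illustration.

```python
import re
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Hypothetical domain schema: a name plus the fields to extract."""
    name: str
    fields: tuple


# Assumption: a tiny in-memory "web" (URL -> HTML) replaces real HTTP fetching.
PAGES = {
    "http://a.example/list": '<a href="http://a.example/p1">p1</a> '
                             '<a href="http://b.example/p2">p2</a>',
    "http://a.example/p1": '<h1>Laptop X</h1><span class="price">999</span>',
    "http://b.example/p2": '<h1>Phone Y</h1><span class="price">499</span> '
                           '<a href="http://a.example/list">back</a>',
}

# Assumption: extraction rules are one regular expression per schema field.
RULES = {
    "title": re.compile(r"<h1>(.*?)</h1>"),
    "price": re.compile(r'<span class="price">(\d+)</span>'),
}
LINK = re.compile(r'href="([^"]+)"')


def extract(html, schema, rules):
    """Return one record if every schema field matches, else None."""
    record = {}
    for field in schema.fields:
        match = rules[field].search(html)
        if match is None:
            return None
        record[field] = match.group(1)
    return record


def crawl(seeds, schema, rules, pages):
    """Breadth-first crawl from the seeds, collecting schema instances."""
    queue = deque(seeds)
    seen = set(seeds)
    records = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue
        record = extract(html, schema, rules)
        if record is not None:
            records.append(record)
        # Follow outgoing links to discover further pages and sites.
        for link in LINK.findall(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return records


product = Schema("product", ("title", "price"))
records = crawl(["http://a.example/list"], product, RULES, PAGES)

# A structured query over the collected data: products cheaper than 600.
cheap = [r["title"] for r in records if int(r["price"]) < 600]
```

In this sketch the seed page contains no instance data itself; it only leads the crawler to pages on two different sites that do match the rules. A real system would also need relevance filtering, persistent storage, and indexing, none of which are modeled here.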