We propose a new approach to discovering and extracting topic-specific hypertext resources from the WWW. The method, called schema-driven topical crawling, allows a user to define a schema and extraction rules for a specific domain of interest. It then automatically searches for and extracts schema-relevant web pages from the web. Unlike common approaches that crawl only over web pages, our approach lets the crawler traverse a virtual network composed of concept instances and the relationships between them. To achieve this goal, we design an architecture that integrates several techniques, including a web extractor, a meta-search engine, and query expansion, and we provide a toolkit to support it.
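As a rough illustration of crawling over concept instances rather than over page links, the following minimal sketch may help. It is not the toolkit described above: the `SCHEMA`, `RELATIONS`, and `PAGES` tables and the `search`, `extract`, and `crawl` functions are all hypothetical names invented for this example, the extraction rules are assumed to be simple regular expressions, and an in-memory dictionary stands in for the meta-search engine and the web.

```python
import re
from collections import deque

# Hypothetical schema: each concept maps attribute names to regex extraction rules.
SCHEMA = {
    "Paper": {"title": r"Title: (.+)", "author": r"Author: (.+)"},
    "Author": {"name": r"Name: (.+)", "paper": r"Wrote: (.+)"},
}

# Hypothetical relationships: an attribute value of one concept names an
# instance of another concept.
RELATIONS = {("Paper", "author"): "Author", ("Author", "paper"): "Paper"}

# Toy stand-in for the web and the meta-search engine (assumption: no network).
PAGES = {
    ("Paper", "Focused Crawling"): "Title: Focused Crawling\nAuthor: Ann",
    ("Author", "Ann"): "Name: Ann\nWrote: Focused Crawling\nWrote: WebL",
    ("Paper", "WebL"): "Title: WebL\nAuthor: Bob",
    ("Author", "Bob"): "Name: Bob\nWrote: WebL",
}


def search(concept, key):
    """Return the text of a page about the given concept instance, or ''."""
    return PAGES.get((concept, key), "")


def extract(concept, text):
    """Apply the concept's extraction rules to the page text."""
    return {attr: re.findall(rule, text) for attr, rule in SCHEMA[concept].items()}


def crawl(seed_concept, seed_key, limit=10):
    """Breadth-first traversal over concept instances linked by relationships."""
    seen = {(seed_concept, seed_key)}
    queue = deque([(seed_concept, seed_key)])
    instances = []
    while queue and len(instances) < limit:
        concept, key = queue.popleft()
        text = search(concept, key)
        if not text:
            continue  # no schema-relevant page found for this instance
        record = extract(concept, text)
        instances.append((concept, key, record))
        # Follow relationships: extracted values become new instances to visit.
        for attr, values in record.items():
            target = RELATIONS.get((concept, attr))
            if target is None:
                continue
            for value in values:
                node = (target, value)
                if node not in seen:
                    seen.add(node)
                    queue.append(node)
    return instances


if __name__ == "__main__":
    for concept, key, record in crawl("Paper", "Focused Crawling"):
        print(concept, key, record)
```

In this sketch the crawl frontier holds (concept, instance) pairs instead of URLs, so the next page to fetch is decided by the schema's relationships and not by the hyperlinks on the current page. Query expansion is omitted; in a fuller version it would presumably rewrite the lookup performed by `search`.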