The World Wide Web has become the largest data repository ever available in human history. Using the Web as a data source is a natural idea, and many efforts have been made in this direction. In this paper we focus on building a robust system that collects structured recipe data from the Web incrementally. We believe this is a critical step towards practical, continuous and reliable web data extraction systems, and therefore towards using the WWW as a data source for various database applications. We advocate an incremental approach for two reasons: (1) because the Web is highly dynamic, it is impractical to crawl all the recipe pages from the relevant web sites; (2) it is almost impossible to induce a general wrapper for future extraction from only the initial batch of recipe web pages. We describe such a system, called RecipeCrawler, which incrementally collects recipe data from the WWW. We consider the general issues in building an incremental data extraction system and apply the resulting techniques to collecting recipe data from the Web. RecipeCrawler serves as the backend of a fully-fledged multimedia recipe database system being developed jointly by City University of Hong Kong and Renmin University of China.
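To illustrate why an incremental approach avoids re-crawling everything, here is a minimal sketch of one possible design, not RecipeCrawler's actual mechanism, which the abstract does not describe. It assumes that change detection is done by hashing page content and that a page is re-extracted only when it is new or its hash differs from the previous round. The names `IncrementalCollector`, `run_round`, `page_digest` and the `extract` callback (a stand-in for a wrapper that maps HTML to a recipe record) are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple


def page_digest(html: str) -> str:
    """Fingerprint a page's content so unchanged pages can be skipped."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


class IncrementalCollector:
    """Hypothetical incremental collector: each round only re-extracts
    pages that are new or whose content changed since the last round."""

    def __init__(self, extract: Callable[[str], dict]):
        self.extract = extract              # hypothetical wrapper: HTML -> record
        self.seen: Dict[str, str] = {}      # URL -> content digest from last visit
        self.records: Dict[str, dict] = {}  # URL -> most recently extracted record

    def run_round(self, pages: Iterable[Tuple[str, str]]) -> List[str]:
        """Process one crawl round of (url, html) pairs; return the URLs (re)extracted."""
        updated = []
        for url, html in pages:
            digest = page_digest(html)
            if self.seen.get(url) == digest:
                continue  # unchanged since the last round: skip extraction
            self.seen[url] = digest
            self.records[url] = self.extract(html)
            updated.append(url)
        return updated


if __name__ == "__main__":
    collector = IncrementalCollector(lambda html: {"length": len(html)})
    print(collector.run_round([("a", "<p>x</p>"), ("b", "<p>y</p>")]))  # ['a', 'b']
    # "a" is unchanged and skipped; "b" changed; "c" is new.
    print(collector.run_round([("a", "<p>x</p>"), ("b", "<p>y2</p>"), ("c", "<p>z</p>")]))  # ['b', 'c']
```

Because `extract` is passed in rather than fixed, the same loop could also accommodate a wrapper that is refined over time, in the spirit of reason (2) above; how such refinement would work is left open here.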