RecipeCrawler: collecting recipe data from WWW incrementally

Authors:
Yu Li;Xiaofeng Meng;Liping Wang;Qing Li
Affiliations:
School of Information, Renmin Univ. of China, China;School of Information, Renmin Univ. of China, China;Computer Science Dept., City Univ. of Hong Kong, HKSAR, China;Computer Science Dept., City Univ. of Hong Kong, HKSAR, China
Venue:
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Year:
2006

Abstract

The WWW has established itself as the largest data repository in human history. Using the Internet as a data source is natural, and many efforts have been made in this direction. In this paper we focus on building a robust system that collects structured recipe data from the Web incrementally, which we believe is a critical step towards practical, continuous, and reliable web data extraction systems, and therefore towards using the WWW as a data source for various database applications. The reasons for advocating an incremental approach are two-fold: (1) it is impractical to crawl all the recipe pages from relevant web sites, as the Web is highly dynamic; (2) it is almost impossible to induce a general wrapper for future extraction from the initial batch of recipe web pages. We describe such a system, called RecipeCrawler, which aims to collect recipe data from the WWW incrementally. We consider general issues in establishing an incremental data extraction system and apply the corresponding techniques to recipe data collection from the Web. RecipeCrawler is used as the backend of a fully-fledged multimedia recipe database system being developed jointly by City University of Hong Kong and Renmin University of China.