SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Learning dictionaries for information extraction by multi-level bootstrapping
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
RoadRunner: automatic data extraction from data-intensive web sites
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
2D Conditional Random Fields for Web information extraction
ICML '05 Proceedings of the 22nd international conference on Machine learning
Adaptive information extraction
ACM Computing Surveys (CSUR)
Interactive wrapper generation with minimal user effort
Proceedings of the 15th international conference on World Wide Web
Simultaneous record detection and attribute labeling in web data extraction
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Structured Data Extraction from the Web Based on Partial Tree Alignment
IEEE Transactions on Knowledge and Data Engineering
Joint optimization of wrapper generation and template detection
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
An unsupervised framework for extracting and normalizing product attributes from multiple web sites
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Open information extraction from the web
Communications of the ACM - Surviving the data deluge
Interactive information extraction with constrained conditional random fields
AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Unsupervised named-entity extraction from the Web: An experimental study
Artificial Intelligence
IEEE Transactions on Knowledge and Data Engineering
Build Watson: an overview of DeepQA for the Jeopardy! challenge
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
From one tree to a forest: a unified solution for structured web data extraction
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Unsupervised wrapper induction using linked data
Proceedings of the seventh international conference on Knowledge capture
Hi-index | 0.00 |
Tremendous concrete and comprehensive information is contained in structured data of web pages. Attributes and their corresponding values of entities are precious resources for automatic semantic annotation, knowledge discovery, and information utilization. However, various displaying styles and formats of web pages make it a challenging task to extract them. Based on our observation, despite the lack of information in a single page, different web pages and different web sites illustrating similar entities can provide adequate knowledge for computers to learn. This paper presents a dynamic learning framework to effectively extract structured information from enormous websites in various verticals (e.g., books, cameras, jobs). Different with other existing approaches that are static, require manually labeling samples and can not be flexible to unseen attributes, our approach aims at dynamically, automatically and thoroughly extracting structured data from web pages. Experiments with totally 17,850 web pages in 4 verticals demonstrated the effectiveness of our framework.