Web databases are now pervasive. Query result pages are generated dynamically from these databases in response to user-submitted queries. A query result page contains a number of data records, each of which consists of data items and their labels. In this paper, we focus on the data alignment problem, in which the individual data items and labels from the different data records on a query result page are aligned into separate columns, each column representing a group of semantically similar data items or labels drawn from those records. We present a new approach to the data alignment problem in which classifiers are trained with supervised learning to align data items and labels. Previous approaches to this problem have relied on heuristics and manually crafted rules, which are difficult to adapt to new page layouts and designs. In contrast, we are motivated to develop learned classifiers that can be adapted easily. We have implemented the proposed classifier-based approach in a software prototype, rAligner, and our experimental results show that the approach is highly effective.
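To make the data alignment problem concrete, the following is a minimal sketch of classifier-based alignment. It is not the rAligner implementation, whose features and classifiers are not described here. The three features, the nearest-centroid classifier, the column names, and the training examples are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def features(text):
    """Hypothetical features for one data item (assumed, not taken from the paper)."""
    n = max(len(text), 1)
    digits = sum(ch.isdigit() for ch in text)
    return (
        digits / n,                    # fraction of digit characters
        1.0 if "$" in text else 0.0,   # currency marker present
        min(len(text), 40) / 40.0,     # capped, normalised length
    )


def train(examples):
    """Supervised step: compute one feature centroid per column name."""
    sums = {}
    counts = defaultdict(int)
    for text, column in examples:
        vec = features(text)
        if column not in sums:
            sums[column] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[column][i] += value
        counts[column] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def classify(centroids, text):
    """Assign a data item to the column whose centroid is nearest."""
    vec = features(text)
    return min(centroids, key=lambda c: math.dist(vec, centroids[c]))


def align(centroids, records):
    """Place each record's items into columns, whatever their order on the page."""
    columns = sorted(centroids)
    table = []
    for record in records:
        row = {c: None for c in columns}
        for item in record:
            column = classify(centroids, item)
            # Keep the first item per column; a real system would resolve conflicts.
            if row[column] is None:
                row[column] = item
        table.append(row)
    return columns, table


# Hypothetical labelled training items (illustrative only).
TRAINING = [
    ("Introduction to Algorithms", "title"),
    ("Database System Concepts", "title"),
    ("$89.99", "price"),
    ("$45.00", "price"),
    ("2005", "year"),
    ("2011", "year"),
]

centroids = train(TRAINING)
records = [
    ["$12.50", "Learning Python", "1999"],  # items appear in different orders
    ["Web Data Mining", "$60.00"],          # and some items may be missing
]
columns, table = align(centroids, records)
for row in table:
    print([row[c] for c in columns])
```

The point of the sketch is that adapting to a new page layout means supplying new labelled examples and retraining, rather than rewriting hand-crafted rules. A missing item simply leaves its column empty for that record.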