Using the structure of Web sites for automatic segmentation of tables
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
ViPER: augmenting automatic information extraction with visual perceptions
Proceedings of the 14th ACM international conference on Information and knowledge management
Towards domain-independent information extraction from web tables
Proceedings of the 16th international conference on World Wide Web
Language-Independent Set Expansion of Named Entities Using the Web
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
WebTables: exploring the power of tables on the web
Proceedings of the VLDB Endowment
ViDE: A Vision-Based Approach for Deep Web Data Extraction
IEEE Transactions on Knowledge and Data Engineering
Unexpected results in automatic list extraction on the web
ACM SIGKDD Explorations Newsletter
Exploring structure and content on the web: extraction and integration of the semi-structured web
Proceedings of the sixth ACM international conference on Web search and data mining
The parallel path framework for entity discovery on the web
ACM Transactions on the Web (TWEB)
Hi-index | 0.00 |
We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual structure of the Web page. We present HyLiEn an unsupervised, Hybrid approach for automatic List discovery and Extraction on the Web. It employs general assumptions about the visual rendering of lists, and the structural representation of items contained in them. We show that our method significantly outperforms existing methods.