In this paper we introduce Strigil, a framework for automated data extraction. It is an easily configurable tool for retrieving data from textual or weakly structured documents. The paper describes the framework architecture and its main components. We also propose a scraping language, inspired by XSL Transformations, designed to extract data from different kinds of documents. Although many approaches address various aspects of data scraping, they are usually specialized to a particular domain or data source. We compare these solutions and discuss their advantages and disadvantages. Our scraping language is designed to work with an ontology so that scraped data can be mapped directly to classes and attributes.
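To make the idea of template-driven scraping with an ontology mapping more concrete, the following is a minimal sketch in Python. It is not Strigil's scraping language, whose syntax is not given here. The rule format, the `scrape` function, the `ex:` class and property names, and the sample document are all hypothetical and chosen only for illustration. Each rule plays the role of an XSLT-style template: it matches elements by path and maps selected child values to ontology attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set (not Strigil syntax): each template matches elements,
# much like an XSLT template, and maps child values to ontology properties.
TEMPLATES = [
    {
        "match": ".//div[@class='contract']",   # which elements to process
        "onto_class": "ex:Contract",            # assumed ontology class
        "properties": {                         # assumed ontology attributes
            "ex:title": "h2",
            "ex:price": "span[@class='price']",
        },
    },
]


def scrape(document, templates=TEMPLATES):
    """Apply every template to a well-formed document and return triples."""
    root = ET.fromstring(document)
    triples = []
    counter = 0
    for template in templates:
        for node in root.findall(template["match"]):
            counter += 1
            subject = f"_:item{counter}"  # blank-node style identifier
            triples.append((subject, "rdf:type", template["onto_class"]))
            for prop, path in template["properties"].items():
                target = node.find(path)
                # Skip properties whose source element is missing or empty.
                if target is not None and target.text:
                    triples.append((subject, prop, target.text.strip()))
    return triples


# Illustrative input; real pages are rarely well-formed XML.
SAMPLE = """<html><body>
<div class="contract"><h2>Road repair</h2><span class="price">1200</span></div>
<div class="contract"><h2>School roof</h2></div>
</body></html>"""

if __name__ == "__main__":
    for triple in scrape(SAMPLE):
        print(triple)
```

The sketch only handles well-formed markup and a small subset of path expressions supported by the standard library. A real scraper would also need an HTML-tolerant parser and a way to validate extracted values against the ontology.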