The Web is the largest repository of information that has ever existed. This information is presented in a human-friendly format using HTML, which complicates its consumption by automatic processes. The Semantic Web and Web Services are solutions to this problem, but most web sites do not offer such services, which has increased interest in information extraction, a technique that extracts information from web documents and structures it into ontological models. Despite the large number of proposals on information extraction, no universally applicable information extractor exists. As a consequence, automatically populating an ontology model from a web site often requires more than one information extractor. We propose a framework that supports the development, training, and application of information extractors on semi-structured web documents to produce semantic data. We have developed a version of the framework and verified it by means of experiments on 15 web sites. The experimental results are very promising.