Many companies now use information available on the Web for a variety of purposes. However, because most of this information is published only as HTML documents, a number of techniques for extracting it automatically have recently been proposed. In this paper we review the main techniques and tools for extracting information from the Web and devise a taxonomy of existing systems. In particular, we emphasize the advantages and drawbacks of the analyzed techniques from a user's point of view.