In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities, which makes a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.