In this paper we present DEByE (Data Extraction By Example), an approach to extracting data from Web sources based on a small set of examples specified by the user. The novelty is that the user specifies the examples according to a structure of his or her own choosing, and that this structure is described at example-specification time. To specify the examples, the user interacts with a tool we developed that adopts nested tables as its visual paradigm. Nested tables are simple and intuitive, and they shield the user from technical details of the extraction problem, such as HTML tags, formatting operators, and learning automata. The examples provided by the user are then used to generate patterns that allow data to be extracted from new documents. For the extraction itself, DEByE adopts a new bottom-up procedure that we propose, which our experiments show to be very effective on a variety of Web sources.
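To make the example-driven idea concrete, the following is a minimal sketch in Python. It is not the DEByE algorithm: the abstract does not give the pattern format or the details of the bottom-up procedure, so everything here is an assumption made for illustration. The hypothetical `learn_pattern` builds a regular expression from the few characters surrounding one user-supplied example value, and the hypothetical `extract` first collects atomic values and only then groups them into flat records. It does so in a loosely bottom-up fashion and does not build the nested structures that nested tables allow.

```python
import re
from typing import Dict, List


def learn_pattern(page: str, example: str, context: int = 3) -> re.Pattern:
    """Derive a pattern from the text surrounding one example value.

    Assumption: a fixed-width window of `context` characters on each side
    is enough to delimit the value. Real sources may need richer patterns.
    """
    start = page.index(example)  # raises ValueError if the example is absent
    end = start + len(example)
    left = page[max(0, start - context):start]
    right = page[end:end + context]
    return re.compile(re.escape(left) + r"(.*?)" + re.escape(right), re.S)


def extract(page: str, patterns: Dict[str, re.Pattern]) -> List[Dict[str, str]]:
    """Collect atomic values first, then assemble them into records.

    Assumption: a record ends when an attribute it already holds shows up
    again in document order.
    """
    hits = []
    for attr, pattern in patterns.items():
        for match in pattern.finditer(page):
            hits.append((match.start(1), attr, match.group(1)))
    hits.sort()  # document order

    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for _, attr, value in hits:
        if attr in current:  # repeated attribute: start a new record
            records.append(current)
            current = {}
        current[attr] = value
    if current:
        records.append(current)
    return records


# Hypothetical training page and user-supplied examples.
training_page = "<li><b>Dune</b> <i>Herbert</i></li>"
examples = {"title": "Dune", "author": "Herbert"}
patterns = {attr: learn_pattern(training_page, value)
            for attr, value in examples.items()}

# A new page from the same (hypothetical) source.
new_page = ("<li><b>Ulysses</b> <i>Joyce</i></li>"
            "<li><b>Emma</b> <i>Austen</i></li>")
records = extract(new_page, patterns)
```

With these inputs, `records` holds one dictionary per list item, each with a `title` and an `author`. The sketch only works while new pages keep the same markup around each value.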