Logic-based web information extraction

Authors:
Georg Gottlob;Christoph Koch
Affiliations:
Technische Universität Wien, Austria;Technische Universität Wien, Austria
Venue:
ACM SIGMOD Record
Year:
2004

Citing 32
Cited 2

Relational queries computable in polynomial time

Information and Control
Decidability and expressiveness aspects of logic queries

PODS '87 Proceedings of the sixth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Principles of database and knowledge-base systems, Vol. I

Principles of database and knowledge-base systems, Vol. I
LTUR: a simplified linear-time unit resolution algorithm for Horn formulae and computer implementation

Information Processing Letters
Decidable optimization problems for database logic programs

STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Graph rewriting: an algebraic and logic approach

Handbook of theoretical computer science (vol. B)
Software agents

Communications of the ACM
Languages, automata, and logic

Handbook of formal languages, vol. 3
A hierarchical approach to wrapper induction

Proceedings of the third annual conference on Autonomous Agents
Managing semistructured data with florid: a deductive object-oriented perspective

Information Systems - Special issue on semistructured data
Building intelligent web applications using lightweight wrappers

Data & Knowledge Engineering - Special issue on heterogeneous information resources need semantic access
Complexity and expressive power of logic programming

ACM Computing Surveys (CSUR)
Expressiveness of structured document query languages based on attribute grammars

Journal of the ACM (JACM)
Monadic datalog and the expressive power of languages for web information extraction

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Foundations of Databases: The Logical Level

Foundations of Databases: The Logical Level
Fundamentals of Data Warehouses

Fundamentals of Data Warehouses
Query automata over finite trees

Theoretical Computer Science
DEByE - Data extraction by example

Data & Knowledge Engineering
Automata theory for XML researchers

ACM SIGMOD Record
Query evaluation via tree-decompositions

Journal of the ACM (JACM)
A Query Translation Scheme for Rapid Implementation of Wrappers

DOOD '95 Proceedings of the Fourth International Conference on Deductive and Object-Oriented Databases
Monadic Queries over Tree-Structured Data

LICS '02 Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science
Visual Web Information Extraction with Lixto

Proceedings of the 27th International Conference on Very Large Data Bases
Numerical document queries

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Query Evaluation on Compressed Trees (Extended Abstract)

LICS '03 Proceedings of the 18th Annual IEEE Symposium on Logic in Computer Science
The complexity of relational query languages (Extended Abstract)

STOC '82 Proceedings of the fourteenth annual ACM symposium on Theory of computing
XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Monadic datalog and the expressive power of languages for Web information extraction

Journal of the ACM (JACM)
Conjunctive queries over trees

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient algorithms for processing XPath queries

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Efficient processing of expressive node-selecting queries on XML data in secondary storage: a tree automata-based approach

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Information extraction from web documents based on local unranked tree automaton inference

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Learning (k,l)-contextual tree languages for information extraction from web pages

Machine Learning
Application of logic wrappers to hierarchical data extraction from HTML

EPIA'07 Proceedings of the 13th Portuguese conference on Progress in artificial intelligence

Abstract

The Web wrapping problem, i.e., the problem of extracting structured information from HTML documents, is one of great practical importance. The often observed information overload that users of the Web experience witnesses the lack of intelligent and encompassing Web services that provide high-quality collected and value-added information. The Web wrapping problem has been addressed by a significant amount of research work. Previous work can be classified into two categories, depending on whether the HTML input is regarded as a sequential character string (e.g., [34, 27, 24, 30, 23]) or a pre-parsed document tree (for instance, [35, 25, 22, 29, 3, 2, 26]). The latter category of work thus assumes that systems may make use of an existing HTML parser as a front end.
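
To make the distinction between the two categories concrete, the following is a minimal sketch in Python, not taken from the paper or from any of the cited systems. It extracts the same records once by scanning the raw character string and once by walking a pre-parsed document tree. The sample markup, the Node class, and all function names are assumptions made for illustration; the tree builder relies on the standard-library HTMLParser as the parsing front end and assumes well-formed, properly nested input.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page: each <li> holds a name in <b> and a price in <i>.
HTML = ("<ul><li><b>Widget</b> <i>9.99</i></li>"
        "<li><b>Gadget</b> <i>19.50</i></li></ul>")


def extract_from_string(html):
    """String view: match a pattern against the raw character sequence."""
    pattern = re.compile(r"<b>(.*?)</b>\s*<i>(.*?)</i>")
    return pattern.findall(html)


class Node:
    """Illustrative document-tree node (not a standard DOM class)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every opened tag is properly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data


def descendants(node):
    """Yield all nodes below `node` in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def extract_from_tree(html):
    """Tree view: select nodes by their position in the parsed tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    records = []
    for node in descendants(builder.root):
        if node.tag == "li":
            name = next(c.text for c in node.children if c.tag == "b")
            price = next(c.text for c in node.children if c.tag == "i")
            records.append((name, price))
    return records


if __name__ == "__main__":
    print(extract_from_string(HTML))
    print(extract_from_tree(HTML))
```

On this input both functions return the same two (name, price) pairs. The string-based version depends on the exact character layout between the tags, whereas the tree-based version depends only on the parent-child structure produced by the parser.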