Research on information extraction from Web pages (wrapping) has seen much recent activity, particularly in system implementations, but little work has formally studied the expressiveness of the proposed formalisms or the theoretical foundations of wrapping.

In this paper, we first study monadic datalog as a wrapping language over ranked or unranked tree structures. Using previous work by Neven and Schwentick, we show that this simple language is equivalent to full monadic second-order logic (MSO) in its ability to specify wrappers. We believe that MSO has the right expressiveness for Web information extraction and thus propose it as a yardstick for evaluating and comparing wrappers.

Using this result, we study Elog⁻, the kernel fragment of the Elog wrapping language used in the Lixto system (a visual wrapper generator). Strikingly, Elog⁻ exactly captures MSO, yet is easier to use: programs in this language can be specified entirely visually. We also formally compare Elog to other wrapping languages proposed in the literature.
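To make the idea of monadic datalog as a wrapping language concrete, here is a minimal sketch in Python. It is an illustration, not the construction from the paper. The toy tree, the predicate names (`firstrow`, `incell`, `cell`), and the naive bottom-up evaluator are all assumptions chosen for the example. The binary relations `firstchild` and `nextsibling` and the unary label predicates are one common way to encode an unranked tree. Every derived predicate is unary, which is what makes the program monadic.

```python
from itertools import product

# Hypothetical toy document tree: node id -> (label, ordered child ids).
TREE = {
    0: ("table", [1, 4]),
    1: ("tr", [2, 3]),
    2: ("td", []),
    3: ("td", []),
    4: ("tr", [5]),
    5: ("td", []),
}


def tree_facts(tree):
    """Encode the tree as facts: label_<a>/1, firstchild/2, nextsibling/2."""
    facts = set()
    for node, (label, children) in tree.items():
        facts.add(("label_" + label, (node,)))
        if children:
            facts.add(("firstchild", (node, children[0])))
        for left, right in zip(children, children[1:]):
            facts.add(("nextsibling", (left, right)))
    return facts


# Illustrative monadic program (all heads are unary):
#   firstrow(X) :- label_table(P), firstchild(P, X), label_tr(X).
#   incell(Y)   :- firstrow(X), firstchild(X, Y).
#   incell(Y)   :- incell(X), nextsibling(X, Y).
#   cell(X)     :- incell(X), label_td(X).
RULES = [
    (("firstrow", ("X",)),
     [("label_table", ("P",)), ("firstchild", ("P", "X")), ("label_tr", ("X",))]),
    (("incell", ("Y",)), [("firstrow", ("X",)), ("firstchild", ("X", "Y"))]),
    (("incell", ("Y",)), [("incell", ("X",)), ("nextsibling", ("X", "Y"))]),
    (("cell", ("X",)), [("incell", ("X",)), ("label_td", ("X",))]),
]


def evaluate(rules, facts, nodes):
    """Naive bottom-up evaluation: apply every rule until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            variables = sorted({v for _, args in [head] + body for v in args})
            # Try every assignment of tree nodes to the rule's variables.
            for values in product(nodes, repeat=len(variables)):
                env = dict(zip(variables, values))
                if all((p, tuple(env[a] for a in args)) in facts for p, args in body):
                    derived = (head[0], tuple(env[a] for a in head[1]))
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts


def select(pred, facts):
    """Return the sorted nodes selected by a unary predicate."""
    return sorted(args[0] for p, args in facts if p == pred)


ALL_FACTS = evaluate(RULES, tree_facts(TREE), list(TREE))
RESULT = select("cell", ALL_FACTS)
print(RESULT)  # nodes 2 and 3: the td cells of the table's first row
```

The wrapper here is the unary predicate `cell`: the set of nodes it selects is the extracted information. The evaluator enumerates all variable assignments, so it is only meant for tiny trees and says nothing about the efficient evaluation a real system would need.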