An intrinsic part of information extraction is the creation and manipulation of relations extracted from text. In this paper, we develop a foundational framework where the central construct is what we call a spanner. A spanner maps an input string into relations over the spans (intervals specified by bounding indices) of the string. The focus of this paper is on the representation of spanners. Conceptually, there are two kinds of such representations. Spanners defined in a primitive representation extract relations directly from the input string; those defined in an algebra apply algebraic operations to the primitively represented spanners. This framework is driven by SystemT, an IBM commercial product for text analysis, where the primitive representation is that of regular expressions with capture variables. We define additional types of primitive spanner representations by means of two kinds of automata that assign spans to variables. We prove that the first kind has the same expressive power as regular expressions with capture variables; the second kind expresses precisely the algebra of the regular spanners, that is, the closure of the first kind under standard relational operators. The core spanners extend the regular ones by string-equality selection (an extension used in SystemT). We give some fundamental results on the expressiveness of regular and core spanners. As an example, we prove that regular spanners are closed under difference (and complement), but core spanners are not. Finally, we establish connections with related notions in the literature.
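To make the notion of a spanner concrete, the following is a minimal illustrative sketch and not the formalism or implementation of the paper. It assumes spans are 0-based half-open index pairs `(i, j)` denoting `text[i:j]` (an indexing convention chosen here for convenience), and it uses Python's `re` module as a stand-in for the primitive representation. The function names `spans_matching` and `select_string_equal` are hypothetical. The first function plays the role of a one-variable primitive spanner; the second mimics string-equality selection applied to the product of two span relations.

```python
import re
from itertools import product


def spans_matching(pattern, text):
    """Hypothetical one-variable primitive spanner.

    Returns the set of all spans (i, j), 0-based and half-open (an assumed
    convention), such that the substring text[i:j] fully matches `pattern`.
    Unlike re.finditer, this enumerates every matching span, including
    overlapping ones.
    """
    compiled = re.compile(pattern)
    n = len(text)
    return {
        (i, j)
        for i in range(n + 1)
        for j in range(i, n + 1)
        if compiled.fullmatch(text[i:j])
    }


def select_string_equal(left, right, text):
    """Hypothetical string-equality selection over a product of span relations.

    Keeps the pairs (x, y) whose spans cover equal substrings of `text`,
    even when the spans themselves are different.
    """
    return {
        (x, y)
        for x, y in product(left, right)
        if text[x[0]:x[1]] == text[y[0]:y[1]]
    }


text = "ab ab"
words = spans_matching(r"[a-z]+", text)
print(sorted(words))
# [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]

equal_pairs = select_string_equal(words, words, text)
print(((0, 2), (3, 5)) in equal_pairs)  # True: both spans cover "ab"
```

The brute-force enumeration over all spans is quadratic in the length of the input and is meant only to show that the output is a relation over spans rather than over strings; the two spans `(0, 2)` and `(3, 5)` are distinct tuples even though they cover the same substring.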