Finding patterns common to a set of strings (Extended Abstract)

Authors:
Dana Angluin
Affiliations:
-
Venue:
STOC '79 Proceedings of the eleventh annual ACM symposium on Theory of computing
Year:
1979

Abstract

We motivate, formalize, and study a computational problem in concrete inductive inference. A “pattern” is defined to be a concatenation of constants and variables, and the language of a pattern is defined to be the set of strings obtained by substituting constant strings for the variables. The problem we consider is: given a set of strings, find a minimal pattern language containing this set. This problem is shown to be effectively solvable in the general case and to lead to correct inference in the limit of the pattern languages. There exists a polynomial time algorithm for it in the restricted case of one-variable patterns. Inference from positive data is re-examined, and a characterization is given of when it is possible for a family of recursive languages. Various collateral results about patterns and pattern languages are obtained.

Section 1 is an introduction explaining the context of this work and informally describing the problem formulation. Section 2 gives definitions. Section 3 presents results concerning patterns and pattern languages. Section 4 concerns the abstract question of inference from positive data. Section 5 gives a polynomial time algorithm for finding minimal one-variable pattern languages compatible with a given set of strings. Section 6 contains remarks.
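
To make the abstract's definition of a pattern language concrete, here is a minimal Python sketch of a membership test. It is not the paper's algorithm. The representation (a list whose `str` entries are constants and whose `int` entries are variable indices) and the names `matches` and `consistent` are assumptions made for illustration. The sketch also assumes that each variable is replaced by a non-empty string and that every occurrence of a variable receives the same string. It uses plain backtracking, which can take exponential time in the worst case.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical encoding: a str is a block of constant symbols,
# an int is a variable index (1 stands for x1, 2 for x2, ...).
Symbol = Union[str, int]


def matches(pattern: List[Symbol], s: str) -> bool:
    """Return True if s can be obtained from pattern by substituting
    non-empty constant strings for the variables (an assumption here),
    with all occurrences of a variable replaced by the same string."""

    def go(i: int, pos: int, binding: Dict[int, str]) -> bool:
        if i == len(pattern):
            # The whole pattern is used up; s must be used up as well.
            return pos == len(s)
        sym = pattern[i]
        if isinstance(sym, str):
            # A constant must appear literally at the current position.
            return s.startswith(sym, pos) and go(i + 1, pos + len(sym), binding)
        if sym in binding:
            # A variable seen before must repeat its earlier value.
            val = binding[sym]
            return s.startswith(val, pos) and go(i + 1, pos + len(val), binding)
        # First occurrence: try every non-empty substring starting at pos.
        for end in range(pos + 1, len(s) + 1):
            binding[sym] = s[pos:end]
            if go(i + 1, end, binding):
                return True
            del binding[sym]
        return False

    return go(0, 0, {})


def consistent(pattern: List[Symbol], sample: Iterable[str]) -> bool:
    """Return True if the language of pattern contains every string in sample."""
    return all(matches(pattern, s) for s in sample)
```

For example, under these assumptions the one-variable pattern `[1, "b", 1]` (that is, x1 b x1) contains `"aba"` and `"abbab"` but not `"abba"`. A check like `consistent` only tests whether a candidate pattern language contains the sample; finding a minimal such language, the problem the paper studies, needs more than this test.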