Learning regular expressions from noisy sequences

Authors:
Ugo Galassi;Attilio Giordana
Affiliations:
Dipartimento di Informatica, Università Amedeo Avogadro, Alessandria, Italy;Dipartimento di Informatica, Università Amedeo Avogadro, Alessandria, Italy
Venue:
SARA'05 Proceedings of the 6th international conference on Abstraction, Reformulation and Approximation
Year:
2005

Abstract

The presence of long gaps dramatically increases the difficulty of detecting and characterizing complex events hidden in long sequences. To cope with this problem, a learning algorithm based on an abstraction mechanism is proposed: it infers a general model of complex events from a set of learning sequences. Events are described by means of regular expressions, and the abstraction mechanism is based on the substitution property of regular languages. The induction algorithm proceeds bottom-up, progressively coarsening the sequence granularity so that correlations between subsequences separated by long gaps emerge naturally. Two abstraction operators are defined. The first detects regular expressions that contain no iterative constructs and abstracts them into non-terminal symbols. The second detects and abstracts iterated subsequences. By interleaving the two operators, regular expressions in general form may be inferred. Both operators are based on string alignment algorithms taken from bioinformatics. A restricted form of the algorithm has already been outlined in previous papers, where the emphasis was on applications. Here, an extended version of the algorithm is described and analyzed in detail.
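
The two abstraction operators can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes noise-free sequences, uses exact matching through Python's `difflib.SequenceMatcher` as a crude stand-in for the alignment step, and all function names (`longest_shared_block`, `abstract_motif`, `abstract_iterations`) are hypothetical. It only shows how substituting a motif with a non-terminal symbol, and then collapsing repetitions, coarsens a sequence.

```python
from difflib import SequenceMatcher


def longest_shared_block(a, b):
    """Longest contiguous block common to a and b.

    Assumption: exact matching stands in for the local alignment
    a noise-tolerant method would need.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def abstract_motif(seq, motif, symbol):
    """Replace each non-overlapping occurrence of motif with a non-terminal."""
    out, i, n = [], 0, len(motif)
    while i < len(seq):
        if n > 0 and seq[i:i + n] == motif:
            out.append(symbol)
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def abstract_iterations(seq):
    """Collapse each run of two or more identical symbols into 'symbol+'."""
    out, i = [], 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        out.append(seq[i] + "+" if j - i > 1 else seq[i])
        i = j
    return out


# Two toy learning sequences that share the motif "abc".
s1 = list("xxabcyyabcabcz")
s2 = list("qqabcw")

motif = longest_shared_block(s1, s2)      # ['a', 'b', 'c']
coarse = abstract_motif(s1, motif, "A")   # ['x', 'x', 'A', 'y', 'y', 'A', 'A', 'z']
print(abstract_iterations(coarse))        # ['x+', 'A', 'y+', 'A+', 'z']
```

Applying the substitution first and the iteration step second mirrors the interleaving described above: once `abc` becomes the single symbol `A`, its repetition becomes visible as `A+`.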