This paper proposes a robust approach to parsing, suitable for Information Extraction (IE) from texts, based on finite-state cascades. The approach builds an approximation of the full parse tree that captures all the information relevant for IE purposes and leaves the other relations underspecified. Sequences of cascades of finite-state rules analyze the text deterministically, building unambiguous structures. First, basic chunks are analyzed; then clauses are recognized and nested; finally, modifier attachment is performed and the global parse tree is built. The approach supports robust, effective, and efficient analysis of real-world texts. The organization of the grammar simplifies changing existing rules, inserting new ones, and integrating domain-oriented rules. The approach has been tested on Italian, English, and Russian. A parser based on it has been implemented as part of Pinocchio, an environment for developing and running IE applications.
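The idea of a deterministic cascade can be illustrated with a small sketch. This is not Pinocchio's grammar or implementation. The tag set (DET, ADJ, N, V, AUX, PREP), the constituent labels (NP, VG, PP, CL, S), the three rule levels, and the names `Node`, `apply_level`, `parse`, `CASCADE`, and `TAGGED` are all hypothetical. Python regular expressions over strings of POS tags stand in for compiled finite-state automata. Each level rewrites the output of the level below in a single left-to-right pass and commits to one structure. Tokens that no rule covers pass through unchanged, so a level never fails. Clause nesting and underspecified attachment are left out.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    label: str                                  # POS tag (leaf) or constituent label
    word: Optional[str] = None                  # surface form, set on leaves only
    children: List["Node"] = field(default_factory=list)

    def show(self) -> str:
        if self.word is not None:
            return self.word
        return "[" + self.label + " " + " ".join(c.show() for c in self.children) + "]"

# A rule: (label of the new constituent, regex over space-terminated labels).
# Assumption: labels contain no spaces and every pattern is built from "LABEL " units.
Rule = Tuple[str, re.Pattern]

def apply_level(seq: List[Node], rules: List[Rule]) -> List[Node]:
    """One deterministic left-to-right pass over the labels of `seq`.

    At each token the rule covering the most tokens wins (the earlier rule on ties),
    and the covered tokens become the children of one new node. Uncovered tokens
    are copied through unchanged.
    """
    labels = "".join(n.label + " " for n in seq)
    starts, off = [], 0                         # character offset where each token starts
    for n in seq:
        starts.append(off)
        off += len(n.label) + 1
    out, i = [], 0
    while i < len(seq):
        best = None                             # (label, number of tokens covered)
        for label, pattern in rules:
            m = pattern.match(labels, starts[i])
            if m and m.end() > starts[i]:       # ignore empty matches
                k = m.group().count(" ")        # one space per covered token
                if best is None or k > best[1]:
                    best = (label, k)
        if best is None:
            out.append(seq[i])
            i += 1
        else:
            label, k = best
            out.append(Node(label, children=seq[i:i + k]))
            i += k
    return out

def parse(tagged: List[Tuple[str, str]], cascade: List[List[Rule]]) -> List[Node]:
    seq = [Node(tag, word=w) for w, tag in tagged]
    for rules in cascade:                       # each level sees only the previous level's output
        seq = apply_level(seq, rules)
    return seq

CASCADE = [
    # Level 1: basic chunks
    [("NP", re.compile(r"(?:DET )?(?:ADJ )*(?:N )+")),
     ("VG", re.compile(r"(?:AUX )*V "))],
    # Level 2: flat clauses (subject, verb group, optional object) and PPs
    [("CL", re.compile(r"NP VG (?:NP )?")),
     ("PP", re.compile(r"PREP NP "))],
    # Level 3: crude modifier attachment, trailing PPs attached to the preceding clause
    [("S", re.compile(r"CL (?:PP )*"))],
]

TAGGED = [("the", "DET"), ("new", "ADJ"), ("parser", "N"), ("analyzes", "V"),
          ("texts", "N"), ("in", "PREP"), ("Italian", "N")]

if __name__ == "__main__":
    print(" ".join(n.show() for n in parse(TAGGED, CASCADE)))
    # -> [S [CL [NP the new parser] [VG analyzes] [NP texts]] [PP in [NP Italian]]]
```

In this sketch each level only reads the labels produced by the level below. A rule can therefore be added at one level, for example a domain-specific chunk pattern, without editing the other levels.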