Finding the smallest binarization of a CFG is NP-hard

Author: Carlos Gómez-Rodríguez
Venue: Journal of Computer and System Sciences
Year: 2014

Abstract

Grammar binarization is the process and result of transforming a grammar into an equivalent form whose rules contain at most two symbols in their right-hand side. Binarization is used, explicitly or implicitly, by a wide range of parsers for context-free grammars and other grammatical formalisms. Non-trivial grammars can be binarized in multiple ways, and to optimize the parser's computational cost it is convenient to choose a binarization that is as small as possible. While several authors have explored heuristics to obtain compact binarizations, none of them guarantees that the resulting grammar has minimum size. However, to our knowledge, no hardness results for this problem have been published. In this article, we address this issue and prove that the problem of finding a minimum binarization of a given context-free grammar is NP-hard by reduction from vertex cover. We also provide a lower bound on the approximability of this problem.
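To make "binarized in multiple ways" and "as small as possible" concrete, here is a minimal Python sketch, not the article's construction or its exact definitions. It assumes a simplified model: each long rule is binarized by choosing a full binary bracketing of its right-hand side, every bracketed subtree becomes a fresh nonterminal shared by all rules that use the identical subtree, and size is measured as the number of distinct rules. The names `bracketings`, `binarize`, `right_branching` and `smallest_binarization` are hypothetical. The exhaustive search is exponential, which is the kind of cost the article's NP-hardness result indicates cannot be avoided in general unless P = NP.

```python
from itertools import product

# Hypothetical representation (assumption): a grammar is a list of rules
# (lhs, rhs), where rhs is a tuple of symbol names. Symbol names are assumed
# to contain no brackets or spaces, so generated names like "[A B]" never clash.


def bracketings(symbols):
    """Yield every full binary bracketing of a non-empty symbol tuple as nested pairs."""
    if len(symbols) == 1:
        yield symbols[0]
        return
    for split in range(1, len(symbols)):
        for left in bracketings(symbols[:split]):
            for right in bracketings(symbols[split:]):
                yield (left, right)


def name(tree):
    """Original symbols keep their name; each bracketed subtree gets a fresh nonterminal."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(name(t) for t in tree) + "]"


def emit(tree, rules):
    """Add one binary rule per internal node of `tree`; identical subtrees dedupe in the set."""
    if isinstance(tree, str):
        return
    left, right = tree
    rules.add((name(tree), (name(left), name(right))))
    emit(left, rules)
    emit(right, rules)


def binarize(grammar, choice):
    """Binarize `grammar`, using choice[i] as the bracketing of rule i's right-hand side."""
    rules = set()
    for (lhs, rhs), tree in zip(grammar, choice):
        if len(rhs) <= 2:
            rules.add((lhs, rhs))  # already at most binary: keep unchanged
            continue
        left, right = tree
        rules.add((lhs, (name(left), name(right))))
        emit(left, rules)
        emit(right, rules)
    return rules


def right_branching(rhs):
    """Conventional bracketing X1 (X2 (... (Xn-1 Xn)))."""
    tree = rhs[-1]
    for sym in reversed(rhs[:-1]):
        tree = (sym, tree)
    return tree


def smallest_binarization(grammar):
    """Try every combination of bracketings; exponential in the number of long rules."""
    options = [list(bracketings(rhs)) if len(rhs) > 2 else [None] for _, rhs in grammar]
    best = None
    for choice in product(*options):
        rules = binarize(grammar, choice)
        if best is None or len(rules) < len(best):
            best = rules
    return best


if __name__ == "__main__":
    grammar = [("S", ("A", "B", "C")), ("T", ("A", "B", "D"))]
    naive = binarize(grammar, [right_branching(rhs) if len(rhs) > 2 else None for _, rhs in grammar])
    best = smallest_binarization(grammar)
    print(len(naive), len(best))  # prints: 4 3
    for rule in sorted(best):
        print(rule)
```

In this toy grammar, bracketing both rules left-branching lets `S` and `T` share the nonterminal `[A B]`, giving 3 rules instead of the 4 produced by the right-branching default. The number of bracketings per rule grows with the Catalan numbers and multiplies across rules, so this brute force is only usable on tiny inputs.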