Multiword expressions lie at the syntax/semantics interface and have motivated alternative theories of syntax like Construction Grammar. Until now, however, syntactic analysis and multiword expression identification have been modeled separately in natural language processing. We develop two structured prediction models for joint parsing and multiword expression identification. The first is based on context-free grammars and the second uses tree substitution grammars, a formalism that can store larger syntactic fragments. Our experiments show that both models can identify multiword expressions with much higher accuracy than a state-of-the-art system based on word co-occurrence statistics. We experiment with Arabic and French, which both have pervasive multiword expressions. Relative to English, they also have richer morphology, which induces lexical sparsity in finite corpora. To combat this sparsity, we develop a simple factored lexical representation for the context-free parsing model. Morphological analyses are automatically transformed into rich feature tags that are scored jointly with lexical items. This technique, which we call a factored lexicon, improves both standard parsing and multiword expression identification accuracy.
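The factored lexicon idea can be illustrated with a small sketch. The abstract does not give the model's exact form, so everything below is an assumption made for illustration: the class name `FactoredLexicon`, the helper `feature_tag`, the `key=value` tag format, the independence factorization score(word, features | POS) = P(word | POS) · P(features | POS), and the add-alpha smoothing. The point it shows is that an unseen word form can still receive a useful score through its morphological feature tag, which is how a factored representation can offset lexical sparsity.

```python
import math
from collections import Counter


def feature_tag(analysis):
    """Collapse a morphological analysis (a dict) into one deterministic tag string."""
    return "|".join(f"{key}={value}" for key, value in sorted(analysis.items()))


class FactoredLexicon:
    """Hypothetical factored lexicon: word and feature tag are scored as two factors.

    Assumed factorization (not taken from the paper):
        score(word, feats | pos) = P(word | pos) * P(feats | pos)
    Both factors use add-alpha smoothing with one extra bucket for unseen events.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.pos_counts = Counter()
        self.word_counts = Counter()   # keyed by (pos, word)
        self.feat_counts = Counter()   # keyed by (pos, feature tag)
        self.words = set()
        self.feats = set()

    def observe(self, pos, word, analysis):
        """Record one tagged token from training data."""
        tag = feature_tag(analysis)
        self.pos_counts[pos] += 1
        self.word_counts[(pos, word)] += 1
        self.feat_counts[(pos, tag)] += 1
        self.words.add(word)
        self.feats.add(tag)

    def _smoothed(self, count, total, vocab_size):
        # The +1 reserves probability mass for events never seen in training.
        return (count + self.alpha) / (total + self.alpha * (vocab_size + 1))

    def log_score(self, pos, word, analysis):
        """Joint log score of a word and its feature tag given the POS tag."""
        total = self.pos_counts[pos]
        p_word = self._smoothed(self.word_counts[(pos, word)], total, len(self.words))
        p_feat = self._smoothed(
            self.feat_counts[(pos, feature_tag(analysis))], total, len(self.feats)
        )
        return math.log(p_word) + math.log(p_feat)


lexicon = FactoredLexicon()
lexicon.observe("N", "maisons", {"gen": "f", "num": "pl"})
lexicon.observe("N", "chats", {"gen": "m", "num": "pl"})
lexicon.observe("N", "chat", {"gen": "m", "num": "sg"})

# "chiens" was never observed, but its feature tag was, so its score stays informative.
print(lexicon.log_score("N", "chiens", {"gen": "m", "num": "pl"}))
```

In this toy setup an unseen word with a familiar feature tag scores higher than an unseen word with an unfamiliar one, which is the behavior a factored representation is meant to provide.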