Multiword expressions (MWEs), a known nuisance for both linguistics and NLP, blur the lines between syntax and semantics. Previous work on MWE identification has relied primarily on surface statistics, which perform poorly for longer MWEs and cannot model discontinuous expressions. To address these problems, we show that even the simplest parsing models can effectively identify MWEs of arbitrary length, and that Tree Substitution Grammars achieve the best results. Our experiments show an absolute F1 improvement of 36.4% for French over an n-gram surface statistics baseline, currently the predominant method for MWE identification. Our models are useful for several NLP tasks in which MWE pre-grouping has improved accuracy.
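To make "surface statistics" concrete, the sketch below scores adjacent word pairs by pointwise mutual information (PMI), one common association measure. It is an illustration only: the abstract does not say which statistic the baseline uses, and the function name `bigram_pmi` and the toy French token list are assumptions made here. Because the counts cover only adjacent tokens, a pair separated by another word is never scored at all, which is the limitation with discontinuous expressions noted above.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent word pair by PMI (hypothetical helper, not the paper's baseline)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent pairs only
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy input (assumed): the French compound "pomme de terre" repeated twice.
tokens = "pomme de terre et pomme de terre".split()
scores = bigram_pmi(tokens)

# Adjacent pairs receive a score; the non-adjacent pair is simply absent.
print(("pomme", "de") in scores)
print(("pomme", "terre") in scores)
```

A longer or gapped expression would need higher-order n-grams or skip-grams, whose counts become sparse quickly; a parsing model instead treats the expression as a unit of tree structure.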