Unlike in English, a number of languages do not mark word boundaries within noun compounds with white space, and splitting such compounds is beneficial for various NLP applications. In Japanese, noun compounds made up of katakana words (i.e., transliterated foreign words) are particularly difficult to split, because katakana words are highly productive and often out-of-vocabulary. To overcome this difficulty, we propose using monolingual and bilingual paraphrases of katakana noun compounds to identify word boundaries. Experiments demonstrated that splitting accuracy improves substantially when such paraphrases are extracted from unlabeled textual data, the Web in our case, and that information is then used to construct splitting models.
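To make the splitting task concrete, the sketch below shows a minimal dictionary-based compound splitter using dynamic programming over candidate substrings. The lexicon, its scores, and the unknown-word penalty are all hypothetical illustrations, not the paper's paraphrase-based model; they only demonstrate why out-of-vocabulary katakana words make lexicon-driven splitting fragile.

```python
# Illustrative compound splitter: dynamic programming over substring scores.
# The lexicon entries and weights are made-up examples, not from the paper.
LEXICON = {
    "モンキー": 1.0,    # "monkey"
    "レンチ": 1.0,      # "wrench"
    "モン": 0.1,
    "キーレンチ": 0.05,
}

def split_compound(compound: str, lexicon: dict) -> list:
    """Return the highest-scoring segmentation of `compound`.

    best[i] holds (score, segmentation) for the prefix compound[:i].
    Unknown substrings receive a small length-penalized score, so every
    input remains splittable even when pieces are out-of-vocabulary.
    """
    n = len(compound)
    best = [(0.0, [])] + [(float("-inf"), []) for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(i):
            piece = compound[j:i]
            score = lexicon.get(piece, 0.001 / len(piece))
            candidate = best[j][0] + score
            if candidate > best[i][0]:
                best[i] = (candidate, best[j][1] + [piece])
    return best[n][1]

print(split_compound("モンキーレンチ", LEXICON))  # → ['モンキー', 'レンチ']
```

When a constituent such as モンキー is missing from the lexicon, the penalty term dominates and the splitter degrades, which is the gap the paper's Web-mined paraphrases are meant to fill.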