The initial stage of text analysis for any NLP task usually involves tokenizing the input into words. For languages like English one can assume, to a first approximation, that word boundaries are given by whitespace or punctuation. In various Asian languages, including Chinese, whitespace is never used to delimit words, so one must resort to lexical information to "reconstruct" the word boundaries. In this paper we present a stochastic finite-state model whose basic workhorse is the weighted finite-state transducer. The model segments Chinese text into dictionary entries and words derived by various productive lexical processes and, since its primary intended application is text-to-speech synthesis, provides pronunciations for these words. We evaluate the system's performance by comparing its segmentation judgments with those of a pool of human segmenters, and the system is shown to perform quite well.
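To make the idea of weighted segmentation concrete, the following is a minimal sketch, not the system described in this paper. It assumes each dictionary word carries a cost equal to the negative log of its relative frequency and picks the segmentation with the lowest total cost by dynamic programming, which corresponds to finding the best path through a weighted lattice. The lexicon, its counts, the names `costs_from_counts` and `segment`, and the fallback cost for unknown characters are all hypothetical choices made for illustration; the sketch does not model pronunciations or productively derived words.

```python
import math

# Hypothetical toy lexicon: word -> count. Illustrative values only.
TOY_LEXICON = {
    "日": 50, "文": 60, "章": 20, "魚": 30,
    "日文": 40, "章魚": 35, "文章": 45,
}


def costs_from_counts(counts):
    """Turn counts into costs: cost(w) = -log(count(w) / total)."""
    total = sum(counts.values())
    return {word: -math.log(c / total) for word, c in counts.items()}


def segment(text, costs, unknown_cost=20.0, max_len=4):
    """Return the lowest-cost segmentation of `text` as a list of words.

    Single characters missing from `costs` are allowed at `unknown_cost`
    (an assumed fallback), so every input has at least one segmentation.
    """
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: cheapest cost of text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in costs:
                cost = costs[piece]
            elif end - start == 1:
                cost = unknown_cost
            else:
                continue  # unknown multi-character strings are not words
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return list(reversed(words))


TOY_COSTS = costs_from_counts(TOY_LEXICON)
print(segment("日文章魚", TOY_COSTS))
```

With these toy counts the two-word reading 日文 / 章魚 is cheaper than 日 / 文章 / 魚, so the sketch selects the former. A weighted finite-state transducer generalizes this setup: the dictionary becomes a transducer whose arcs carry costs, and the output side can emit additional information alongside the segmentation.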