A Joint Segmenting and Labeling Approach for Chinese Lexical Analysis

  • Authors:
  • Xinhao Wang; Jiazhong Nie; Dingsheng Luo; Xihong Wu

  • Affiliations:
  • Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing, China 100871 (all authors)

  • Venue:
  • ECML PKDD '08 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Part II
  • Year:
  • 2008


Abstract

This paper introduces an approach that jointly performs a cascade of segmentation and labeling subtasks for Chinese lexical analysis, including word segmentation, named entity recognition, and part-of-speech tagging. Unlike the traditional pipeline, the cascaded subtasks are carried out simultaneously in a single step, so error propagation is avoided and information can be shared among the multi-level subtasks. The approach adopts Weighted Finite State Transducers (WFSTs): within this unified framework, the model for each subtask is represented as a transducer, and the transducers are then combined into a single one, so that one-pass decoding yields the jointly optimal outputs for all levels of processing. Experimental results show the effectiveness of the presented joint approach, which significantly outperforms the traditional pipeline method.
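The core mechanism the abstract describes can be illustrated with a toy sketch (this is an assumed minimal example, not the paper's actual models or implementation): each subtask is encoded as a small WFST, the transducers are composed into one, and a single shortest-path search returns the jointly optimal analysis. All states, labels, and weights below are made up for illustration.

```python
import heapq
from collections import defaultdict

# A WFST here is a tuple (arcs, start, finals): arcs maps a state to a list
# of (input_label, output_label, weight, next_state); weights are costs in
# the tropical semiring (summed along a path, minimized across paths).

def compose(t1, t2):
    """Compose two WFSTs: t1's output labels must match t2's input labels."""
    arcs1, s1, f1 = t1
    arcs2, s2, f2 = t2
    arcs = defaultdict(list)
    start = (s1, s2)
    stack, seen = [start], {start}
    while stack:
        q1, q2 = stack.pop()
        for i, o, w, n1 in arcs1.get(q1, []):
            for i2, o2, w2, n2 in arcs2.get(q2, []):
                if o == i2:  # intermediate labels must agree
                    nxt = (n1, n2)
                    arcs[(q1, q2)].append((i, o2, w + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return dict(arcs), start, {(a, b) for a in f1 for b in f2}

def shortest_path(t):
    """Dijkstra over the tropical semiring; returns (cost, output labels)."""
    arcs, start, finals = t
    heap = [(0.0, 0, start, [])]  # (cost, tiebreak, state, outputs so far)
    tie, done = 1, set()
    while heap:
        cost, _, q, outs = heapq.heappop(heap)
        if q in done:
            continue
        done.add(q)
        if q in finals:
            return cost, outs
        for _, o, w, nxt in arcs.get(q, []):
            if nxt not in done:
                heapq.heappush(heap, (cost + w, tie, nxt, outs + [o]))
                tie += 1
    return None

# Toy segmenter for the character sequence "a b": either one two-character
# word (labels B, E) or two single-character words (S, S); weights invented.
seg = ({0: [("a", "S", 1.0, 1), ("a", "B", 0.4, 2)],
        1: [("b", "S", 1.0, 3)],
        2: [("b", "E", 0.4, 3)]},
       0, {3})
# Toy labeler consuming the boundary labels and emitting (invented) tags.
tag = ({0: [("S", "w1", 0.3, 0), ("B", "wB", 0.1, 1)],
        1: [("E", "wBE", 0.1, 0)]},
       0, {0})

cost, outs = shortest_path(compose(seg, tag))
print(cost, outs)  # prints: 1.0 ['wB', 'wBE'] -- the one-word analysis wins
```

Because both models contribute weights to each arc of the composed machine, the one-pass search trades off segmentation and labeling evidence jointly rather than committing to a segmentation first, which is the pipeline behavior the paper argues against.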