A unified statistical model for the identification of English baseNP

Authors:
Endong Xun;Changning Huang;Ming Zhou
Affiliations:
Microsoft Research China, China;Microsoft Research China, China;Microsoft Research China, China
Venue:
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Year:
2000

Abstract

This paper presents a novel statistical model for the automatic identification of English baseNP. It proceeds in two steps: N-best part-of-speech (POS) tagging, followed by baseNP identification given the N-best POS sequences. Unlike other approaches, in which the two steps are separate, we integrate them into a unified statistical framework. Our model also integrates lexical information. Finally, the Viterbi algorithm is applied to perform a global search over the entire sentence, which gives linear complexity for the entire process. On the same test set used by other methods, our approach achieves 92.3% precision and 93.2% recall. This result is comparable with or better than previously reported results.
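
The global search mentioned in the abstract can be illustrated with a minimal Viterbi sketch over joint states. This is not the authors' model: the joint state of a POS tag paired with a B/I/O chunk label, the function name viterbi, and every probability below are assumptions made up for illustration. It only shows how one dynamic-programming pass can pick POS tags and baseNP boundaries together, in time linear in sentence length for a fixed state set.

```python
import math


def viterbi(words, states, log_start, log_trans, log_emit):
    """Return the highest-scoring state sequence for `words`.

    Entries missing from a table count as log-probability -inf.
    """
    neg_inf = float("-inf")
    # score[s] = log-probability of the best path ending in state s
    score = {
        s: log_start.get(s, neg_inf) + log_emit.get((s, words[0]), neg_inf)
        for s in states
    }
    backpointers = []  # one dict per position after the first
    for word in words[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # best predecessor of s at this position
            prev = max(states, key=lambda p: score[p] + log_trans.get((p, s), neg_inf))
            new_score[s] = (
                score[prev]
                + log_trans.get((prev, s), neg_inf)
                + log_emit.get((s, word), neg_inf)
            )
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # follow back-pointers from the best final state
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical joint states: (POS tag, chunk label), where B/I mark the
# beginning/inside of a baseNP and O marks a word outside any baseNP.
DT_B, NN_B, NN_I, VBZ_O = ("DT", "B"), ("NN", "B"), ("NN", "I"), ("VBZ", "O")
states = [DT_B, NN_B, NN_I, VBZ_O]

# Toy probabilities, invented for this example only.
log_start = {DT_B: math.log(0.8), NN_B: math.log(0.2)}
log_trans = {
    (DT_B, NN_I): math.log(0.9), (DT_B, NN_B): math.log(0.1),
    (NN_I, VBZ_O): math.log(0.8), (NN_I, NN_I): math.log(0.2),
    (NN_B, VBZ_O): math.log(0.7), (NN_B, NN_I): math.log(0.3),
    (VBZ_O, DT_B): math.log(0.6), (VBZ_O, NN_B): math.log(0.4),
}
log_emit = {
    (DT_B, "the"): math.log(1.0),
    (NN_I, "dog"): math.log(0.6), (NN_I, "barks"): math.log(0.1),
    (NN_B, "dog"): math.log(0.5), (NN_B, "barks"): math.log(0.1),
    (VBZ_O, "barks"): math.log(0.9),
}

path = viterbi(["the", "dog", "barks"], states, log_start, log_trans, log_emit)
print(path)
```

With these toy numbers the decoded path labels "the dog" as one baseNP and "barks" as outside it. The sketch decodes a single word sequence; handling N-best POS sequences and lexical information as described in the abstract would require a richer model than the one assumed here.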