The nature of statistical learning theory
An Algorithm that Learns What's in a Name
Machine Learning - Special issue on natural language learning
The entity-relationship model—toward a unified view of data
ACM Transactions on Database Systems (TODS) - Special issue: papers from the international conference on very large data bases: September 22–24, 1975, Framingham, MA
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
Reducing multiclass to binary: a unifying approach for margin classifiers
The Journal of Machine Learning Research
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Use of support vector learning for chunk identification
ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Factorizing complex models: a case study in mention detection
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
CU-COMSEM: exploring rich features for unsupervised web personal name disambiguation
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
In this paper, we describe an integrated approach to entity mention detection that yields a monolithic, almost language-independent system. It is optimal in the sense that all categorical constraints are considered simultaneously. The system is compact and easy to develop and maintain, since only a single set of features and classifiers needs to be designed and optimized. It is implemented using one-versus-all support vector machine (SVM) classifiers and a number of feature extractors at several linguistic levels. SVMs are well known for their ability to handle large sets of overlapping features with theoretically sound generalization properties. Data sparsity could be a concern, given the large number of classes and the relatively moderate size of the training data. However, we report results showing that the integrated system performs as well as a pipelined system that decomposes the problem into several smaller sub-tasks. We conduct all our experiments on ACE 2004 data, evaluate the systems using the ACE metrics, and report competitive performance.
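The one-versus-all SVM setup described in the abstract can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation; the toy features, labels (e.g. "PER-NAM" for a named person mention), and examples are hypothetical stand-ins for the paper's feature extractors and ACE mention classes.

```python
# Hypothetical sketch: one-versus-all SVM classification of entity
# mentions from overlapping token-level features (not the authors' code).
from sklearn.feature_extraction import DictVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy feature dicts at several "linguistic levels" (surface form,
# capitalization, part of speech); labels combine entity type and
# mention level, e.g. "PER-NAM" = named person, "PER-PRO" = pronominal.
train_feats = [
    {"word": "john", "cap": True, "pos": "NNP"},
    {"word": "he", "cap": False, "pos": "PRP"},
    {"word": "city", "cap": False, "pos": "NN"},
    {"word": "mary", "cap": True, "pos": "NNP"},
]
train_labels = ["PER-NAM", "PER-PRO", "O", "PER-NAM"]

# Sparse binary encoding of the overlapping features.
vec = DictVectorizer()
X = vec.fit_transform(train_feats)

# One binary linear SVM per class; at prediction time the class whose
# classifier gives the largest margin wins.
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, train_labels)

test = vec.transform([{"word": "john", "cap": True, "pos": "NNP"}])
print(clf.predict(test)[0])
```

Because every class gets its own binary classifier, adding a new mention category only adds one more SVM over the same shared feature set, which is what keeps the integrated system compact.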