Glen, Glenda or Glendale: unsupervised and semi-supervised learning of English noun gender

Authors:
Shane Bergsma;Dekang Lin;Randy Goebel
Affiliations:
University of Alberta, Alberta, Canada;Google, Inc., Mountain View, California;University of Alberta, Alberta, Canada
Venue:
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Year:
2009

Abstract

English pronouns like "he" and "they" reliably reflect the gender and number of the entities to which they refer. Pronoun resolution systems can use this fact to filter out noun candidates that do not agree with the pronoun's gender. Indeed, broad-coverage models of noun gender have proved to be the most important source of world knowledge in automatic pronoun resolution systems. Previous approaches predict gender by counting the co-occurrence of nouns with pronouns of each gender class. While this provides useful statistics for frequent nouns, many infrequent nouns cannot be classified using this method. Rather than using co-occurrence information directly, we use it to automatically annotate training examples for a large-scale discriminative gender model. Our model collectively classifies all occurrences of a noun in a document using a wide variety of contextual, morphological, and categorical gender features. By leveraging large volumes of unlabeled data, our full semi-supervised system reduces error by 50% over the existing state of the art in gender classification.
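
The count-based baseline described in the abstract, and its weakness on rare nouns, can be illustrated with a minimal sketch. This is not the authors' implementation: the gender class names, the function names, the (noun, pronoun gender) input pairs, and the `min_count` threshold are all assumptions made for illustration, and the abstract does not say how such pairs are extracted.

```python
from collections import Counter, defaultdict

# Hypothetical gender classes; the abstract does not enumerate them.
GENDERS = ("masculine", "feminine", "neutral", "plural")


def count_cooccurrences(pairs):
    """Tally how often each noun co-occurs with pronouns of each gender.

    `pairs` is an iterable of (noun, gender) tuples, assumed to have been
    extracted from text by some upstream step not shown here.
    """
    counts = defaultdict(Counter)
    for noun, gender in pairs:
        counts[noun.lower()][gender] += 1
    return counts


def predict_gender(counts, noun, min_count=3):
    """Return the most frequent gender for `noun`, or None if evidence is too sparse.

    `min_count` is an illustrative threshold, not a value from the paper.
    """
    noun_counts = counts.get(noun.lower())
    if not noun_counts or sum(noun_counts.values()) < min_count:
        return None  # infrequent noun: the counting approach cannot classify it
    return noun_counts.most_common(1)[0][0]


def auto_label(counts, nouns, min_count=3):
    """Keep only nouns with a confident count-based label.

    These labels could then serve as automatically annotated training
    examples for a separate discriminative classifier (not sketched here).
    """
    labeled = {}
    for noun in nouns:
        gender = predict_gender(counts, noun, min_count)
        if gender is not None:
            labeled[noun] = gender
    return labeled


# Toy data, invented for illustration only.
toy_pairs = (
    [("Glen", "masculine")] * 4
    + [("Glenda", "feminine")] * 3
    + [("Glendale", "neutral")] * 5
    + [("Glenna", "feminine")]  # seen once: too rare to classify by counts
)
toy_counts = count_cooccurrences(toy_pairs)
toy_labels = auto_label(toy_counts, ["Glen", "Glenda", "Glendale", "Glenna"])
```

In this toy run, "Glenna" receives no label because it appears only once. That is the gap the abstract's discriminative model is meant to close, by learning from contextual and morphological features of the nouns that counting can label.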