Constructing and utilizing wordnets using statistical methods

Authors:
Gerard de Melo; Gerhard Weikum
Affiliations:
Max Planck Institute for Informatics, Saarbrücken, Germany 66123;Max Planck Institute for Informatics, Saarbrücken, Germany 66123
Venue:
Language Resources and Evaluation
Year:
2012

Abstract

Lexical databases following the wordnet paradigm capture information about words, word senses, and their relationships. A large number of existing tools and datasets are based on the original WordNet, so extending the landscape of resources aligned with WordNet offers great potential for interoperability and substantial synergies. Wordnets are being compiled for a considerable number of languages; however, most have yet to reach a comparable level of coverage. We propose a method for automatically producing such resources for new languages based on WordNet, and analyse the implications of this approach both from a linguistic perspective and with respect to natural language processing tasks. Our approach takes advantage of the original WordNet in conjunction with translation dictionaries. A small set of training associations is used to learn a statistical model for predicting associations between terms and senses. The associations are represented using a variety of scores that take into account structural properties as well as semantic relatedness and corpus frequency information. Although the resulting wordnets are imperfect in terms of their quality and coverage of language-specific phenomena, we show that they constitute a cheap and suitable alternative for many applications, both for monolingual tasks and for cross-lingual interoperability. Apart from analysing the resources directly, we conducted tests on semantic relatedness assessment and cross-lingual text classification with very promising results.
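
The pipeline outlined in the abstract (candidate senses obtained through a translation dictionary, scores describing each term-sense pair, and a model learned from a small set of labelled associations) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy sense inventory, the tiny German-English dictionary, the two scores (`coverage` and `specificity`), and the use of a plain logistic regression as the learner. The abstract does not specify the actual scores or the learning algorithm, so this should not be read as the authors' implementation.

```python
import math

# Hypothetical stand-in for WordNet: English term -> sense identifiers.
EN_SENSES = {
    "bank": ["financial_institution", "riverside"],
    "shore": ["riverside", "seashore"],
    "dog": ["canine"],
    "hound": ["canine", "scoundrel"],
    "house": ["dwelling", "dynasty"],
    "home": ["dwelling"],
}

# Hypothetical translation dictionary: German term -> English translations.
TRANSLATIONS = {
    "Ufer": ["bank", "shore"],
    "Hund": ["dog", "hound"],
    "Haus": ["house", "home"],
}

# Small set of labelled training associations (1 = term has this sense).
TRAINING = [
    (("Hund", "canine"), 1),
    (("Hund", "scoundrel"), 0),
    (("Haus", "dwelling"), 1),
    (("Haus", "dynasty"), 0),
]


def candidate_senses(term):
    """All senses of all English translations of the term."""
    return {s for en in TRANSLATIONS[term] for s in EN_SENSES[en]}


def features(term, sense):
    """Two illustrative scores for a (term, sense) pair."""
    translations = TRANSLATIONS[term]
    supporting = [en for en in translations if sense in EN_SENSES[en]]
    # Share of the term's translations that carry this sense.
    coverage = len(supporting) / len(translations)
    # Mean inverse polysemy of the supporting translations.
    specificity = (
        sum(1 / len(EN_SENSES[en]) for en in supporting) / len(supporting)
        if supporting
        else 0.0
    )
    return [coverage, specificity]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, lr=0.5):
    """Fit a logistic regression by stochastic gradient ascent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (term, sense), label in examples:
            x = features(term, sense)
            z = sum(w * v for w, v in zip(weights, x)) + bias
            error = label - sigmoid(z)
            weights = [w + lr * error * v for w, v in zip(weights, x)]
            bias += lr * error
    return weights, bias


def score(term, sense, model):
    """Predicted probability that the term has the given sense."""
    weights, bias = model
    x = features(term, sense)
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


if __name__ == "__main__":
    model = train(TRAINING)
    for sense in sorted(candidate_senses("Ufer")):
        print(sense, round(score("Ufer", sense, model), 3))
```

In this toy setting the model is trained on associations for "Hund" and "Haus" and then applied to the unseen term "Ufer", mirroring the idea of learning from a small seed set and generalising to the rest of the vocabulary. A realistic system would need far richer scores, for example the semantic relatedness and corpus frequency information the abstract mentions.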