Thesaurus extension using web search engines

Authors:
Robert Meusel;Mathias Niepert;Kai Eckert;Heiner Stuckenschmidt
Affiliations:
KR & KM Research Group, University of Mannheim, Germany;KR & KM Research Group, University of Mannheim, Germany;KR & KM Research Group, University of Mannheim, Germany;KR & KM Research Group, University of Mannheim, Germany
Venue:
ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
Year:
2010

Abstract

Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method that builds on and extends existing methods from thesaurus maintenance, natural language processing, and machine learning to (a) extract a set of novel candidate concepts from text corpora and (b) generate a short ranked list of suggestions for the position of these concepts in an existing thesaurus. We extract relevant concept candidates from a document corpus using a modification of the standard tf-idf term weighting. We then apply a pattern-based machine learning approach to content extracted from web search engine snippets to determine the type of relation between the candidate terms and existing thesaurus concepts. The approach is evaluated in a large-scale experiment using the MeSH and WordNet thesauri as testbeds.
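
The candidate extraction step can be illustrated with a minimal Python sketch. It uses the standard tf-idf weighting only; the abstract does not specify the authors' modification, so none is attempted here. The function name `tfidf_candidates`, its parameters, whitespace tokenization, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_candidates(docs, known_terms, top_k=5):
    """Rank terms absent from known_terms by their summed tf-idf score.

    Illustrative only: plain tf-idf, not the modified weighting
    mentioned in the abstract.
    """
    # Naive whitespace tokenization (an assumption of this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = Counter()
    for tokens in tokenized:
        tf = Counter(tokens)
        for term, count in tf.items():
            if term in known_terms:
                continue  # already a thesaurus concept, not a candidate
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(tokens)) * idf

    # Terms occurring in every document score 0 and are dropped.
    return [term for term, score in scores.most_common(top_k) if score > 0]


# Toy corpus; "aspirin" stands in for a concept already in the thesaurus.
corpus = [
    "aspirin reduces fever",
    "aspirin reduces pain",
    "ibuprofen reduces pain",
]
candidates = tfidf_candidates(corpus, known_terms={"aspirin"})
print(candidates)
```

In this sketch a term that appears in every document, such as "reduces", receives an idf of zero and is filtered out, while rarer terms outside the existing vocabulary surface as candidates. Positioning the candidates in the thesaurus, which the abstract assigns to the pattern-based step over search engine snippets, is not covered here.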