Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method that builds on and extends existing work in thesaurus maintenance, natural language processing, and machine learning to (a) extract a set of novel candidate concepts from text corpora and (b) generate a small ranked list of suggestions for the position of these concepts in an existing thesaurus. We extract relevant concept candidates from a document corpus using a modification of the standard tf-idf term weighting. We then apply a pattern-based machine learning approach to content extracted from web search engine snippets to determine the type of relation between the candidate terms and existing thesaurus concepts. The approach is evaluated in a large-scale experiment using the MeSH and WordNet thesauri as testbeds.
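The candidate-extraction step can be illustrated with a minimal sketch. The abstract does not specify how the paper modifies tf-idf, so the code below uses the standard weighting only. The function name `tfidf_candidates`, the whitespace tokenizer, the summing of scores across documents, and the toy corpus are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_candidates(docs, known_terms, top_k=5):
    """Rank terms that are not yet in the thesaurus by summed tf-idf.

    Standard tf-idf only; the paper's modified weighting is not
    described in the abstract and is NOT reproduced here.
    """
    # Naive whitespace tokenization (an assumption for this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Sum each term's tf-idf over all documents it occurs in.
    scores = Counter()
    for tokens in tokenized:
        tf = Counter(tokens)
        for term, count in tf.items():
            if term in known_terms:
                continue  # skip concepts the thesaurus already has
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(tokens)) * idf

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    corpus = [
        "aspirin treats headache",
        "aspirin treats fever",
        "ibuprofen treats fever",
    ]
    for term, score in tfidf_candidates(corpus, known_terms={"treats"}):
        print(f"{term}\t{score:.3f}")
```

In this toy corpus, terms that occur in a single document outrank terms spread across several, which is the usual tf-idf behavior. A real pipeline would also need phrase detection and the paper's own weighting, neither of which is shown here.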