Forgetting Exceptions is Harmful in Language Learning
Machine Learning - Special issue on natural language learning
Explorations in Automatic Thesaurus Discovery
Explorations in Automatic Thesaurus Discovery
Automatic retrieval and clustering of similar words
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Empirical estimates of adaptation: the chance of two Noriegas is closer to p/2 than p²
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity
Computational Linguistics
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Scaling distributional similarity to large corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Comparing corpora with WordSmith tools: how large must the reference corpus be?
CompareCorpora '00 Proceedings of the Workshop on Comparing Corpora
Large linguistically-processed web corpora for multiple languages
EACL '06 Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations
Comparing Different Properties Involved in Word Similarity Extraction
EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
Web-scale distributional similarity and entity set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Journal of Artificial Intelligence Research
A supervised method of feature weighting for measuring semantic relatedness
Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence
Exemplar-based word-space model for compositionality detection: shared task system description
DiSCo '11 Proceedings of the Workshop on Distributional Semantics and Compositionality
Statistical thesaurus construction for a morphologically rich language
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Automatic thesaurus construction for cross generation corpus
Journal on Computing and Cultural Heritage (JOCCH)
Gorman and Curran (2006) argue that thesaurus generation for corpora of a billion words or more is problematic because the full computation takes many days. We present an algorithm that completes the computation in under two hours. We have created, and made publicly available, thesauruses based on large corpora for (at the time of writing) seven major world languages. The algorithm is implemented in the Sketch Engine (Kilgarriff et al., 2004). Another innovative development in the same tool is the presentation of the grammatical behaviour of a word against the background of how all other words of the same word class behave. For example, 75% of the occurrences of the English noun "constraint" are in the plural. Is this a salient lexical fact? To form a judgement, we need to know the distribution of this proportion across all nouns. We use histograms to present the distribution in a way that is easy to grasp.
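The idea of judging one word's plural proportion against the distribution for all nouns can be illustrated with a minimal sketch. This is not the Sketch Engine implementation: the function names (plural_share, histogram, share_rank) are hypothetical, and the counts are invented toy values, with only the 75% figure for "constraint" taken from the abstract.

```python
from bisect import bisect_right


def plural_share(singular, plural):
    """Fraction of a noun's occurrences that are plural."""
    total = singular + plural
    return plural / total if total else 0.0


def histogram(shares, bins=10):
    """Count how many nouns fall into each equal-width bin of [0, 1]."""
    counts = [0] * bins
    for share in shares:
        # A share of exactly 1.0 goes into the last bin.
        idx = min(int(share * bins), bins - 1)
        counts[idx] += 1
    return counts


def share_rank(share, all_shares):
    """Fraction of nouns whose plural share is at or below `share`."""
    ordered = sorted(all_shares)
    return bisect_right(ordered, share) / len(ordered)


# Invented (singular, plural) counts, for illustration only.
toy_counts = {
    "constraint": (25, 75),
    "water": (95, 5),
    "idea": (60, 40),
    "scissors": (0, 100),
    "table": (70, 30),
    "problem": (55, 45),
    "information": (100, 0),
    "effort": (50, 50),
}

shares = {word: plural_share(s, p) for word, (s, p) in toy_counts.items()}
hist = histogram(shares.values())
rank = share_rank(shares["constraint"], list(shares.values()))

# Text rendering of the histogram, one row per bin.
for i, count in enumerate(hist):
    print(f"{i * 10:3d}-{(i + 1) * 10:3d}% | {'#' * count}")
print(f"constraint: plural share {shares['constraint']:.2f}, rank {rank:.3f}")
```

In this toy sample, "constraint" sits near the top of the distribution, which is the kind of evidence needed before calling its plural preference salient. A real background distribution would be computed over every noun in the corpus.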