Statistical measures of word similarity have applications in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative study of two methods for estimating the word co-occurrence frequencies required by word similarity measures. Our frequency estimates are generated from a terabyte-sized corpus of Web data, and we study the impact of corpus size on the effectiveness of the measures. We base the evaluation on one TOEFL question set and two practice question sets, each consisting of a number of multiple-choice questions seeking the best synonym for a given target word. For two of the question sets, a context for the target word is provided, and we examine a number of word similarity measures that exploit this context. Our best combination of similarity measure and frequency estimation method answers 6-8% more questions than the best results previously reported for the same question sets.
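To make the synonym-selection setup concrete, the following is a minimal sketch of how a co-occurrence-based similarity measure could answer one multiple-choice question. It assumes pointwise mutual information (PMI) as the measure; the abstract does not name the measures studied, so this choice is an assumption. The function names (`pmi`, `best_synonym`) and all counts are hypothetical and purely illustrative, not data from the study.

```python
import math


def pmi(cooc, freq_a, freq_b, total):
    """Pointwise mutual information from raw counts (assumed measure).

    cooc   -- number of contexts in which both words occur
    freq_a -- number of contexts containing the first word
    freq_b -- number of contexts containing the second word
    total  -- total number of contexts in the corpus
    """
    if cooc == 0 or freq_a == 0 or freq_b == 0:
        # No evidence of association: rank below every observed pair.
        return float("-inf")
    return math.log2((cooc * total) / (freq_a * freq_b))


def best_synonym(target, choices, freq, cooc, total):
    """Return the choice with the highest PMI against the target word."""
    def score(choice):
        pair = frozenset((target, choice))
        return pmi(cooc.get(pair, 0), freq.get(target, 0),
                   freq.get(choice, 0), total)
    return max(choices, key=score)


# Hypothetical toy counts, invented for illustration only.
TOTAL = 1_000_000
FREQ = {"big": 5000, "large": 4000, "small": 4500, "fast": 3000}
COOC = {
    frozenset(("big", "large")): 400,
    frozenset(("big", "small")): 90,
    frozenset(("big", "fast")): 15,
}

answer = best_synonym("big", ["large", "small", "fast"], FREQ, COOC, TOTAL)
print(answer)
```

In this sketch the quality of the answer depends entirely on how the counts in `FREQ` and `COOC` are estimated, which is the part of the pipeline the study compares across estimation methods and corpus sizes.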