Computing the pairwise semantic similarity between all words on the Web is computationally challenging and requires both parallelization and optimization. We propose a highly scalable implementation based on distributional similarity, built on the MapReduce framework and deployed over a 200 billion word crawl of the Web. The pairwise similarity between 500 million terms is computed in 50 hours using 200 quad-core nodes. We apply the learned similarity matrix to the task of automatic set expansion and present a large empirical study that quantifies the effect of corpus size, corpus quality, seed composition, and seed size on expansion performance. We make public an experimental testbed for set expansion analysis that includes a large collection of diverse entity sets extracted from Wikipedia.
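The general idea of MapReduce-style pairwise similarity followed by set expansion can be illustrated with a small single-machine sketch. This is not the implementation described above. It assumes raw co-occurrence counts as context weights, cosine similarity as the measure, and a sum-of-similarities ranking for expansion, none of which are stated in the abstract. The function names (`map_phase`, `shuffle`, `reduce_phase`, `expand`) and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_phase(term_vectors):
    """Emit (context, (term, unit-normalized weight)) pairs: an inverted index."""
    for term, vec in term_vectors.items():
        norm = sqrt(sum(w * w for w in vec.values()))
        if norm == 0:
            continue  # a term with no usable contexts contributes nothing
        for ctx, w in vec.items():
            yield ctx, (term, w / norm)


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce shuffle would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum partial dot products for every term pair that shares a context."""
    sims = defaultdict(float)
    for postings in grouped.values():
        # Sorting keeps each pair key in a canonical (smaller, larger) order.
        for (t1, w1), (t2, w2) in combinations(sorted(postings), 2):
            sims[(t1, t2)] += w1 * w2
    return dict(sims)


def expand(seeds, sims, k=2):
    """Rank non-seed terms by their summed similarity to the seed set."""
    scores = defaultdict(float)
    for (t1, t2), s in sims.items():
        if t1 in seeds and t2 not in seeds:
            scores[t2] += s
        elif t2 in seeds and t1 not in seeds:
            scores[t1] += s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy context vectors (assumed data): term -> {context: co-occurrence count}.
term_vectors = {
    "paris": {"capital of": 2, "flew to": 1},
    "london": {"capital of": 2, "flew to": 1},
    "rome": {"capital of": 1, "flew to": 1},
    "banana": {"ate a": 3},
}

sims = reduce_phase(shuffle(map_phase(term_vectors)))
expansion = expand({"paris"}, sims, k=2)
```

Only term pairs that share at least one context ever produce a similarity entry, which is what makes the inverted-index formulation cheaper than comparing all pairs directly. In this toy data, `banana` shares no context with the city names, so it never appears in `sims`.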