Previous work demonstrated that Web counts can be used to approximate bigram counts, suggesting that Web-based frequencies should be useful for a wide variety of Natural Language Processing (NLP) tasks. However, only a limited number of tasks have so far been tested using Web-scale data sets. The present article overcomes this limitation by systematically investigating the performance of Web-based models for several NLP tasks, covering both syntax and semantics, both generation and analysis, and a wider range of n-grams and parts of speech than have been previously explored. For the majority of our tasks, we find that simple, unsupervised models perform better when n-gram counts are obtained from the Web rather than from a large corpus. In some cases, performance can be improved further by using backoff or interpolation techniques that combine Web counts and corpus counts. However, unsupervised Web-based models generally fail to outperform supervised state-of-the-art models trained on smaller corpora. We argue that Web-based models should therefore be used as a baseline for, rather than an alternative to, standard supervised models.
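The abstract does not say how the backoff and interpolation schemes are defined, so the following is only a minimal sketch of one common way to combine two count sources, not the article's actual formulation. All function names, the `threshold` and `lam` parameters, and the toy counts are hypothetical and chosen for illustration.

```python
from typing import Callable, Iterable, Mapping, Tuple

Bigram = Tuple[str, str]


def backoff_count(bigram: Bigram,
                  corpus_counts: Mapping[Bigram, int],
                  web_counts: Mapping[Bigram, int],
                  threshold: int = 0) -> float:
    """Return the corpus count if it exceeds `threshold`, else the Web count.

    Assumption: the two sources are on different scales, so a real system
    would normalize them before comparing values across candidates.
    """
    corpus = corpus_counts.get(bigram, 0)
    if corpus > threshold:
        return float(corpus)
    return float(web_counts.get(bigram, 0))


def interpolated_score(bigram: Bigram,
                       corpus_counts: Mapping[Bigram, int],
                       web_counts: Mapping[Bigram, int],
                       corpus_total: int,
                       web_total: int,
                       lam: float = 0.5) -> float:
    """Linearly interpolate the relative frequencies from both sources."""
    p_corpus = corpus_counts.get(bigram, 0) / corpus_total
    p_web = web_counts.get(bigram, 0) / web_total
    return lam * p_corpus + (1.0 - lam) * p_web


def choose(candidates: Iterable[Bigram],
           score: Callable[[Bigram], float]) -> Bigram:
    """Unsupervised decision rule: pick the highest-scoring candidate."""
    return max(candidates, key=score)


# Toy, made-up counts for two competing adjective-noun candidates.
corpus = {("strong", "tea"): 3}
web = {("strong", "tea"): 9000, ("powerful", "tea"): 40}

best = choose(
    [("strong", "tea"), ("powerful", "tea")],
    lambda b: interpolated_score(b, corpus, web,
                                 corpus_total=1000, web_total=1_000_000),
)
```

In this sketch the backoff variant trusts the corpus whenever it has seen the bigram and consults the Web only for unseen or rare items, while the interpolated variant always mixes both sources with a weight `lam` that would normally be tuned on held-out data.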