The amount of readily available online text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested, and compared after training on corpora of one million words or fewer. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used. We are fortunate that for this particular application, correctly labeled training data is free. Since this will often not be the case, we examine methods for effectively exploiting very large corpora when labeled data comes at a cost.
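To illustrate why labeled data can be free for this task: a confusion set is a small group of words that writers commonly mistake for one another, such as {then, than}. In well-edited text each occurrence can be assumed to be the intended word, so the word itself serves as the label for its surrounding context. The following is a minimal sketch of harvesting such examples from raw text. The confusion set, the function name `harvest_examples`, the two-word context window, and the tokenizer are illustrative assumptions, not the feature set or procedure used in the paper.

```python
import re

# Hypothetical confusion set chosen for illustration only.
CONFUSION_SET = ("then", "than")


def harvest_examples(text, confusion_set=CONFUSION_SET, window=2):
    """Collect (left_context, right_context, label) triples from raw text.

    Each occurrence of a confusion-set word is treated as a correctly
    labeled example: the word is the label, and the neighboring words
    are the context a classifier would learn from.
    """
    # Simplistic tokenizer (an assumption): lowercase alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    examples = []
    for i, token in enumerate(tokens):
        if token in confusion_set:
            left = tuple(tokens[max(0, i - window):i])
            right = tuple(tokens[i + 1:i + 1 + window])
            examples.append((left, right, token))
    return examples


if __name__ == "__main__":
    sample = "It is bigger than a house. We ate and then we left."
    for left, right, label in harvest_examples(sample):
        print(left, "___", right, "->", label)
```

Because no human annotation is involved, the number of examples under this scheme grows with the size of the corpus, which is what makes training on very large amounts of labeled data possible for this task.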