Enriching the knowledge sources used in a maximum entropy part-of-speech tagger

Authors: Kristina Toutanova; Christopher D. Manning
Affiliations: Stanford, CA; Stanford, CA
Venue: EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
Year: 2000

Abstract

This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
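
As a rough illustration of item (i), the sketch below shows how capitalization cues for unknown words might be turned into binary features and scored with the standard maximum entropy (log-linear) form. The feature names, the weights, and the functions capitalization_features and tag_distribution are hypothetical and chosen only for this example; they are not the feature templates or parameters used in the paper.

```python
import math


def capitalization_features(word, index, is_unknown):
    """Return hypothetical binary feature names for an unknown word.

    index is the word's position in its sentence (0 = first word).
    Known words get no features here; this sketch covers only the
    unknown-word capitalization case.
    """
    feats = []
    if not is_unknown:
        return feats
    if word[:1].isupper():
        feats.append("unk_init_cap")
        # Capitalization away from the sentence start is a stronger cue.
        if index > 0:
            feats.append("unk_init_cap_not_sentence_start")
    if word.isupper():
        feats.append("unk_all_caps")
    return feats


def tag_distribution(features, weights, tags):
    """Log-linear form: p(tag | context) is proportional to
    exp(sum of the weights of the active (feature, tag) pairs)."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in features) for t in tags}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}


# Illustrative weights only (assumed, not learned from data).
weights = {("unk_init_cap_not_sentence_start", "NNP"): 2.0}
feats = capitalization_features("Stanford", index=3, is_unknown=True)
dist = tag_distribution(feats, weights, tags=["NNP", "NN"])
```

With these assumed weights, a capitalized unknown word in mid-sentence receives more probability mass for the proper-noun tag than for the common-noun tag. In a trained tagger the weights would be estimated from annotated data rather than set by hand.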