Annotation and verification of sense pools in OntoNotes

Authors:
Liang-Chih Yu; Chung-Hsien Wu; Ru-Yng Chang; Chao-Hong Liu; Eduard Hovy
Affiliations:
Department of Information Management, Yuan-Ze University, No. 135, Yuan-Tung Road, Chung-Li 32030, Taiwan, ROC (Liang-Chih Yu)
Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, Ta-Hsueh Road, Tainan, Taiwan, ROC (Chung-Hsien Wu, Ru-Yng Chang, Chao-Hong Liu)
Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292, United States (Eduard Hovy)
Venue:
Information Processing and Management: an International Journal
Year:
2010

Abstract

This paper describes OntoNotes, a multilingual (English, Chinese, and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes groups word senses into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion in information retrieval (IR) systems, (near-)duplicate detection in text summarization systems, and alternative word selection in writing support systems. Although a sense pool provides a set of near-synonymous senses, it does not indicate whether two words in the same pool are interchangeable in practical use. This paper therefore devises an unsupervised algorithm that combines Google n-grams with a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features measure the degree of context mismatch caused by a substitution, and the statistical test then decides, based on that degree of mismatch, whether the substitution is adequate. The proposed method is compared with a supervised method, Linear Discriminant Analysis (LDA). Experimental results show that the unsupervised method achieves performance comparable to that of the supervised method.
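To make the substitution test concrete, here is a minimal Python sketch built on several assumptions that the abstract does not state. Contexts are n-grams (token tuples) containing the target word. N-gram frequencies are available as a plain dict standing in for Google n-gram counts. A context counts as a "mismatch" when the substituted n-gram is unattested. The statistical test is a one-sided binomial test against a hypothetical tolerated mismatch rate `p0`. All function names, `p0`, `alpha`, and the toy counts are illustrative and are not the authors' implementation.

```python
from math import comb

Ngram = tuple[str, ...]


def substitute(ngrams: list[Ngram], target: str, candidate: str) -> list[Ngram]:
    """Replace each occurrence of `target` in every context n-gram with `candidate`."""
    return [tuple(candidate if tok == target else tok for tok in g) for g in ngrams]


def mismatch_count(ngrams: list[Ngram], target: str, candidate: str,
                   counts: dict[Ngram, int], min_count: int = 1) -> int:
    """Count contexts whose substituted n-gram is unattested (frequency < min_count)."""
    return sum(1 for g in substitute(ngrams, target, candidate)
               if counts.get(g, 0) < min_count)


def binomial_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_substitutable(ngrams: list[Ngram], target: str, candidate: str,
                     counts: dict[Ngram, int], p0: float = 0.3, alpha: float = 0.05):
    """Hypothetical one-sided binomial test. H0: the mismatch rate is at most p0.
    The substitution is rejected when seeing >= k mismatches is unlikely under H0."""
    if not ngrams:
        raise ValueError("need at least one context n-gram")
    k = mismatch_count(ngrams, target, candidate, counts)
    p_value = binomial_sf(k, len(ngrams), p0)
    return p_value >= alpha, k, p_value


# Toy bigram contexts for the target "strong" and made-up counts (NOT real Google n-gram data).
contexts = [("strong", "tea"), ("strong", "argument"), ("strong", "wind"), ("strong", "support")]
counts = {("powerful", "argument"): 1200, ("powerful", "wind"): 800, ("powerful", "support"): 950}

print(is_substitutable(contexts, "strong", "powerful", counts))  # accepted: 1 mismatch of 4
print(is_substitutable(contexts, "strong", "mighty", counts))    # rejected: 4 mismatches of 4
```

With the toy data, replacing *strong* by *powerful* leaves one of four contexts unattested (p = 1 - 0.7^4 = 0.7599), so the substitution is accepted. Replacing it by *mighty* leaves all four unattested (p = 0.3^4 = 0.0081), so it is rejected at alpha = 0.05. The binomial test is only a stand-in for the paper's statistical test; in practice `p0` and `alpha` would have to be tuned on held-out data.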