Unsupervised concept annotation using latent Dirichlet allocation and segmental methods

Authors:
Nathalie Camelin;Boris Detienne;Stéphane Huet;Dominique Quadri;Fabrice Lefèvre
Affiliations:
LIA - University of Avignon, BP, Avignon Cedex, France;LIA - University of Avignon, BP, Avignon Cedex, France;LIA - University of Avignon, BP, Avignon Cedex, France;LIA - University of Avignon, BP, Avignon Cedex, France;LIA - University of Avignon, BP, Avignon Cedex, France
Venue:
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Year:
2011


Abstract

Training efficient statistical approaches for natural language understanding generally requires data with segmental semantic annotations. Unfortunately, building such resources is costly. In this paper, we propose an approach that produces annotations in an unsupervised way. The first step is an implementation of latent Dirichlet allocation that produces a set of topics, together with the probability of each topic being associated with each word in a sentence. This knowledge is then used as a bootstrap to infer a segmentation of the sentence's words into topics, using either integer linear optimisation or stochastic word alignment models (IBM models), to produce the final semantic annotation. The relation between automatically derived topics and task-dependent concepts is evaluated on a spoken dialogue task with an available reference annotation.
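
The second step can be illustrated with a minimal sketch. It assumes per-word topic probabilities are already available from a topic model, and it is not the method of the paper: instead of integer linear optimisation or IBM alignment models, it uses a simple dynamic programme that picks one topic per word and charges a fixed cost for every topic change between neighbouring words. The function names `segment` and `to_segments`, the `switch_penalty` parameter, and any probabilities passed in are hypothetical and chosen only for illustration.

```python
import math


def segment(word_topic_probs, switch_penalty=1.0):
    """Assign one topic index per word.

    Maximises the summed log-probability of the chosen topics minus
    `switch_penalty` for each topic change between adjacent words.
    This is an illustrative stand-in, not the paper's ILP or IBM models.
    """
    n = len(word_topic_probs)
    if n == 0:
        return []
    k = len(word_topic_probs[0])
    floor = 1e-12  # avoids log(0) for zero probabilities
    score = [[math.log(max(p, floor)) for p in row] for row in word_topic_probs]

    best = [score[0][:]]  # best[i][t]: best score of a path ending in topic t at word i
    back = []             # back[i-1][t]: previous topic on that best path
    for i in range(1, n):
        row, ptr = [], []
        for t in range(k):
            # Staying in the same topic is free; switching pays the penalty.
            cands = [best[-1][s] - (0.0 if s == t else switch_penalty)
                     for s in range(k)]
            s_best = max(range(k), key=cands.__getitem__)
            row.append(cands[s_best] + score[i][t])
            ptr.append(s_best)
        best.append(row)
        back.append(ptr)

    # Trace the best path back from the last word.
    t = max(range(k), key=best[-1].__getitem__)
    labels = [t]
    for ptr in reversed(back):
        t = ptr[t]
        labels.append(t)
    labels.reverse()
    return labels


def to_segments(words, labels):
    """Group consecutive words that share a topic into (topic, words) segments."""
    segments = []
    for word, topic in zip(words, labels):
        if segments and segments[-1][0] == topic:
            segments[-1][1].append(word)
        else:
            segments.append((topic, [word]))
    return segments
```

Calling `segment` on a list with one row of topic probabilities per word returns one topic index per word, and `to_segments` turns those indices into contiguous segments. With `switch_penalty=0.0` each word simply takes its most probable topic; larger values favour longer segments.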