NLP on spoken documents without ASR

  • Authors:
  • Mark Dredze, Aren Jansen, Glen Coppersmith, Ken Church

  • Affiliations:
  • Johns Hopkins University (all authors)

  • Venue:
  • EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2010


Abstract

There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification, and information retrieval. Many of these components, especially ASR, depend on considerable linguistic resources. We would like to process spoken documents with few (if any) such resources. Moreover, connecting black boxes in series tends to multiply errors, especially when key terms are out-of-vocabulary (OOV). The proposed alternative applies text processing directly to the speech, without depending on ASR. The method finds long (~1 sec) repetitions in the speech and clusters them into pseudo-terms (roughly phrases). Document clustering and classification work surprisingly well on pseudo-terms; performance on a Switchboard task approaches a baseline that uses gold-standard manual transcriptions.
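The abstract's pipeline — discover repeated speech segments, cluster them into pseudo-terms, then treat documents as bags of pseudo-terms — can be sketched downstream of the acoustic matching step. A minimal sketch, assuming a discovery stage (e.g. an acoustic matcher, not shown here) has already emitted pairs of matching segment IDs; the segment names and function names below are hypothetical illustrations, not the paper's actual code:

```python
# Hedged sketch of the pseudo-term stage described in the abstract.
# Assumption: an acoustic repetition-discovery step (not shown) yields
# pairs of matching segment IDs, e.g. ("doc1_seg3", "doc2_seg7").
from collections import defaultdict

def cluster_pseudo_terms(match_pairs):
    """Group matched segments into pseudo-term clusters via
    connected components (union-find with path halving)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in match_pairs:
        union(a, b)
    clusters = defaultdict(set)
    for seg in parent:
        clusters[find(seg)].add(seg)
    return list(clusters.values())

def bag_of_pseudo_terms(doc_segments, pseudo_terms):
    """Represent one document as counts over pseudo-term clusters,
    analogous to a bag-of-words vector for clustering/classification."""
    seg_to_term = {s: i for i, cluster in enumerate(pseudo_terms)
                   for s in cluster}
    counts = [0] * len(pseudo_terms)
    for seg in doc_segments:
        if seg in seg_to_term:
            counts[seg_to_term[seg]] += 1
    return counts
```

With these vectors in hand, any standard text clustering or classification method can be applied without ever running ASR, which is the substitution the abstract describes.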