Identification of spoken questions using similarity-based TF·AoI

Authors:
Yasutomo Kimura;Kenji Araki;Koji Tochinai
Affiliations:
Graduate School of Engineering, Hokkaido University, Sapporo, 060-8628 Japan;Graduate School of Information Science and Technology, Hokkaido University, Sapporo, 060-0814 Japan;Graduate School of Business Administration, Hokkai-Gakuen University, Sapporo, 062-8625 Japan
Venue:
Systems and Computers in Japan
Year:
2007

Abstract

Similarity is used in information retrieval and extraction, but it can also be applied to dialog processing. Spoken dialog processing must cope with speech recognition errors, interjections, and noise, and speakers rarely use the same expression consistently. A system must therefore find a sentence that is similar to the input sentence while taking these phenomena into account. This paper proposes a method for identifying question sentences based on TF·AoI (term frequency × amount of information) weighting. In this method, the words contained in the input sentence are weighted by (word similarity) × (amount of information). Then, based on the calculated Euclidean distance, the response corresponding to the question with the highest similarity is output. Comparison experiments show an improvement of 13 points over a method that compares by matching ratio to the input sentence, and of 6.5 points over the method of “similarity by TF·AoI weighting.” © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(10): 81–94, 2007; Published online in Wiley InterScience. DOI 10.1002/scj.20363
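
The pipeline described in the abstract (weight words by similarity × amount of information, compare by Euclidean distance, return the response of the closest question) can be illustrated with a minimal Python sketch. The abstract does not define its terms precisely, so several choices here are assumptions rather than the authors' formulation: amount of information is taken as the smoothed self-information -log2 p(w) estimated from the stored questions, word similarity is a character-level ratio from `difflib.SequenceMatcher`, and "highest similarity" is read as smallest Euclidean distance. The names `amount_of_information`, `word_similarity`, `weight_vector`, and `identify` are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def amount_of_information(word, counts, total):
    # Assumption: AoI is self-information -log2 p(w), with add-one smoothing.
    p = (counts.get(word, 0) + 1) / (total + len(counts) + 1)
    return -math.log2(p)


def word_similarity(a, b):
    # Assumption: character-level similarity in [0, 1]; the paper's own
    # word-similarity measure is not given in the abstract.
    return SequenceMatcher(None, a, b).ratio()


def weight_vector(words, vocab, counts, total):
    # One component per vocabulary word:
    # (best similarity to any word in the sentence) x (amount of information).
    vec = []
    for v in vocab:
        sim = max((word_similarity(v, w) for w in words), default=0.0)
        vec.append(sim * amount_of_information(v, counts, total))
    return vec


def identify(input_sentence, qa_pairs):
    # qa_pairs: list of (question, response) strings, whitespace-tokenized.
    questions = [q.split() for q, _ in qa_pairs]
    counts = Counter(w for q in questions for w in q)
    total = sum(counts.values())
    vocab = sorted(counts)

    x = weight_vector(input_sentence.split(), vocab, counts, total)

    def distance(i):
        q_vec = weight_vector(questions[i], vocab, counts, total)
        return math.dist(x, q_vec)  # Euclidean distance (Python 3.8+)

    # Smallest distance is treated as highest similarity.
    best = min(range(len(qa_pairs)), key=distance)
    return qa_pairs[best][1]


qa = [
    ("what time does the library open", "It opens at nine."),
    ("where is the nearest station", "Go straight for two blocks."),
]
# Input with an interjection ("uh") and recognition-style errors.
print(identify("uh what time dose the libraly open", qa))
```

Because each component uses graded similarity instead of exact matching, misrecognized words such as "libraly" still contribute weight to the "library" dimension, which is the kind of robustness to recognition errors and interjections that the abstract motivates.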