Much of the massive quantity of digitized data now widely available, e.g., text, speech, and handwritten sequences, is given as weighted automata, either directly or as the result of some prior processing. These automata are compact representations of a large number of alternative sequences, with weights that reflect the uncertainty or variability of the data. Indexing such data therefore requires indexing weighted automata. We present a general algorithm for the indexation of weighted automata. The resulting index is represented by a deterministic weighted transducer that is optimal for search: the search for an input string takes time linear in the sum of the size of that string and the number of indices of the weighted automata in which it appears. We also introduce a general framework based on weighted transducers that generalizes this indexation to enable the search for more complex patterns, including syntactic information, or for different types of sequences, e.g., word sequences instead of phonemic sequences. The use of this framework is illustrated with several examples. We applied our general indexation algorithm and framework to the problem of indexing speech utterances and report the results of our experiments on several tasks. These results demonstrate that our techniques yield results comparable to those of previous methods while providing greater generality, including the possibility of searching for arbitrary patterns represented by weighted automata.
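To make the search-time claim concrete, the following is a minimal sketch of a factor index and not the algorithm of the paper. It assumes each weighted automaton is small enough to be enumerated as a list of (sequence, probability) paths, which is not feasible in general, and it stores the index as a plain trie rather than as a weighted transducer. The names `FactorIndex`, `add`, and `search` are hypothetical. What it does illustrate is the shape of the lookup: walking the query costs time proportional to its length, and reporting the answer costs time proportional to the number of automata that contain it.

```python
from collections import defaultdict


class FactorIndex:
    """Toy index: a trie over every factor (substring) of every weighted path.

    Assumption: each automaton is supplied as an explicit list of
    (symbol_sequence, probability) paths. This is for illustration only.
    """

    def __init__(self):
        self.root = self._new_node()

    @staticmethod
    def _new_node():
        # "next" maps a symbol to a child node; "hits" maps an automaton id
        # to the expected number of occurrences of the factor ending here.
        return {"next": {}, "hits": defaultdict(float)}

    def add(self, automaton_id, paths):
        """Index all factors of all paths of one automaton."""
        for symbols, prob in paths:
            for start in range(len(symbols)):
                node = self.root
                for sym in symbols[start:]:
                    child = node["next"].get(sym)
                    if child is None:
                        child = self._new_node()
                        node["next"][sym] = child
                    node = child
                    # Each occurrence contributes the probability of its path.
                    node["hits"][automaton_id] += prob

    def search(self, query):
        """Return {automaton_id: expected count} for the query sequence."""
        node = self.root
        for sym in query:  # one deterministic step per query symbol
            node = node["next"].get(sym)
            if node is None:
                return {}
        return dict(node["hits"])  # cost proportional to number of matches


index = FactorIndex()
# Automaton 0 has two alternative paths; automaton 1 has a single path.
index.add(0, [(("a", "b", "a"), 0.6), (("a", "c"), 0.4)])
index.add(1, [(("b", "a"), 1.0)])
print(index.search(("b", "a")))  # {0: 0.6, 1: 1.0}
```

The trie is deterministic in the sense that each query symbol selects at most one transition, which is what makes the lookup independent of the size of the indexed collection. The weights here are expected counts under the assumed path probabilities; other weightings would fit the same structure.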