A stochastic finite-state word-segmentation algorithm for Chinese
Computational Linguistics
A trainable rule-based algorithm for word segmentation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
User-assisted ink-bleed correction for handwritten documents
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
BinarizationShop: a user-assisted software suite for converting old documents to black-and-white
Proceedings of the 10th annual joint conference on Digital libraries
The goal of the Lester S. Levy Sheet Music Collection, Phase Two project is to develop tools, processes, and systems that facilitate collection ingestion through automated processes that reduce, but do not necessarily eliminate, human intervention [1]. One of the major components of this project is an optical music recognition (OMR) system [2] that extracts musical information and lyric text from the page images that make up each piece in a collection. As is often the case, and as is true of the Levy Collection, lyrics embedded in music notation are written in syllabicated form so that each syllable lines up with the note or notes to which it corresponds. Searching the syllabicated form of words, however, would be counterintuitive and cumbersome for end users. This paper describes the evolution of a tool that, using a simple algorithm, rebuilds complete words from lyric syllables and, in ambiguous cases, provides feedback to the collection builder. The tool will be integrated into the workflow of the Levy Sheet Music Collection, but it is broadly applicable to any project that ingests musical scores with lyrics.
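The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how syllable rejoining with ambiguity feedback might look, assuming a simple lexicon lookup with a longest-match preference. The function name `rebuild_words`, the `lexicon` set, and the `max_len` limit are all hypothetical and are not taken from the paper.

```python
def rebuild_words(syllables, lexicon, max_len=4):
    """Join lyric syllables into words by lexicon lookup (illustrative only).

    Returns (words, flagged). Each entry of `flagged` is a pair
    (start_index, candidates) for a position that a human should review:
    more than one candidate means several joins were possible, and an
    empty candidate list means no join matched the lexicon.
    """
    words, flagged = [], []
    i = 0
    while i < len(syllables):
        # Collect every run of 1..max_len syllables starting at i
        # whose concatenation is a known word.
        candidates = []
        for n in range(1, max_len + 1):
            if i + n > len(syllables):
                break
            joined = "".join(syllables[i:i + n]).lower()
            if joined in lexicon:
                candidates.append((n, joined))

        if not candidates:
            # Assumption: unknown syllables are passed through and flagged.
            words.append(syllables[i])
            flagged.append((i, []))
            i += 1
            continue

        if len(candidates) > 1:
            # Several readings are possible; report them for review.
            flagged.append((i, [word for _, word in candidates]))

        # Assumption: prefer the longest match (last one collected).
        n, word = candidates[-1]
        words.append(word)
        i += n
    return words, flagged


lexicon = {"beautiful", "dream", "dreamer", "wake", "unto", "to", "me"}
syllables = ["beau", "ti", "ful", "dream", "er", "wake", "un", "to", "me"]
words, flagged = rebuild_words(syllables, lexicon)
print(words)
print(flagged)
```

In this sketch the position where both "dream" and "dreamer" match is reported back rather than silently resolved, which mirrors the feedback to the collection builder that the abstract mentions. A real system would also need to handle punctuation, hyphens, and OCR errors, none of which are modeled here.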