We present here a method for automatically projecting structural information across translations, including canonical citation structure (such as chapters and sections), speaker information, quotations, markup for people and places, and any other element in TEI-compliant XML that delimits spans of text that are linguistically symmetrical in two languages. We evaluate this technique on two datasets, one containing perfectly transcribed texts and one containing errorful OCR, and achieve an accuracy rate of 88.2% projecting 13,023 XML tags from source documents to their transcribed translations, with an 83.6% accuracy rate when projecting to texts containing uncorrected OCR. This approach has the potential to allow a highly granular multilingual digital library to be bootstrapped by applying the knowledge contained in a small, heavily curated collection to a much larger but unstructured one.
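
The idea of projecting a tagged span across a translation can be illustrated with a minimal sketch. It assumes that a word alignment between the source and target tokens is already available; the abstract does not say how the alignment is obtained or how spans are mapped, so this is not the authors' implementation. The names `project_span` and `render`, the span representation, and the example sentence and alignment are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (tag name, start token, end token), end exclusive.
Span = Tuple[str, int, int]


def project_span(span: Span, alignment: Dict[int, List[int]]) -> Optional[Span]:
    """Map a source span to the smallest target span covering its aligned tokens."""
    tag, start, end = span
    targets = [t for s in range(start, end) for t in alignment.get(s, [])]
    if not targets:
        return None  # no aligned tokens, so the tag cannot be projected
    return (tag, min(targets), max(targets) + 1)


def render(tokens: List[str], spans: List[Span]) -> str:
    """Insert XML-style tags around spans (assumes spans are not nested or crossing)."""
    opens: Dict[int, List[str]] = {}
    closes: Dict[int, List[str]] = {}
    for tag, start, end in spans:
        opens.setdefault(start, []).append(tag)
        closes.setdefault(end, []).append(tag)
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.extend(f"</{t}>" for t in closes.get(i, []))
        out.extend(f"<{t}>" for t in opens.get(i, []))
        out.append(tok)
    out.extend(f"</{t}>" for t in closes.get(len(tokens), []))
    return " ".join(out)


# Illustrative data only: a Latin source sentence and an English translation.
source = ["Gallia", "est", "omnis", "divisa", "in", "partes", "tres"]
target = ["All", "Gaul", "is", "divided", "into", "three", "parts"]

# Assumed word alignment: source token index -> target token indices.
alignment = {0: [1], 1: [2], 2: [0], 3: [3], 4: [4], 5: [6], 6: [5]}

# A place-name tag over "Gallia" in the source.
projected = project_span(("placeName", 0, 1), alignment)
tagged = render(target, [projected] if projected else [])
print(tagged)
```

Taking the minimum and maximum aligned positions keeps the projected element contiguous even when word order differs between the two languages, at the cost of occasionally covering unaligned target words that fall inside the range.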