Generalized biwords for bitext compression and translation spotting: extended abstract
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be compressed more efficiently if the fact that the two texts are mutual translations is exploited. For example, a bitext can be seen as a sequence of biwords (pairs of parallel words with a high probability of co-occurrence) that can be used as an intermediate representation in the compression process. However, the simple biword approach described in the literature can only exploit one-to-one word alignments and cannot handle word reordering. We therefore introduce a generalization of biwords that can describe multi-word expressions and reorderings. We also describe several methods for the binary compression of generalized biword sequences and compare their performance when different schemes are applied to the extraction of the biword sequence. In addition, we show that this generalization of biwords allows for the implementation of an efficient algorithm that searches the compressed bitext for words or text segments in one of the texts and retrieves their counterpart translations in the other text, an application usually referred to as translation spotting, with only minor modifications to the compression algorithm.
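As a rough illustration of the simple biword idea and its reordering limitation, the following Python sketch turns a word-aligned sentence pair into a biword sequence and runs a naive translation-spotting lookup on it. It is a toy under stated assumptions, not the method of the paper: the function names `to_biwords` and `spot`, the use of an empty string for an unpaired word, and the dictionary format of the alignment are all hypothetical choices made for this example, and no binary compression step is shown.

```python
from typing import Dict, List, Tuple

# A simple biword: (source word, target word). An empty string marks a word
# that could not be paired (an assumption made for this sketch).
Biword = Tuple[str, str]


def to_biwords(source: List[str], target: List[str],
               alignment: Dict[int, int]) -> List[Biword]:
    """Build a one-to-one, order-preserving biword sequence.

    alignment maps a source index to the index of its aligned target word.
    Links that would require reordering are dropped, which is exactly the
    limitation of simple biwords mentioned above.
    """
    biwords: List[Biword] = []
    next_t = 0  # first target position not yet emitted
    for i, s in enumerate(source):
        j = alignment.get(i)
        if j is None or j < next_t:
            # Unaligned word, or a crossing link whose target was already emitted.
            biwords.append((s, ""))
            continue
        # Emit the target words that were skipped over as unpaired.
        for k in range(next_t, j):
            biwords.append(("", target[k]))
        biwords.append((s, target[j]))
        next_t = j + 1
    # Emit any remaining target words as unpaired.
    for k in range(next_t, len(target)):
        biwords.append(("", target[k]))
    return biwords


def spot(biwords: List[Biword], query: str) -> List[str]:
    """Naive translation spotting: target words paired with the query word."""
    return [t for s, t in biwords if s == query and t]


source = ["the", "green", "house"]
target = ["la", "casa", "verde"]
alignment = {0: 0, 1: 2, 2: 1}  # "green"-"verde" and "house"-"casa" cross

biwords = to_biwords(source, target, alignment)
print(biwords)
print(spot(biwords, "green"))
print(spot(biwords, "house"))
```

In this toy example both texts can still be recovered from the sequence by reading off the non-empty left and right components in order, but the crossing link between "house" and "casa" is lost, so the lookup for "house" finds nothing. A representation that can record reorderings and multi-word units would be needed to keep that pair together.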