Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections of that document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (1) a comprehensive retrieval process for cross-language plagiarism detection is introduced, highlighting the differences from monolingual plagiarism detection; (2) state-of-the-art solutions for two important subtasks are reviewed; (3) retrieval models for the assessment of cross-language similarity are surveyed; and (4) the three models CL-CNG, CL-ESA, and CL-ASA are compared. Our evaluation is of realistic scale: it relies on 120,000 test documents selected from the corpora JRC-Acquis and Wikipedia, so that for each test document highly similar documents are available in all six languages: English, German, Spanish, French, Dutch, and Polish. The models are employed in a series of ranking tasks, and more than 100 million similarities are computed with each model. The results of our evaluation indicate that CL-CNG, despite its simple approach, is the best choice for ranking and comparing texts across languages if the languages are syntactically related. CL-ESA almost matches the performance of CL-CNG and does so on arbitrary pairs of languages. CL-ASA works best on "exact" translations but does not generalize well.
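
To make the "simple approach" of CL-CNG more concrete, the following is a minimal sketch of a character n-gram similarity used in a ranking task. The abstract does not give the model's parameters, so everything specific here is an assumption for illustration: the n-gram length of 3, the raw term-frequency weighting, the cosine measure, the text normalization, and the function names `char_ngrams`, `cosine`, and `rank_by_similarity`. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a text (n = 3 is an assumed default)."""
    # Assumed normalization: lowercase, and collapse every run of
    # non-alphanumeric characters into a single space.
    normalized = re.sub(r"[\W_]+", " ", text.lower()).strip()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: str, candidates: list[str], n: int = 3) -> list[tuple[float, str]]:
    """Rank candidate texts by n-gram similarity to the query, best first."""
    query_vector = char_ngrams(query, n)
    scored = [(cosine(query_vector, char_ngrams(c, n)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A sketch like this can only score well when the two languages share character sequences (cognates, names, numbers), which is consistent with the finding above that CL-CNG is the best choice for syntactically related languages.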