We present an approach to calculating the semantic similarity of documents written in the same or in different languages. Document contents are represented in a language-independent way, using the descriptor terms of the multilingual thesaurus EUROVOC, and similarity is then computed as the distance between these representations. Although EUROVOC is a carefully handcrafted knowledge structure, our procedure relies on statistical techniques. The method was applied to a collection of 5990 English and Spanish parallel texts and evaluated by counting how often the translation of a given document was identified as its most similar document. The results indicate that the approach is feasible and useful.
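The comparison and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each document has already been mapped to a weighted set of thesaurus descriptors, and it uses cosine similarity as one plausible choice of measure, since the text only speaks of a distance between representations. The descriptor names, weights, and the helper names `cosine_similarity` and `top1_accuracy` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse descriptor-weight vectors (dicts)."""
    dot = sum(weight * b[desc] for desc, weight in a.items() if desc in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty representation is similar to nothing
    return dot / (norm_a * norm_b)


def top1_accuracy(source_docs, target_docs):
    """Fraction of source documents whose most similar target document
    is their own translation (translations share the same key)."""
    hits = 0
    for doc_id, vector in source_docs.items():
        best = max(
            target_docs,
            key=lambda t: cosine_similarity(vector, target_docs[t]),
        )
        if best == doc_id:
            hits += 1
    return hits / len(source_docs)


# Toy data (assumption): descriptor names and weights are invented for
# illustration and are not taken from EUROVOC or from the paper.
english = {
    "doc1": {"fisheries policy": 0.9, "catch quota": 0.6},
    "doc2": {"customs tariff": 0.8, "import": 0.5},
}
spanish = {
    "doc1": {"fisheries policy": 0.7, "catch quota": 0.7, "import": 0.1},
    "doc2": {"customs tariff": 0.9, "import": 0.3},
}

accuracy = top1_accuracy(english, spanish)
print(f"top-1 translation retrieval accuracy: {accuracy:.2f}")
```

Because both language versions are described with the same descriptor vocabulary, the comparison needs no translation step. The statistical assignment of descriptors to a document, which the approach depends on, is outside the scope of this sketch.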