This paper investigates how to perform cross-lingual text classification effectively by leveraging Wikipedia, a large-scale multilingual knowledge base. Based on the observation that each Wikipedia concept is described by documents in different languages, we adapt existing topic modeling algorithms to mine multilingual topics from this knowledge base. Each extracted topic has multiple representations, one per language. We call the topics extracted from Wikipedia documents universal-topics, since each topic carries the same semantic information across languages. New documents in different languages can therefore be represented in a single space spanned by a set of universal-topics. We use these universal-topics for cross-lingual text classification: given training data labeled in one language, we train a text classifier that classifies documents in another language by mapping all documents of both languages into the universal-topic space. This approach requires no additional linguistic resources, such as bilingual dictionaries, machine translation tools, or labeled data for the target language. The evaluation results indicate that our topic modeling approach is effective for building cross-lingual text classifiers.
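
To make the idea of a shared universal-topic space concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes the multilingual topics have already been mined; the two hand-written topics in `TOPICS`, the simple weighted-count mapping in `to_topic_space`, and the nearest-centroid classifier are all hypothetical stand-ins chosen for illustration. The point it shows is that a classifier trained only on English labels can be applied to German documents once both are expressed over the same topics.

```python
import math
from collections import Counter

# Hypothetical universal topics: topic k has one word distribution per language.
# In the setting described above these would be mined from Wikipedia; here they
# are written by hand purely for illustration.
TOPICS = {
    "en": [
        {"football": 0.5, "team": 0.3, "goal": 0.2},
        {"election": 0.5, "party": 0.3, "vote": 0.2},
    ],
    "de": [
        {"fussball": 0.5, "mannschaft": 0.3, "tor": 0.2},
        {"wahl": 0.5, "partei": 0.3, "stimme": 0.2},
    ],
}


def to_topic_space(tokens, lang, smoothing=1e-3):
    """Map a tokenized document to a normalized vector over the shared topics.

    This is a crude weighted-count score, assumed here in place of proper
    topic-model inference.
    """
    counts = Counter(tokens)
    scores = [
        smoothing + sum(n * topic.get(word, 0.0) for word, n in counts.items())
        for topic in TOPICS[lang]
    ]
    total = sum(scores)
    return [s / total for s in scores]


def train_centroids(docs, labels, lang):
    """Average the topic vectors of the labeled documents for each class."""
    sums, sizes = {}, Counter()
    for tokens, label in zip(docs, labels):
        vec = to_topic_space(tokens, lang)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def classify(tokens, lang, centroids):
    """Assign the class whose centroid is closest in the topic space."""
    vec = to_topic_space(tokens, lang)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Train on English labels only, then classify German documents.
train_docs = [["football", "team", "goal"], ["election", "vote", "party"]]
train_labels = ["sports", "politics"]
centroids = train_centroids(train_docs, train_labels, "en")

print(classify(["fussball", "tor", "mannschaft"], "de", centroids))  # sports
print(classify(["wahl", "partei"], "de", centroids))                 # politics
```

No translation step appears anywhere in the sketch: the only cross-lingual link is that topic k in one language and topic k in the other describe the same concept, which is the property the abstract attributes to universal-topics.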