Discovering multilingual concepts from unaligned web documents by exploring associated images

Authors:
Xiaochen Zhang; Xiaoming Jin; Lianghao Li; Dou Shen
Affiliations:
Tsinghua University, Beijing, China; Tsinghua University, Beijing, China; Hong Kong University of Science and Technology, Hong Kong, China; Baidu, Beijing, China
Venue:
Proceedings of the 22nd international conference on World Wide Web companion
Year:
2013

Abstract

The Internet is experiencing an explosion of information presented in different languages. Though written in different languages, some articles implicitly share common concepts. In this paper, we propose a novel framework to mine cross-language common concepts from unaligned web documents. Specifically, the visual words of the images associated with the articles are used to bridge articles in different languages, and the concepts common to multiple languages are then learned with an existing topic modeling algorithm. We conduct cross-lingual text classification on a real-world data set using the multilingual concepts mined by our method. The experimental results show that our approach is effective at mining cross-lingual common concepts.
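
The bridging idea in the abstract can be illustrated with a minimal sketch. This is one plausible reading, not the authors' implementation: it assumes each article has already been reduced to text tokens plus integer visual-word ids for its images, it assumes LDA as the "existing topic modeling algorithm", and it fits LDA with a tiny collapsed Gibbs sampler. The names `make_joint_doc` and `gibbs_lda`, the `lang:` and `img:` prefixes, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def make_joint_doc(lang, text_tokens, visual_word_ids):
    """Merge one article's text and image features into one bag of tokens.

    Text tokens get a language prefix, so the vocabularies stay separate.
    Visual-word ids get a shared "img:" prefix, so they are the only tokens
    that can co-occur in documents of different languages (an assumption
    made for this sketch).
    """
    text = [f"{lang}:{t}" for t in text_tokens]
    visual = [f"img:{v}" for v in visual_word_ids]
    return text + visual


def gibbs_lda(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA over lists of string tokens.

    Returns (topic_word, doc_topic): per-topic token counts and
    per-document topic counts after the final sweep.
    """
    rng = random.Random(seed)
    v_size = len({tok for doc in docs for tok in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for tok in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][tok] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, tok in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][tok] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][tok] + beta)
                    / (topic_total[k] + beta * v_size)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][tok] += 1
                topic_total[z] += 1
    return topic_word, doc_topic


# Toy, made-up corpus: unaligned English and Chinese articles whose images
# happen to share visual words (ids 3 and 7, or ids 12 and 15).
docs = [
    make_joint_doc("en", ["dog", "puppy", "dog"], [3, 3, 7]),
    make_joint_doc("zh", ["狗", "小狗"], [3, 7, 7]),
    make_joint_doc("en", ["car", "engine"], [12, 15]),
    make_joint_doc("zh", ["汽车", "引擎"], [12, 12, 15]),
]
topic_word, doc_topic = gibbs_lda(docs, n_topics=2)

for k, counts in enumerate(topic_word):
    print(k, [tok for tok, c in counts.most_common(5) if c > 0])
```

In this sketch, words from different languages can land in the same topic only through the shared `img:` tokens, which mirrors the role the abstract assigns to visual words. The per-document topic counts in `doc_topic` could then serve as language-independent features for a classifier, though the classifier and features actually used in the paper are not stated in the abstract.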