Correlating multilingual documents via bipartite graph modeling

Authors:
Hongyuan Zha; Xiang Ji
Affiliations:
The Pennsylvania State University, University Park, PA; The Pennsylvania State University, University Park, PA
Venue:
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2002

Citing 1

Bipartite graph partitioning and data clustering
Proceedings of the tenth international conference on Information and knowledge management

Cited 6

Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

Domain-independent text segmentation using anisotropic diffusion and dynamic programming
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

Topic segmentation with shared topic detection and alignment of multiple documents
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

Bipartite isoperimetric graph partitioning for data co-clustering
Data Mining and Knowledge Discovery

Extracting shared topics of multiple documents
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining

Towards bipartite graph data management
CloudDB '10 Proceedings of the second international workshop on Cloud data management

Abstract

An enormous number of multilingual documents, from various sources and possibly from different countries, describe a single event or a set of related events. It is desirable to construct text mining methods that can compare these multilingual documents and highlight their similarities and differences. We discuss our ongoing research, which models a pair of multilingual documents as a weighted bipartite graph whose edge weights are computed by means of machine translation. We use a spectral method to identify dense subgraphs of the weighted bipartite graph, which can be considered as corresponding to sentences that correlate well in textual content. We illustrate our approach using English and German texts.
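
The abstract does not say which spectral formulation is used, so the following is only a minimal sketch under stated assumptions. It assumes a normalized-cut style bipartitioning based on the SVD of the degree-scaled weight matrix, which splits the graph into two groups rather than extracting dense subgraphs directly. The function name spectral_bipartition and the toy matrix W are hypothetical. In particular, the entries of W are made-up numbers standing in for edge weights that would come from machine translation, with rows playing the role of sentences in one document and columns the sentences in the other.

```python
import numpy as np


def spectral_bipartition(W):
    """Split rows and columns of a nonnegative weight matrix into two groups.

    Assumed formulation (not taken from the abstract): scale W by the
    inverse square roots of the row and column degrees, take the second
    pair of singular vectors, and split the vertices by sign.
    """
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)  # weighted degree of each row vertex
    d_cols = W.sum(axis=0)  # weighted degree of each column vertex
    if np.any(d_rows <= 0) or np.any(d_cols <= 0):
        raise ValueError("every vertex needs at least one positive edge weight")

    # Degree-scaled weight matrix D1^{-1/2} W D2^{-1/2}.
    W_hat = W / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]

    # The leading singular pair is trivial; the second pair carries the split.
    U, _, Vt = np.linalg.svd(W_hat, full_matrices=False)
    x = U[:, 1] / np.sqrt(d_rows)
    y = Vt[1, :] / np.sqrt(d_cols)

    # Rows and columns with the same sign fall on the same side of the cut.
    return x >= 0, y >= 0


# Hypothetical toy weights: 4 sentences in one text, 5 in the other.
W = np.array([
    [0.9, 0.7, 0.0, 0.1, 0.0],
    [0.8, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.8, 0.7],
    [0.1, 0.0, 0.7, 0.9, 0.8],
])
rows, cols = spectral_bipartition(W)
print("row groups:   ", rows.astype(int))
print("column groups:", cols.astype(int))
```

Rows and columns that receive the same group label form one side of the cut, so in this toy example the first two rows should be grouped with the first two columns. Which group is printed as 0 and which as 1 depends on the sign convention of the SVD routine and carries no meaning.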