A cross-collection mixture model for comparative text mining

Authors: ChengXiang Zhai; Atulya Velivelli; Bei Yu
Affiliations: University of Illinois at Urbana-Champaign (all authors)
Venue: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year: 2004

Abstract

In this paper, we define and study a novel text mining problem, which we refer to as Comparative Text Mining (CTM). Given a set of comparable text collections, the task of comparative text mining is to discover any latent common themes across all collections and to summarize the similarities and differences of these collections along each common theme. This general problem subsumes many interesting applications, including business intelligence and opinion summarization. We propose a generative probabilistic mixture model for comparative text mining. The model simultaneously performs cross-collection clustering and within-collection clustering, and can be applied to an arbitrary set of comparable text collections. The model can be estimated efficiently using the Expectation-Maximization (EM) algorithm. We evaluate the model on two different text data sets (a news article data set and a laptop review data set) and compare it with a baseline clustering method that is also based on a mixture model. Experimental results show that the model is quite effective in discovering the latent common themes across collections and performs significantly better than the baseline mixture model.
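The abstract does not give the model's equations, so what follows is a hedged sketch of one way a single mixture model can perform cross-collection and within-collection clustering at once and be estimated with EM. It assumes each word w in a document d from collection C_i is generated as p(w | d) = λ_B p(w | θ_B) + (1 - λ_B) Σ_j π_{d,j} [λ_C p(w | θ_j) + (1 - λ_C) p(w | θ_{j,i})], where θ_B is a fixed background model, θ_j is a common theme shared by all collections, θ_{j,i} is that theme's variant specific to collection i, and π_{d,j} are per-document theme weights. The fixed weights λ_B and λ_C, their default values, the toy data, and every name in the code (cross_collection_em, lambda_b, lambda_c) are assumptions for illustration, not the authors' published implementation.

```python
import math
import random
from collections import Counter


def cross_collection_em(collections, k, lambda_b=0.9, lambda_c=0.5, iters=50, seed=0):
    """Illustrative EM for an assumed cross-collection mixture model.

    collections: list of collections; each collection is a non-empty list of
    non-empty documents; each document is a list of word tokens.
    Returns (common, specific, pi, history): common[j] is theta_j,
    specific[j][i] is theta_{j,i}, pi[d][j] is the theme weight of document d,
    and history holds the log-likelihood computed before each update.
    """
    if not (0 < lambda_b < 1 and 0 < lambda_c < 1):
        raise ValueError("lambda_b and lambda_c must lie strictly between 0 and 1")
    rng = random.Random(seed)
    docs = [(i, Counter(doc)) for i, coll in enumerate(collections) for doc in coll]
    vocab = sorted({w for _, counts in docs for w in counts})

    # Fixed background model (an assumption): unigram frequencies over all collections.
    total = Counter()
    for _, counts in docs:
        total.update(counts)
    n_tokens = sum(total.values())
    p_bg = {w: total[w] / n_tokens for w in vocab}

    def random_dist():
        raw = {w: rng.random() + 0.1 for w in vocab}
        z = sum(raw.values())
        return {w: v / z for w, v in raw.items()}

    m = len(collections)
    common = [random_dist() for _ in range(k)]                        # theta_j
    specific = [[random_dist() for _ in range(m)] for _ in range(k)]  # theta_{j,i}
    pi = [[1.0 / k] * k for _ in docs]                                # pi_{d,j}
    history = []

    for _ in range(iters):
        exp_common = [Counter() for _ in range(k)]
        exp_specific = [[Counter() for _ in range(m)] for _ in range(k)]
        new_pi = []
        loglik = 0.0
        for d, (i, counts) in enumerate(docs):
            theme_mass = [0.0] * k
            for w, c in counts.items():
                # p(w | theme j, collection i): common part plus collection-specific part.
                mix = [lambda_c * common[j][w] + (1 - lambda_c) * specific[j][i][w]
                       for j in range(k)]
                joint = [pi[d][j] * mix[j] for j in range(k)]
                s = sum(joint)
                p_w = lambda_b * p_bg[w] + (1 - lambda_b) * s
                loglik += c * math.log(p_w)
                # E-step: posterior that this word came from a theme, not the background.
                from_theme = (1 - lambda_b) * s / p_w
                for j in range(k):
                    resp = c * from_theme * joint[j] / s  # expected count for theme j
                    p_common = lambda_c * common[j][w] / mix[j]
                    theme_mass[j] += resp
                    exp_common[j][w] += resp * p_common
                    exp_specific[j][i][w] += resp * (1 - p_common)
            z = sum(theme_mass)
            new_pi.append([t / z for t in theme_mass])
        history.append(loglik)

        # M-step: normalize the expected counts into distributions.
        pi = new_pi
        for j in range(k):
            zc = sum(exp_common[j].values())
            common[j] = {w: exp_common[j][w] / zc for w in vocab}
            for i in range(m):
                zs = sum(exp_specific[j][i].values())
                specific[j][i] = {w: exp_specific[j][i][w] / zs for w in vocab}
    return common, specific, pi, history


if __name__ == "__main__":
    # Hypothetical toy data: two small collections of laptop reviews.
    reviews_a = [["battery", "life", "short", "screen"], ["screen", "bright", "battery"]]
    reviews_b = [["battery", "life", "long", "keyboard"], ["keyboard", "comfortable", "battery"]]
    common, specific, pi, hist = cross_collection_em([reviews_a, reviews_b], k=2, iters=30)
    for j, theme in enumerate(common):
        top = sorted(theme, key=theme.get, reverse=True)[:3]
        print("common theme", j, top)
```

Comparing common[j] with specific[j][i] for each collection i is what yields a per-theme summary of similarities (the shared component) and differences (the collection-specific components). Because EM converges only to a local maximum, one would typically try several seeds and keep the run with the highest final log-likelihood; the history list, which should never decrease, makes that comparison straightforward.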