Algorithms for storytelling

Authors:
Deept Kumar;Naren Ramakrishnan;Richard F. Helm;Malcolm Potts
Affiliations:
Virginia Tech, Blacksburg, VA;Virginia Tech, Blacksburg, VA;Virginia Tech, Blacksburg, VA;Virginia Tech, Blacksburg, VA
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Abstract

We formulate a new data mining problem called storytelling as a generalization of redescription mining. In traditional redescription mining, we are given a set of objects and a collection of subsets defined over these objects. The goal is to view the set system as a vocabulary and identify two expressions in this vocabulary that induce the same set of objects. Storytelling, on the other hand, aims to explicitly relate object sets that are disjoint (and hence maximally dissimilar) by finding a chain of (approximate) redescriptions between the sets. This problem finds applications in bioinformatics, for instance, where a biologist is trying to relate a set of genes expressed in one experiment to another set implicated in a different pathway. We outline an efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, using the former to supply next-move operators on search branches to the latter. This approach is practical and effective for mining large datasets and, at the same time, exploits the structure of partitions imposed by the given vocabulary. Three application case studies are presented: a study of word overlaps in large English dictionaries, an exploration of connections between gene sets in a bioinformatics dataset, and an analysis relating publications in the PubMed index of abstracts.
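
To make the idea of a chain of approximate redescriptions concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that an approximate redescription means a Jaccard similarity of at least a threshold theta between two object sets. It replaces the CARTwheels move generator with a brute-force scan over all named sets. Its A* heuristic is our own assumption, derived from the fact that Jaccard distance is a metric. The function names, the threshold value, and the toy data are all hypothetical.

```python
import heapq
import math
from itertools import count


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (taken as 1.0 for two empty sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def find_story(sets, start, goal, theta):
    """Return a shortest chain of set names from start to goal, or None.

    Consecutive sets in the chain must have Jaccard similarity >= theta.
    Neighbours are found by brute force here (an assumption for illustration;
    the abstract describes CARTwheels supplying the moves instead).
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")

    def h(name):
        # Assumed heuristic: Jaccard distance obeys the triangle inequality
        # and each step covers at most (1 - theta) of it, so this is a lower
        # bound on the number of remaining steps. The small epsilon guards
        # against floating-point rounding pushing an exact integer upward.
        dist = 1.0 - jaccard(sets[name], sets[goal])
        return math.ceil(dist / (1.0 - theta) - 1e-9)

    tie = count()  # tie-breaker so the heap never has to compare paths
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, _, cost, name, path = heapq.heappop(frontier)
        if name == goal:
            return path
        if cost > best_cost.get(name, math.inf):
            continue  # stale queue entry
        for other in sets:
            if other == name or jaccard(sets[name], sets[other]) < theta:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(other, math.inf):
                best_cost[other] = new_cost
                entry = (new_cost + h(other), next(tie), new_cost, other, path + [other])
                heapq.heappush(frontier, entry)
    return None


# Toy vocabulary: each descriptor names a set of object ids (made-up data).
descriptors = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5, 6},
    "C": {5, 6, 7, 8},
    "D": {7, 8, 9, 10},
    "E": {20, 21},
}
story = find_story(descriptors, "A", "D", theta=0.3)
print(story)
```

On this toy input the sets A and D are disjoint, yet the search links them through B and C, since each consecutive pair shares two of six objects (Jaccard 1/3). The script prints `['A', 'B', 'C', 'D']`. No story reaches E, because it overlaps with nothing else.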