Using unsupervised link discovery methods to find interesting facts and connections in a bibliography dataset

Authors:
Shou-de Lin;Hans Chalupsky
Affiliations:
University of Southern California;University of Southern California
Venue:
ACM SIGKDD Explorations Newsletter
Year:
2003

Citing 2

Efficient algorithms for mining outliers from large data sets
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Unsupervised Link Discovery in Multi-relational Data via Rarity Analysis
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining

Cited 8

Using "Cited by" Information to Find the Context of Research Papers
PAISI, PACCF and SOCO '08 Proceedings of the IEEE ISI 2008 PAISI, PACCF, and SOCO international workshops on Intelligence and Security Informatics

Using importance flooding to identify interesting networks of criminal activity
Journal of the American Society for Information Science and Technology

Interesting instance discovery in multi-relational data
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence

The KOJAK group finder: connecting the dots via integrated knowledge-based and statistical reasoning
IAAI'04 Proceedings of the 16th conference on Innovative applications of artificial intelligence

Derived types in semantic association discovery
Journal of Intelligent Information Systems

A framework for relational link discovery
AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence

Using importance flooding to identify interesting networks of criminal activity
ISI'06 Proceedings of the 4th IEEE international conference on Intelligence and Security Informatics

Can intermediary-based science standards crosswalking work? Some evidence from mining the standard alignment tool (SAT)
Journal of the American Society for Information Science and Technology

Abstract

This paper describes a submission to the Open Task of the 2003 KDD Cup. For this task, contestants were asked to devise their own questions about the HEP-Th bibliography dataset, and the most interesting result would be selected as the winner. Instead of taking a more traditional approach, such as starting with an inspection of the data, formulating questions or hypotheses of interest to us, and then devising an analysis to answer them, we took a different route: can we develop a program that automatically finds interesting facts and connections in the data? To do this, we developed a set of unsupervised link discovery methods that compute interestingness based on notions of "rarity" and "abnormality". Experiments on the HEP-Th dataset show that our approaches automatically uncover interesting hidden connections (e.g., significant relationships between people) and unexpected facts (e.g., citation loops) without any prior knowledge or training examples. The interestingness of some of our results is self-evident. Others we were able to verify by looking for supporting evidence on the World Wide Web, which shows that our methods can find, in an unsupervised way, connections between entities that actually are interestingly connected in the real world.
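
The abstract names citation loops as one kind of unexpected fact; they are unexpected because a paper normally cites only work that appeared before it. The abstract does not say how the authors detect them, so the following is only a minimal sketch of one plausible way to enumerate short loops in a directed citation graph. The function name `find_citation_loops`, the `(citing, cited)` pair format, and the `max_len` bound are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def find_citation_loops(citations, max_len=3):
    """Return simple directed cycles with at most max_len papers.

    `citations` is an iterable of (citing, cited) pairs (hypothetical format).
    Each cycle is reported once, as a tuple starting at its smallest paper id.
    """
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[citing].add(cited)

    loops = []

    def extend(path):
        start, last = path[0], path[-1]
        for nxt in sorted(graph.get(last, ())):
            if nxt == start:
                # The path closes back on its starting paper: a citation loop.
                loops.append(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only visit ids larger than the start so each loop is found once.
                extend(path + [nxt])

    for node in sorted(graph):
        extend([node])
    return loops


# Toy data: A and B cite each other; B -> C -> D -> B forms a three-paper loop.
toy_citations = [
    ("A", "B"), ("B", "A"),
    ("B", "C"), ("C", "D"), ("D", "B"),
    ("E", "A"),
]
print(find_citation_loops(toy_citations))
```

Bounding the loop length keeps the search cheap on a large graph. A loop found this way is only a candidate: judging whether it is actually interesting is a separate step that this sketch does not attempt.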