This paper describes a submission to the Open Task of the 2003 KDD Cup. For this task, contestants were asked to devise their own questions about the HEP-Th bibliography dataset, and the most interesting result would be selected as the winner. Instead of taking a more traditional approach (inspecting the data, formulating questions or hypotheses that interested us, and then devising an analysis to answer them), we tried a different route: can we develop a program that automatically finds interesting facts and connections in the data?

To do this, we developed a set of unsupervised link discovery methods that compute interestingness based on notions of "rarity" and "abnormality". Experiments on the HEP-Th dataset show that our approaches can automatically uncover interesting hidden connections (e.g., significant relationships between people) and unexpected facts (e.g., citation loops) without any prior knowledge or training examples. The interestingness of some of our results is self-evident. Others we were able to verify by looking for supporting evidence on the World Wide Web, which shows that our methods can, in an unsupervised way, find connections between entities that are in fact interestingly connected in the real world.
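The abstract does not spell out how "rarity" or "abnormality" is computed, so the following is only a minimal illustrative sketch, not the authors' method. It assumes the bibliography is given as (source, relation, target) triples, scores each link by the negative log frequency of its relation type (one simple stand-in for rarity), and flags pairs of papers that cite each other as the simplest form of citation loop. The function names, the relation labels "cites" and "authored", and the toy data are all hypothetical.

```python
import math
from collections import Counter


def rarity_scores(edges):
    """Score each (source, relation, target) edge by how rare its relation type is.

    Assumption: rarity = -log2(relative frequency of the relation type).
    """
    type_counts = Counter(rel for _, rel, _ in edges)
    total = len(edges)
    return {
        edge: -math.log2(type_counts[edge[1]] / total)
        for edge in edges
    }


def citation_loops(edges):
    """Return sorted pairs of papers that cite each other (length-2 loops)."""
    cites = {(s, t) for s, rel, t in edges if rel == "cites"}
    return sorted(
        {tuple(sorted((s, t))) for s, t in cites if s != t and (t, s) in cites}
    )


# Toy, made-up data (not taken from the HEP-Th dataset).
edges = [
    ("p1", "cites", "p2"),
    ("p2", "cites", "p1"),
    ("p3", "cites", "p1"),
    ("a1", "authored", "p1"),
]

scores = rarity_scores(edges)
loops = citation_loops(edges)
```

In this toy graph the single "authored" link scores higher than any "cites" link because its relation type is less frequent, and the mutual citation between p1 and p2 is reported as a loop. Longer loops and rarity over multi-step paths would need a graph traversal that this sketch leaves out.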