Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, and events, and of their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We present several data pre-processing methods that we applied to transform a large, dynamic, relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To make better use of social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
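As a rough illustration of what inferring professional associations from employment histories could look like, the sketch below links two people when they worked at the same firm during overlapping periods at several distinct firms. This is not the technique developed in the paper, whose details the abstract does not give. The record layout `(person, firm, start_year, end_year)`, the function name `shared_employers`, and the `min_shared` threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def shared_employers(histories, min_shared=2):
    """Return pairs of people who overlapped in time at >= min_shared firms.

    histories: iterable of (person, firm, start_year, end_year) tuples.
    This record layout is an assumption for illustration only.
    """
    # Group employment stints by firm.
    by_firm = defaultdict(list)
    for person, firm, start, end in histories:
        by_firm[firm].append((person, start, end))

    # For each pair of people, collect the firms where their stints overlapped.
    shared = defaultdict(set)
    for firm, stints in by_firm.items():
        for (p1, s1, e1), (p2, s2, e2) in combinations(stints, 2):
            overlaps = s1 <= e2 and s2 <= e1  # closed-interval overlap test
            if p1 != p2 and overlaps:
                shared[tuple(sorted((p1, p2)))].add(firm)

    # Keep only pairs that co-occurred at enough distinct firms.
    return {pair: firms for pair, firms in shared.items()
            if len(firms) >= min_shared}
```

Requiring co-employment at more than one firm is one simple way to separate people who move between employers together from people who merely shared a large employer once; a suitable threshold would depend on the data.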