Labeled data is scarce, yet most statistical machine learning techniques rely on a large labeled corpus to build robust models for prediction and classification. In this paper we present a Higher Order Collective Classifier (HOCC) based on Higher Order Learning, a statistical machine learning technique that leverages latent information present in co-occurrences of items across records. Such techniques violate the independent and identically distributed (IID) assumption that underlies most statistical machine learning techniques, and in prior work they have outperformed first-order techniques when data is very limited. We present results of applying HOCC to two network data sets: first, to detect and classify anomalies in a Border Gateway Protocol (BGP) dataset, and second, to build models of users from Network File System (NFS) calls for masquerade detection. For masquerade detection, the precision of our system was 30% better than that of the standard Naive Bayes technique. These results indicate that HOCC can successfully model a variety of network events and that the proposed general framework can be applied to difficult problems in security.
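To make "co-occurrences of items across records" concrete, the sketch below contrasts first-order co-occurrence (two items in the same record) with a second-order path (two items in different records that are linked through a shared "bridge" item). It is a minimal illustration, not the HOCC implementation. It assumes that each record is a plain set of items (for example, attribute values from a BGP update or an NFS call) and uses a simplified path definition: a -r_i- c -r_j- b with r_i and r_j distinct records and a, b, c pairwise distinct. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def first_order_pairs(records):
    """Count, for each unordered item pair, the records in which both items appear."""
    counts = Counter()
    for rec in records:
        for a, b in combinations(sorted(rec), 2):
            counts[(a, b)] += 1
    return counts


def second_order_paths(records):
    """Count simplified second-order paths a -r_i- c -r_j- b.

    Items a and b are linked through a bridge item c that appears in two
    distinct records r_i and r_j (a in r_i, b in r_j, with a, b, c pairwise
    distinct). Each record pair is visited once, so each undirected path is
    counted once. Keys are sorted (a, b) tuples.
    """
    counts = Counter()
    for i, j in combinations(range(len(records)), 2):
        ri, rj = records[i], records[j]
        for c in ri & rj:              # shared bridge items
            for a in ri - {c}:
                for b in rj - {c}:
                    if a != b:
                        counts[tuple(sorted((a, b)))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy records; each is a set of items.
    records = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
    print(first_order_pairs(records)[("A", "C")])   # 0: A and C never share a record
    print(second_order_paths(records)[("A", "C")])  # 1: linked via bridge item B
```

In this toy example, A and C never co-occur, so a purely first-order model gets no signal about them. The path A-B-C still links them. Under this simplified definition, A and D remain unlinked because connecting them would require a third-order path. Exploiting such indirect links is the kind of latent information the abstract attributes to Higher Order Learning when labeled data is scarce.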