Strings and sequences are ubiquitous in many areas of data analysis, yet only a few learning methods can be applied directly to this form of data. We present Sally, a tool for embedding strings in vector spaces, which makes a wide range of learning methods applicable to string data. Sally implements a generalized form of the bag-of-words model, in which strings are mapped to a vector space spanned by a set of string features, such as words or n-grams of words. The implementation builds on efficient string algorithms and enables processing millions of strings and features. The tool supports several data formats and can interface with common learning environments, such as Weka, Shogun, Matlab, or Pylab. Sally has been successfully applied to learning with natural language text, DNA sequences, and monitored program behavior.
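The generalized bag-of-words embedding described above can be illustrated with a minimal Python sketch. This is not Sally's implementation or interface: the function names `ngrams`, `embed`, and `dot`, the whitespace tokenization, and the L2 normalization are assumptions chosen for illustration only. Each string is mapped to a sparse vector whose dimensions are word n-grams and whose values are normalized occurrence counts.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def embed(string, n=2, delim=None):
    """Map a string to a sparse, L2-normalized vector of word n-gram counts.

    Hypothetical helper: splitting on `delim` (whitespace by default) and
    L2 normalization are illustrative assumptions, not Sally's defaults.
    """
    tokens = string.split(delim)
    counts = Counter(ngrams(tokens, n))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}  # string too short to contain any n-gram
    return {feature: c / norm for feature, c in counts.items()}


def dot(x, y):
    """Inner product of two sparse vectors stored as dicts."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the smaller vector
    return sum(value * y.get(feature, 0.0) for feature, value in x.items())


if __name__ == "__main__":
    a = embed("the quick brown fox jumps", n=2)
    b = embed("the quick brown dog sleeps", n=2)
    print(sorted(a))   # the n-gram features present in the first string
    print(dot(a, b))   # similarity of the two strings in the vector space
```

Because the vectors are stored sparsely, only n-grams that actually occur in a string take up memory, which is one common way to keep very large feature spaces manageable. Once strings are represented this way, any learning method that operates on vectors or inner products can be applied to them.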