Automatic text processing: the transformation, analysis, and retrieval of information by computer
Automatic text processing: the transformation, analysis, and retrieval of information by computer
Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms
International Journal of Man-Machine Studies - Special issue: symbolic problem solving in noisy and novel task environments
IEEE Transactions on Pattern Analysis and Machine Intelligence
Reduction Techniques for Instance-BasedLearning Algorithms
Machine Learning
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications
Data Mining and Knowledge Discovery
Advances in Instance Selection for Instance-Based Learning Algorithms
Data Mining and Knowledge Discovery
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
On the Consistency of Information Filters for Lazy Learning Algorithms
PKDD '99 Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Evaluation of Techniques for Classifying Biological Sequences
PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Similarity search in multimedia time series data using amplitude-level features
MMM'08 Proceedings of the 14th international conference on Advances in multimedia modeling
Hi-index | 0.00 |
The amount of stored information in modern database applications increased tremendously in recent years. Besides their sheer amount, the stored data objects are also more and more complex. Therefore, classification of these complex objects is an important data mining task that yields several new challenges. In many applications, the data objects provide multiple representations. E.g. proteins can be described by text, amino acid sequences or 3D structures. Additionally, many real-world applications need to distinguish thousands of classes. Last but not least, many complex objects are not directly expressible by feature vectors. To cope with all these requirements, we introduce a novel approach to classification of multi-represented objects that is capable to distinguish large numbers of classes. Our method is based on k nearest neighbor classification and employs density-based clustering as a new approach to reduce the training instances for instance-based classification. To predict the most likely class, our classifier employs a new method to use several object representations for making accurate class predictions. The introduced method is evaluated by classifying proteins according to the classes of Gene Ontology, one of the most established class systems for biomolecules that comprises several thousand classes.