Multi-represented kNN-classification for large class sets

Authors:
Hans-Peter Kriegel;Alexey Pryakhin;Matthias Schubert
Affiliations:
Institute for Computer Science, University of Munich, Munich, Germany;Institute for Computer Science, University of Munich, Munich, Germany;Institute for Computer Science, University of Munich, Munich, Germany
Venue:
DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Year:
2005

Abstract

The amount of information stored in modern database applications has increased tremendously in recent years. Beyond their sheer number, the stored data objects are also increasingly complex. Classifying these complex objects is therefore an important data mining task that poses several new challenges. In many applications, the data objects provide multiple representations. For example, proteins can be described by text, amino acid sequences, or 3D structures. Additionally, many real-world applications need to distinguish thousands of classes. Last but not least, many complex objects cannot be expressed directly as feature vectors. To cope with all these requirements, we introduce a novel approach to the classification of multi-represented objects that is capable of distinguishing large numbers of classes. Our method is based on k-nearest-neighbor classification and employs density-based clustering as a new approach to reducing the training instances for instance-based classification. To predict the most likely class, our classifier employs a new method that combines several object representations to make accurate class predictions. The introduced method is evaluated by classifying proteins according to the classes of Gene Ontology, one of the most established class systems for biomolecules, which comprises several thousand classes.
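
The abstract does not spell out how the per-representation predictions are combined, so the following is only a minimal sketch of the general idea of kNN classification over several object representations. It assumes each representation is a numeric tuple compared by Euclidean distance, and it combines the per-representation kNN vote fractions with a weighted sum; the combination rule, the function names, and the toy data are illustrative assumptions, not the authors' method. The density-based instance reduction step is omitted.

```python
from collections import Counter, defaultdict
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_class_scores(query, training, k, dist=euclidean):
    """Fraction of the k nearest training instances that belong to each class.

    training is a list of (feature, label) pairs for ONE representation.
    """
    neighbors = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    total = sum(votes.values())
    return {label: count / total for label, count in votes.items()}


def classify_multi_represented(query_reps, training_reps, k=3, weights=None):
    """Predict a class from several representations of one object.

    query_reps:    dict mapping representation name -> feature of the query
    training_reps: dict mapping representation name -> list of (feature, label)
    weights:       optional dict mapping representation name -> weight
                   (assumption: uniform weights when omitted)
    """
    combined = defaultdict(float)
    for name, query in query_reps.items():
        w = 1.0 if weights is None else weights[name]
        scores = knn_class_scores(query, training_reps[name], k)
        for label, score in scores.items():
            combined[label] += w * score  # weighted sum of vote fractions
    return max(combined, key=combined.get)


# Toy data (hypothetical): two representations of the same four objects.
training = {
    "sequence": [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
                 ((1.0, 1.0), "B"), ((0.9, 1.1), "B")],
    "text": [((5.0,), "A"), ((5.2,), "A"), ((9.0,), "B"), ((9.1,), "B")],
}
query = {"sequence": (0.2, 0.1), "text": (5.1,)}
print(classify_multi_represented(query, training, k=3))
```

With non-vector representations, such as sequences or 3D structures, the `dist` argument of `knn_class_scores` could be replaced by any suitable distance function, since the sketch only needs pairwise distances.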