Automatic document organization in a p2p environment

Authors:
Stefan Siersdorfer; Sergej Sizov
Affiliations:
Max-Planck Institute for Computer Science; Max-Planck Institute for Computer Science
Venue:
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Year:
2006

Abstract

This paper describes an efficient method for building reliable machine learning applications in peer-to-peer (P2P) networks using ensemble-based meta methods. We consider this problem in the context of distributed Web exploration applications such as focused crawling. Typical applications include user-specific classification of retrieved Web content into personalized topic hierarchies, as well as automatic refinement of such taxonomies using unsupervised machine learning methods (e.g., clustering). Our approach combines models from multiple peers into a meta decision model that takes the generalization performance of the individual ‘local’ peer models into account. In addition, the meta algorithms can be applied in a restrictive manner, i.e., by leaving out documents for which the combined decision is uncertain. Our systematic evaluation demonstrates the viability of the proposed approach.
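To make the idea of combining peer models and restrictive classification concrete, here is a minimal sketch. It assumes binary per-topic classifiers that return +1 or -1 and a per-model estimate of generalization accuracy (for example, from cross-validation on the peer's own data). Each model's vote is weighted by that estimate, and the meta decision abstains when the weighted vote is too close to call. The names `PeerModel`, `vote_weight`, `meta_score`, `restrictive_meta_classify`, and `keyword_model`, the weighting rule `2 * accuracy - 1`, and the default threshold of 0.5 are all hypothetical illustrations, not the authors' exact meta methods.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical document representation: sparse term -> weight map.
Document = Dict[str, float]


@dataclass
class PeerModel:
    """A 'local' binary topic classifier shared by one peer (hypothetical)."""
    predict: Callable[[Document], int]  # returns +1 (in topic) or -1 (not in topic)
    est_accuracy: float  # assumed estimate of generalization accuracy


def vote_weight(model: PeerModel) -> float:
    # Assumption: models at or below chance level (0.5) get zero weight,
    # and a perfect model gets weight 1.
    return max(0.0, 2.0 * model.est_accuracy - 1.0)


def meta_score(models: Sequence[PeerModel], doc: Document) -> float:
    """Accuracy-weighted vote of the peer models, normalized to [-1, 1]."""
    total = sum(vote_weight(m) for m in models)
    if total == 0.0:
        return 0.0  # no informative model: treat the document as fully uncertain
    return sum(vote_weight(m) * m.predict(doc) for m in models) / total


def restrictive_meta_classify(
    models: Sequence[PeerModel], doc: Document, threshold: float = 0.5
) -> Optional[int]:
    """Return +1 or -1, or None to leave out a document whose vote is too close."""
    score = meta_score(models, doc)
    if abs(score) < threshold:
        return None
    return 1 if score > 0 else -1


def keyword_model(term: str) -> Callable[[Document], int]:
    """Toy stand-in for a trained peer classifier: positive iff the term occurs."""
    return lambda doc: 1 if doc.get(term, 0.0) > 0 else -1


# Usage: three peers contribute models of differing estimated quality.
peers = [
    PeerModel(keyword_model("kernel"), est_accuracy=0.9),
    PeerModel(keyword_model("svm"), est_accuracy=0.8),
    PeerModel(keyword_model("learning"), est_accuracy=0.55),
]
docs = [{"svm": 1.0, "kernel": 2.0}, {"learning": 1.0}, {"kernel": 1.0}]
for d in docs:
    print(d, restrictive_meta_classify(peers, d))
```

Here the first document is accepted (+1), the second is rejected (-1), and the third, on which the two strongest peers disagree, is left out (`None`). Raising `threshold` leaves out more documents, trading coverage for accuracy on the documents that remain classified.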