Peer-to-peer distributed text classifier learning in PADMINI

  • Authors:
  • Xianshu Zhu; Tushar Mahule; Haimonti Dutta; Sugandha Arora; Hillol Kargupta; Kirk Borne

  • Affiliations:
  • Department of CSEE, University of Maryland, Baltimore County, Baltimore, MD, USA; Department of CSEE, University of Maryland, Baltimore County, Baltimore, MD, USA; The Center for Computational Learning Systems, Columbia University, New York, NY, USA; Department of CSEE, University of Maryland, Baltimore County, Baltimore, MD, USA; Department of CSEE, University of Maryland, Baltimore County, Baltimore, MD, USA; Department of Computational and Data Sciences, George Mason University, Fairfax, VA, USA

  • Venue:
  • Statistical Analysis and Data Mining
  • Year:
  • 2012

Abstract

Popular Internet document repositories, such as online newspapers, digital libraries, and blogs, store large amounts of text and image data that are frequently accessed by large numbers of users. Users' input through collaborative commenting or tagging can be very useful in organizing and classifying documents. Some websites (e.g., Google Image Labeler) support the collection of tags and labels, but a large fraction of these sites do not currently support such activities. Moreover, relying on centrally controlled web-service providers for such support is probably not a good idea if the objective is to make the collaborative inputs publicly available. Often, business entities offering such web-based tagging environments end up owning and monetizing the result of the collective effort. This paper takes a step toward addressing this problem by proposing a peer-to-peer (P2P) system, PADMINI, powered by distributed data mining algorithms. In particular, it focuses on learning a P2P classifier from tagged text data. The paper describes the PADMINI system and its distributed text classifier learning components: text classification is posed as a linear program, which is then solved with an asynchronous distributed algorithm. It also presents extensive empirical results on text data obtained from the Hubble Space Telescope (HST) proposal abstract database.

Copyright © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2012. (The author is also affiliated with Agnik LLC., Columbia, MD, USA.)