Scalable tag search in social network applications

  • Authors:
  • Alberto Mozo;Joaquín Salvachúa

  • Affiliations:
  • Department of Arquitectura y Tecnología de Computadores, Universidad Politécnica de Madrid (UPM), Madrid, Spain;Department of Ingeniería de Sistemas Telemáticos, Universidad Politécnica de Madrid (UPM), Madrid, Spain

  • Venue:
  • Computer Communications
  • Year:
  • 2008

Abstract

Emerging social network applications for sharing and collaboration need a way to publish and search a large number of social objects. Current social applications let users attach a set of user-defined keywords, called tags, to an object when publishing it, so that the object can later be found by searching with a subset of those tags. Commercial systems and recent research proposals cannot be deployed widely across the Internet because they suffer from scalability problems and hot spots in individual nodes. We propose T-DHT, a hybrid unstructured-structured DHT-based approach that meets these demanding requirements in a fully scalable, distributed, and balanced way. The storage process attaches a set of user tags to the stored object and takes at most O(log N) node hops. The tag information attached to the object is stored compactly in the node links using a Bloom filter, so that it can be used later in the search process. The search process finds previously stored objects by means of a tag conjunction and also takes at most O(log N) node hops. It combines a typical DHT search with an unstructured search algorithm that uses the tag information previously stored in the Bloom filters of the node links. Simulation results show that T-DHT performs in a fully balanced and scalable way, without generating the typical hot spot problems, even when the distribution of popular tags is unbalanced. Although T-DHT was devised to build a scalable infrastructure for social applications, it can also be applied to the more general peer-to-peer keyword search problem.
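
The abstract's idea of keeping tag information in per-link Bloom filters and testing a tag conjunction against them can be illustrated with a minimal sketch. This is not the T-DHT implementation: the filter size, the number of hash functions, the SHA-1 based hashing, and the names `TagBloomFilter` and `candidate_links` are all assumptions made for illustration, since the abstract does not specify them.

```python
import hashlib


class TagBloomFilter:
    """Fixed-size Bloom filter over tag strings (illustrative layout, assumed)."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits      # assumed size, not taken from the paper
        self.num_hashes = num_hashes  # assumed hash count, not taken from the paper
        self.bits = 0                 # a Python int used as a bit array

    def _positions(self, tag: str):
        # Derive one bit position per hash by salting SHA-1 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{tag}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, tag: str) -> None:
        for pos in self._positions(tag):
            self.bits |= 1 << pos

    def might_contain(self, tag: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(tag))

    def might_contain_all(self, tags) -> bool:
        # Conjunction query: every tag must pass the membership test.
        return all(self.might_contain(tag) for tag in tags)


def candidate_links(links: dict, query_tags) -> list:
    """Return names of links whose filter may cover the whole tag conjunction."""
    return [name for name, bf in links.items() if bf.might_contain_all(query_tags)]
```

In this sketch, a node would forward a conjunctive query only along the links returned by `candidate_links`. A Bloom filter never gives false negatives, so a link that really leads to a matching object is always kept; it can give false positives, so a returned link is a candidate and not a guarantee.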