3SEPIAS: A Semi-Structured Search Engine for Personal Information in dAtaspace System

  • Authors:
  • Ming Zhong;Mengchi Liu;Yanxiang He

  • Affiliations:
  • State Key Laboratory of Software Engineering, Wuhan University, Luojiashan, Wuhan 430072, China;School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, Canada K1S 5B6;State Key Laboratory of Software Engineering, Wuhan University, Luojiashan, Wuhan 430072, China

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 0.07

Visualization

Abstract

Nowadays, personal information is being distributed into more and more heterogeneous sources, which presents a huge obstacle to management and retrieval of personal information. To address this problem, this paper presents the blueprint of a novel Personal Information Management (PIM) system named 3SEPIAS (short for Semi-Structured Search Engine for Personal Information in dAtaspace System). 3SEPIAS has three main features, data integration without upfront semantic reconciliation, flexible query model for data having sparse and evolving schema, and efficient best-effort proximity search approach on graphs. For that, we first propose a semi-structured graph data model called Interpreted Object Model (IOM) to uniformly represents a user's heterogeneous personal information and loosely integrates it into a dataspace in a schema-later way. Then, a Semi-Structured Search Engine (3SE) can be used to search over the personal dataspaces. We propose an intuitive 3SE Query Language (3SQL) that enables users to query in a varying degree of structural constraint according to their knowledge of underlying schemas. Moreover, a best-effort top-k proximity search optimization strategy and corresponding graph index structures are proposed to improve the efficiency of query processing. We perform comprehensive experiments to test both effectiveness and efficiency of our proximity search approach. The results reveal that 3SE can beat the previous proximity search systems by a large margin with only a little or even no loss of result quality, especially for large graphs.