3SEPIAS: A Semi-Structured Search Engine for Personal Information in dAtaspace System

Authors:
Ming Zhong;Mengchi Liu;Yanxiang He
Affiliations:
State Key Laboratory of Software Engineering, Wuhan University, Luojiashan, Wuhan 430072, China;School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, Canada K1S 5B6;State Key Laboratory of Software Engineering, Wuhan University, Luojiashan, Wuhan 430072, China
Venue:
Information Sciences: an International Journal
Year:
2013


Abstract

Personal information is increasingly scattered across heterogeneous sources, which is a major obstacle to managing and retrieving it. To address this problem, this paper presents the blueprint of a novel Personal Information Management (PIM) system named 3SEPIAS (short for Semi-Structured Search Engine for Personal Information in dAtaspace System). 3SEPIAS has three main features: data integration without upfront semantic reconciliation, a flexible query model for data with sparse and evolving schemas, and an efficient best-effort proximity search approach on graphs. To this end, we first propose a semi-structured graph data model called the Interpreted Object Model (IOM), which uniformly represents a user's heterogeneous personal information and loosely integrates it into a dataspace in a schema-later way. A Semi-Structured Search Engine (3SE) can then be used to search over the personal dataspace. We propose an intuitive 3SE Query Language (3SQL) that lets users vary the degree of structural constraint in a query according to their knowledge of the underlying schemas. Moreover, we propose a best-effort top-k proximity search optimization strategy and corresponding graph index structures to improve the efficiency of query processing. We perform comprehensive experiments to test both the effectiveness and the efficiency of our proximity search approach. The results show that 3SE outperforms previous proximity search systems by a large margin, with little or no loss of result quality, especially on large graphs.
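To make the idea of top-k keyword proximity search on a graph concrete, here is a minimal Python sketch. It is not the 3SE algorithm, its index structures, or 3SQL, none of which the abstract specifies. It assumes a generic scoring rule (the sum of hop distances from a candidate node to the nearest match of each keyword) on a small undirected graph, and every name and data item in it (`bfs_distances`, `top_k_proximity`, `graph`, `labels`) is hypothetical.

```python
from collections import deque
import heapq


def bfs_distances(graph, sources):
    """Multi-source BFS: hop distance from the nearest source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def top_k_proximity(graph, labels, keywords, k):
    """Return the k nodes with the smallest total distance to all keywords.

    Assumed scoring (not from the paper): a node's score is the sum, over
    keywords, of its hop distance to the nearest node whose label contains
    that keyword. Nodes that cannot reach every keyword are skipped.
    """
    per_keyword = []
    for kw in keywords:
        sources = [n for n, text in labels.items() if kw.lower() in text.lower()]
        per_keyword.append(bfs_distances(graph, sources))

    candidates = []
    for node in labels:
        if all(node in dist for dist in per_keyword):
            candidates.append((sum(dist[node] for dist in per_keyword), node))
    return heapq.nsmallest(k, candidates)


# Hypothetical personal-information graph: a person, an email, a file, another person.
labels = {
    "p1": "person Alice",
    "e1": "email budget report",
    "f1": "file budget.xls",
    "p2": "person Bob",
}
graph = {
    "p1": ["e1"],
    "e1": ["p1", "f1", "p2"],
    "f1": ["e1"],
    "p2": ["e1"],
}

results = top_k_proximity(graph, labels, ["alice", "budget"], k=2)
```

This sketch scores every node exhaustively. A best-effort strategy of the kind the abstract mentions would instead try to avoid exploring the whole graph, trading a possible small loss in result quality for speed.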