Fast structural query with application to Chinese treebank sentence retrieval

Authors:
Chia-Hsin Huang; Tyng-Ruey Chuang; Hahn-Ming Lee
Affiliations:
Academia Sinica, Taipei, Taiwan; Academia Sinica, Taipei, Taiwan; National Taiwan University of Science and Technology, Taipei, Taiwan
Venue:
Proceedings of the 2004 ACM symposium on Document engineering
Year:
2004

Abstract

In natural language processing, a huge amount of structured data is constantly used for the extraction and presentation of grammatical structures in sentences. For example, the Chinese Treebank corpus developed at the Institute of Information Science, Academia Sinica, Taiwan, is a semantically annotated corpus that has been used to help parse and study Chinese sentences. In this setting, users usually query the corpus with structured tree patterns instead of keywords. In this paper, we present an online prototype system that provides exploratory search ability. The system implements two flexible and efficient structural query methods and employs a user-friendly web-based interface. Although the system adopts the XML format to present the corpora and search results, it does not use conventional XML query languages. Because searching the Chinese Treebank corpora is structural in nature and often deals with structural similarities, conventional XML query languages such as XPath and XQuery are inflexible and inefficient for this task. We propose and implement a query algorithm called Parent-Child Relationship Filter (PCRF), which provides flexible and efficient structural search. PCRF is sufficiently flexible to provide several similarity-matching options, such as wildcard, unordered sibling sub-trees, ancestor-descendant matching, and their combinations. In addition, PCRF supports stream-based matching to help users query their XML documents online. We also present three accelerating rules that achieve a 1.5- to 8-fold performance improvement in query time. Our experimental results show that our method achieves a 10- to 1000-fold performance improvement compared to the usual text-based XPath query method.