Fast query for large treebanks

Authors:
Sumukh Ghodke; Steven Bird
Affiliations:
University of Melbourne, Victoria, Australia; University of Pennsylvania, Philadelphia, PA
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Abstract

A variety of query systems have been developed for interrogating parsed corpora, or treebanks. With the arrival of efficient, wide-coverage parsers, it is feasible to create very large databases of trees. However, existing approaches that use in-memory search, or relational or XML database technologies, do not scale up. We describe a method for storage, indexing, and query of treebanks that uses an information retrieval engine. Several experiments with a large treebank demonstrate excellent scaling characteristics for a wide range of query types. This work facilitates the curation of much larger treebanks, and enables them to be used effectively in a variety of scientific and engineering tasks.