Towards a theory of natural language interfaces to databases

  • Authors:
  • Ana-Maria Popescu;Oren Etzioni;Henry Kautz

  • Affiliations:
  • University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA

  • Venue:
  • Proceedings of the 8th international conference on Intelligent user interfaces
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

The need for Natural Language Interfaces (NLIs) to databases has become increasingly acute as more nontechnical people access information through their web browsers, PDAs and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries correctly. We introduce the Precise NLI [2], which reduces the semantic interpretation challenge in NLIs to a graph matching problem. Precise uses the max-flow algorithm to efficiently solve this problem. Each max-flow solution corresponds to a possible semantic interpretation of the sentence. precise collects max-flow solutions, discards the solutions that do not obey syntactic constraints and retains the rest as the basis for generating SQL queries corresponding to the question q. The syntactic information is extracted from the parse tree corresponding to the given question which is computed by a statistical parser [1]. For a broad, well-defined class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL querySemantically tractable questions correspond to a natural, domain-independent subset of English that can be efficiently and accurately interpreted as nonrecursive Datalog clauses. Precise is transportable to arbitrary databases, such as the Restaurants,Jobs and Geography databases used in our implementation. Examples of semantically tractable questions include: "What Chinese restaurants with a 3.5 rating are in Seattle?", "What are the areas of US states with large populations?", "What jobs require 4 years of experience and desire a B.S.CS degree?".Given a question which is not semantically tractable, Precise recognizes it as such and informs the user that it cannot answer it.Given a semantically tractable question, Precise computes the set of non-equivalent SQL interpretations corresponding to the question. If a unique such SQL interpretation exists, Precise outputs it together with the corresponding result set obtained by querying the current database. If the set contains more than one SQL interpretation, the natural language question is ambiguous in the context of the current database. In this case, Precise asks for the user's help in determining which interpretation is the correct one.Our experiments have shown that Precise has high coverage and accuracy over common English questions. In future work, we plan to explore increasingly broad classes of questions and include Precise as a module in a full-fledged dialog system. An important direction for future work is helping users understand the types of questions Precise cannot handle via dialog, enabling them to build an accurate mental model of the system and its capabilities. Also, our own group's work on the EXACT natural language interface [3] builds on Precise and on the underlying theoretical framework. EXACT composes an extended version of Precise with a sound and complete planner to develop a powerful and provably reliable interface to household appliances