Using weakly structured documents at the user-interface level to fill in a classical database

Authors:
Frederique Laforest;André Flory
Affiliations:
National Institute of Applied Sciences, France;National Institute of Applied Sciences, France
Venue:
Advanced topics in database research vol. 1
Year:
2003

Abstract

Electronic documents have become a universal means of communication with the expansion of the Web. Structured information stored in databases nevertheless remains essential for data coherence management, querying facilities, etc. We thus face a classical problem, known in the database world as "impedance mismatch": two antagonistic approaches have to collaborate. Using documents at the end-user interface level provides simplicity and flexibility, but documents can serve as data sources only with human help, because automatic document analysis systems have a significant error rate. Databases are the alternative, since the semantics and format of information are strict; queries via SQL provide 100% correct responses. The aim of this work is to provide a system that combines the freedom of document capture with the structure of database storage.

The system we propose is not intended to be universal. It can be used in specific cases where people usually work with technical documents dedicated to a particular domain. Our examples concern medicine, and more specifically medical records. Physicians have very often rejected computerization because it requires too much standardization, and form-based user interfaces are not easily adapted to their daily practice. In this domain, we think that this study provides a viable alternative approach. The system gives doctors freedom: they fill in documents with the information they want to store, in the order that suits them and in a freer way. We have developed a system that allows a database to be filled in quasi-automatically from document paragraphs.

The database used is an existing database that can be queried in the classical way for statistical studies or epidemiological purposes. In this system, the document collection and the database containing extractions from documents coexist. Queries are sent to the database; answers include data from the database and references to source documents.
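
The coexistence of documents and database described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table names (`source_paragraph`, `blood_pressure`), the column layout, and the helper functions are all hypothetical, and the extracted values are assumed to have already been validated by a person, since the abstract states that extraction is only quasi-automatic. The sketch only shows how each stored value can keep a reference to the paragraph it came from, so that a query answer carries both data and source.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Paragraphs of the free-text documents (assumed layout).
        CREATE TABLE source_paragraph (
            id           INTEGER PRIMARY KEY,
            document     TEXT    NOT NULL,
            paragraph_no INTEGER NOT NULL,
            text         TEXT    NOT NULL
        );
        -- Structured data extracted from a paragraph (assumed layout).
        CREATE TABLE blood_pressure (
            patient_id INTEGER NOT NULL,
            systolic   INTEGER NOT NULL,
            diastolic  INTEGER NOT NULL,
            source_id  INTEGER NOT NULL REFERENCES source_paragraph(id)
        );
        """
    )
    return conn


def store_extraction(conn, document, paragraph_no, text,
                     patient_id, systolic, diastolic):
    """Store a paragraph and the values extracted from it, linked by id."""
    cur = conn.execute(
        "INSERT INTO source_paragraph (document, paragraph_no, text) "
        "VALUES (?, ?, ?)",
        (document, paragraph_no, text),
    )
    conn.execute(
        "INSERT INTO blood_pressure VALUES (?, ?, ?, ?)",
        (patient_id, systolic, diastolic, cur.lastrowid),
    )


def query_with_sources(conn, min_systolic):
    """Return matching rows together with a reference to their source."""
    return conn.execute(
        """
        SELECT b.patient_id, b.systolic, b.diastolic,
               s.document, s.paragraph_no
        FROM blood_pressure AS b
        JOIN source_paragraph AS s ON s.id = b.source_id
        WHERE b.systolic >= ?
        ORDER BY b.patient_id
        """,
        (min_systolic,),
    ).fetchall()
```

Under these assumptions, a statistical query runs against the structured table as usual, and the join supplies the document name and paragraph number so that the reader can go back to the physician's original wording.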