Management of semistructured data

Authors:
Dan Suciu
Affiliations:
AT&T Labs - Research
Venue:
ACM SIGMOD Record
Year:
1997

Citing 0
Cited 9

Report on the 5th international workshop on knowledge representation meets databases (KRDB'98)
ACM SIGMOD Record

ESSQL: an enhanced semi-structured query language for composite document retrievals
Proceedings of the 16th annual international conference on Computer documentation

Rapper: a wrapper generator with linguistic knowledge
Proceedings of the 2nd international workshop on Web information and data management

Data integration using similarity joins and a word-based information representation language
ACM Transactions on Information Systems (TOIS)

Access to heterogeneous data sources for supporting business process execution
Proceedings of the 2001 ACM symposium on Applied computing

Advanced XML data processing: guest editor's introduction
ACM SIGMOD Record

Analysis of Document Structures for Element Type Classification
PODDP '98 Proceedings of the 4th International Workshop on Principles of Digital Document Processing

Weakly Constraining Multimedia Types Based on a Type Embedding Ordering
MIS '98 Proceedings of the 4th International Workshop on Advances in Multimedia Information Systems

Structured Web Pages Management for Efficient Data Retrieval
WISE '00 Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00) - Volume 2

Abstract

A huge amount of data is available today on the Internet, or on the private intranets of many companies. This data is structured in a multitude of ways. At one extreme we find data coming from traditional relational or object-oriented databases, with a completely known structure. At the other extreme we have data which is fully unstructured, such as images, sounds, and raw text. But most of the data falls somewhere in between these two extremes, for a variety of reasons: the data may be structured, but the structure is not known to the user; the user may know the structure but choose to ignore it, for browsing purposes; the structure may be implicit, such as in formatted text, and not as rigid and regular as in traditional databases; the data may be in non-traditional formats, such as the ASN.1 exchange format; or the schema of the data may be huge and change often, so that we may prefer to ignore it. Several researchers have recently worked on problems related to data fitting this description, and have coined the term semistructured data for it. Two recent tutorials [Abi97, Bun97] contain an excellent introduction to semistructured data and a comprehensive bibliography on this new research topic.