We study the problem of rediscovering the schema of nested relations that have been encoded as strings for storage purposes. We consider various classes of encoding functions and focus on markup encodings, which allow the schema to be found without knowledge of the encoding function, under reasonable assumptions on the input data. Depending on how empty sets are encoded, we propose two polynomial on-line algorithms (with different buffer sizes) that solve the schema-finding problem. We also prove that, with high probability, both algorithms find the schema after examining a fixed number of tuples, which in practice leads to linear-time behavior with respect to the database size for wrapping the data. Finally, we show that the proposed techniques are well suited to practical applications, such as structuring and wrapping HTML pages and Web sites.
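
To make the idea of a markup encoding concrete, the following is a minimal illustrative sketch and not the algorithms proposed in the paper. It assumes a hypothetical markup in which "(" and ")" delimit a tuple, "{" and "}" delimit a set, "," separates items, and anything else is an atomic value. The function names (encode, merge, infer_schema) and the sample data are invented for illustration. The sketch shows how a schema might be read back from the encoded string alone, and why an empty set leaves its element type unknown until another tuple supplies it.

```python
# Assumed markup (hypothetical): "(" ... ")" is a tuple, "{" ... "}" is a set,
# "," separates items, and any other run of characters is an atomic value.

def encode(value):
    """Encode a nested value as a string using the assumed markup."""
    if isinstance(value, tuple):
        return "(" + ",".join(encode(v) for v in value) + ")"
    if isinstance(value, list):  # a list stands in for a set here
        return "{" + ",".join(encode(v) for v in value) + "}"
    return str(value)


def merge(a, b):
    """Combine two inferred schemas; None means 'unknown' (from an empty set)."""
    if a is None:
        return b
    if b is None:
        return a
    if a == "atom" and b == "atom":
        return "atom"
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        if a[0] == "set":
            return ("set", merge(a[1], b[1]))
        if len(a[1]) == len(b[1]):
            return ("tuple", tuple(merge(x, y) for x, y in zip(a[1], b[1])))
    raise ValueError("inconsistent schemas")


def _parse(text, pos):
    """Parse one value starting at pos; return (schema, next position)."""
    if pos < len(text) and text[pos] in "({":
        opener = text[pos]
        closer = ")" if opener == "(" else "}"
        pos += 1
        items = []
        while True:
            if pos >= len(text):
                raise ValueError("unclosed delimiter")
            if text[pos] == closer:
                pos += 1
                break
            item, pos = _parse(text, pos)
            items.append(item)
            if pos < len(text) and text[pos] == ",":
                pos += 1
            elif pos >= len(text) or text[pos] != closer:
                raise ValueError("malformed input")
        if opener == "(":
            return ("tuple", tuple(items)), pos
        elem = None  # element type stays unknown if the set is empty
        for item in items:
            elem = merge(elem, item)
        return ("set", elem), pos
    # Atomic value: skip ahead to the next markup character.
    while pos < len(text) and text[pos] not in ",(){}":
        pos += 1
    return "atom", pos


def infer_schema(text):
    """Recover the schema from the encoded string only."""
    schema, pos = _parse(text, 0)
    if pos != len(text):
        raise ValueError("trailing characters")
    return schema


# Invented sample data: a set of (name, set of (topic, year)) tuples.
relation = [("alice", [("db", 1997)]), ("bob", [])]
encoded = encode(relation)
schema = infer_schema(encoded)
```

In this sketch the second tuple contains an empty set, so its element type is unknown on its own. Merging it with the first tuple fills in the missing part, which is the kind of situation where the encoding of empty sets matters.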