Biological data analysis usually requires complex manipulation: applying tools, navigating multiple web sites, selecting and filtering results, and iterating over these steps across the Internet. Most biological data are generated from structured databases or by applications, and are presented to users embedded in repeated structures, or tables, within HTML documents. In this paper we outline a novel technique for identifying table structures in HTML documents, and we use it to automatically generate composite wrappers for applications that require distributed resources. We demonstrate that the method is robust enough to discover both standard and non-standard table structures in HTML documents. Because it also handles non-standard tables, our technique outperforms those used in contemporary systems such as XWrap and AutoWrapper. We discuss the technique in the context of our PickUp system, an elegant automatic wrapper generation system that exploits the theoretical developments presented in this paper.
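The abstract does not describe PickUp's identification algorithm, so the following is only a rough illustration of the underlying problem: locating repeated structures in an HTML page whether or not they use `<table>` markup. It is a minimal sketch of a common repeated-sibling heuristic, not the PickUp method. The function names (`parse`, `signature`, `find_repeated_regions`), the depth-2 structural signature, and the `min_repeats` threshold are all hypothetical choices made for this example.

```python
from html.parser import HTMLParser

# Hypothetical sketch: a generic repeated-sibling heuristic, NOT PickUp's algorithm.
# Void elements never receive a closing tag, so they are never pushed as "open".
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # child Nodes and text strings, in document order

    @property
    def children(self):
        return [x for x in self.items if isinstance(x, Node)]

    def text(self):
        """All non-empty text under this node, in document order."""
        out = []
        for x in self.items:
            out.extend([x] if isinstance(x, str) else x.text())
        return out


class TreeBuilder(HTMLParser):
    """Builds a lightweight element tree, tolerating stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.items.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore unmatched end tags.
        n = self.cur
        while n is not None and n.tag != tag:
            n = n.parent
        if n is not None and n.parent is not None:
            self.cur = n.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.items.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def signature(node, depth=2):
    """Structural signature: the tag plus child signatures, down to `depth` levels."""
    if depth == 0 or not node.children:
        return node.tag
    return node.tag + "(" + ",".join(signature(c, depth - 1) for c in node.children) + ")"


def find_repeated_regions(node, min_repeats=3, out=None):
    """Collect (parent_tag, signature, records) for each run of at least
    `min_repeats` consecutive sibling elements that share one signature."""
    if out is None:
        out = []
    kids = node.children
    i = 0
    while i < len(kids):
        sig = signature(kids[i])
        j = i
        while j < len(kids) and signature(kids[j]) == sig:
            j += 1
        if j - i >= min_repeats:
            out.append((node.tag, sig, kids[i:j]))
        i = j
    for child in kids:
        find_repeated_regions(child, min_repeats, out)
    return out


if __name__ == "__main__":
    # A "non-standard" table: records laid out as <div>s rather than <tr>s.
    page = """
    <html><body>
      <h1>Results</h1>
      <div class="hit"><b>BRCA1</b> <i>chr17</i></div>
      <div class="hit"><b>TP53</b> <i>chr17</i></div>
      <div class="hit"><b>EGFR</b> <i>chr7</i></div>
      <p>footer</p>
    </body></html>
    """
    for parent, sig, records in find_repeated_regions(parse(page)):
        print(parent, sig, [r.text() for r in records])
    # prints: body div(b,i) [['BRCA1', 'chr17'], ['TP53', 'chr17'], ['EGFR', 'chr7']]
```

The same code also finds an ordinary `<table>`, since its `<tr>` rows form a run of identically shaped siblings. Exact signature matching is deliberately simple: records with optional or repeated fields would produce differing signatures and split a region. Practical systems therefore tolerate approximate matches (for example, via tree alignment or edit distance).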