In the last few years, several works in the literature have addressed the problem of extracting data from web pages. The problem is important because, once extracted, the data can be handled much like instances of a traditional database, which in turn facilitates web data integration and a range of other domain-specific applications. In this paper, we propose a novel table extraction technique for web pages generated dynamically from a back-end database. The proposed system automatically and efficiently discovers table structure by mining relevant patterns from the web pages, and generates a regular expression that drives the extraction process. The approach requires no human intervention, and experimental results show its accuracy to be promising. Moreover, the algorithm that generates the wrapper runs in linear time.
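To make the idea of turning a discovered pattern into a regular-expression wrapper concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in the paper: it swaps in a much simpler heuristic (counting the most frequent tag signature among `<tr>` rows) and assumes that each cell holds plain text. The names `tag_signature`, `induce_row_regex`, and `SAMPLE_HTML`, as well as the sample page itself, are hypothetical and used only for illustration.

```python
import re
from collections import Counter

# Matches an opening or closing HTML tag and captures "/" (if any) and the tag name.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def tag_signature(fragment):
    """Reduce an HTML fragment to its sequence of tag names, e.g. ('td', '/td')."""
    return tuple(m.group(1) + m.group(2).lower() for m in TAG.finditer(fragment))

def induce_row_regex(html):
    """Illustrative heuristic: pick the most frequent row signature and
    compile it into a regex with one capture group per leaf cell."""
    rows = re.findall(r"<tr\b[^>]*>(.*?)</tr>", html, flags=re.S | re.I)
    counts = Counter(tag_signature(row) for row in rows)
    if not counts:
        return None
    signature, _ = counts.most_common(1)[0]

    parts = [r"<tr\b[^>]*>\s*"]
    for i, name in enumerate(signature):
        if name.startswith("/"):
            parts.append(r"</%s>\s*" % re.escape(name[1:]))
        else:
            parts.append(r"<%s\b[^>]*>" % re.escape(name))
            # A leaf cell is an opening tag immediately followed by its closing tag.
            is_leaf = i + 1 < len(signature) and signature[i + 1] == "/" + name
            parts.append(r"([^<]*)" if is_leaf else r"\s*")
    parts.append(r"</tr>")
    return re.compile("".join(parts), re.I)

# Hypothetical page fragment with one header row and three data rows.
SAMPLE_HTML = """
<table>
<tr><th>Name</th><th>Value</th></tr>
<tr><td>alpha</td><td>1</td></tr>
<tr><td>beta</td><td>2</td></tr>
<tr><td>gamma</td><td>3</td></tr>
</table>
"""

wrapper = induce_row_regex(SAMPLE_HTML)
records = wrapper.findall(SAMPLE_HTML)
print(records)
```

In this sketch the header row is skipped simply because its `<th>` signature is less frequent than the `<td>` signature of the data rows. A real wrapper generator would need a more robust notion of pattern relevance, along with handling for nested markup and optional cells.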