Online databases respond to user queries with result records encoded in HTML pages. Data extraction, which many applications depend on, is the task of automatically pulling these records out of the HTML. We present a novel data extraction method, ODE (Ontology-assisted Data Extraction), which automatically extracts query result records from HTML pages. ODE first constructs a domain ontology by matching information between the query interfaces and the query result pages of different Web sites in the same domain. It then uses this ontology during extraction to identify the query result section of a result page and to align and label the data values in the extracted records. The method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods. Experimental results show that ODE is extremely accurate at identifying the query result section in an HTML page, segmenting that section into query result records, and aligning and labeling the data values in those records.
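To make the ontology-assisted idea more concrete, the following is a minimal Python sketch under simplifying assumptions. It is not ODE's algorithm. The ontology is hand-written here rather than built by matching query interfaces against result pages. Each attribute is assumed to carry a set of known values (for example, option lists from a query form) and an optional regular expression. The names `ONTOLOGY`, `match_attribute`, `section_score`, `pick_result_section`, and `label_records` are hypothetical. The sketch picks the candidate section whose text fragments the ontology recognizes most often, then labels each value in a pre-segmented record with the attribute it matches.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, hand-written domain ontology (an assumption for illustration):
# each attribute has known values (e.g. from query-form option lists) and an
# optional regex for free-form values. ODE would derive this automatically.
ONTOLOGY: Dict[str, dict] = {
    "make": {"values": {"honda", "toyota", "ford"}, "pattern": None},
    "year": {"values": set(), "pattern": re.compile(r"^(19|20)\d{2}$")},
    "price": {"values": set(), "pattern": re.compile(r"^\$\d[\d,]*$")},
}


def match_attribute(value: str, ontology: Dict[str, dict]) -> Optional[str]:
    """Return the first ontology attribute that recognizes `value`, or None."""
    text = value.strip()
    for attr, spec in ontology.items():
        if text.lower() in spec["values"]:
            return attr
        pattern = spec["pattern"]
        if pattern is not None and pattern.match(text):
            return attr
    return None


def section_score(texts: List[str], ontology: Dict[str, dict]) -> float:
    """Fraction of text fragments in a candidate section the ontology recognizes."""
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if match_attribute(t, ontology) is not None)
    return hits / len(texts)


def pick_result_section(sections: List[List[str]], ontology: Dict[str, dict]) -> int:
    """Index of the candidate section with the highest ontology-match score."""
    if not sections:
        raise ValueError("no candidate sections")
    return max(range(len(sections)), key=lambda i: section_score(sections[i], ontology))


def label_records(records: List[List[str]], ontology: Dict[str, dict]) -> List[Dict[str, str]]:
    """Label each value with its matching attribute; values the ontology does not
    recognize (or a repeated attribute) get a positional placeholder key."""
    labeled = []
    for record in records:
        row: Dict[str, str] = {}
        for pos, value in enumerate(record):
            attr = match_attribute(value, ontology)
            key = attr if attr is not None and attr not in row else f"unknown_{pos}"
            row[key] = value
        labeled.append(row)
    return labeled
```

For example, given a navigation block `["Home", "About us", "Contact"]` and a listing block `["Honda", "2015", "$12,500", "Toyota", "2018", "$15,900"]`, `pick_result_section` selects the listing block (index 1). Labeling `["Toyota", "Blue", "$15,900"]` yields `make` and `price` keys, and the value the ontology does not recognize is stored as `unknown_1`. Scoring by match ratio is one simple choice. Segmenting the section into records and aligning values across records would need additional steps that this sketch omits.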