In this paper we describe a new approach for extracting element labels from Web form interfaces. These labels are required by several techniques that retrieve and integrate information hidden behind form interfaces, such as hidden-Web crawlers and metasearchers. However, because form layout varies widely, even within a well-defined domain, extracting these labels automatically is a challenging problem. Whereas previous approaches have relied on heuristics and manually specified extraction rules, our technique uses an ensemble of learning classifiers to identify element-label mappings, and then applies a reconciliation step that leverages the classifier-derived mappings to boost extraction accuracy. We present a detailed experimental evaluation using over three thousand Web forms. Our results show that the approach is effective: it obtains significantly higher accuracy and is more robust to variability in form layout than previous label extraction techniques.
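The two-stage idea of classifier-derived mappings followed by reconciliation can be illustrated with a minimal sketch. This is not the paper's implementation: the `Box` type, the two voter functions, the pixel thresholds, and the reconciliation rule (give each unlabeled element the nearest label that no confident mapping has claimed) are all assumptions made for illustration, and the voters are fixed rules standing in for trained classifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a rendered form label or form element."""
    name: str
    x: float
    y: float


def votes_left_of(label, element):
    # Votes 1.0 when the label sits on the same row, to the left of the element.
    return 1.0 if abs(label.y - element.y) <= 5 and label.x < element.x else 0.0


def votes_close(label, element, max_dist=120.0):
    # Votes 1.0 when the label lies within max_dist (Euclidean) of the element.
    dist = ((label.x - element.x) ** 2 + (label.y - element.y) ** 2) ** 0.5
    return 1.0 if dist <= max_dist else 0.0


# Assumption: fixed rules replace trained classifiers in this toy ensemble.
VOTERS = (votes_left_of, votes_close)


def ensemble_score(label, element):
    # Average of the individual votes.
    return sum(v(label, element) for v in VOTERS) / len(VOTERS)


def extract_labels(labels, elements, threshold=0.75):
    mapping = {}     # element name -> label name
    claimed = set()  # label names already assigned

    # Stage 1: keep only mappings the ensemble is confident about,
    # highest score first, one label per element.
    scored = sorted(
        ((ensemble_score(l, e), l, e) for l in labels for e in elements),
        key=lambda t: -t[0],
    )
    for score, l, e in scored:
        if score >= threshold and e.name not in mapping and l.name not in claimed:
            mapping[e.name] = l.name
            claimed.add(l.name)

    # Stage 2 (reconciliation): use the confident mappings to narrow the
    # choice for the remaining elements to labels that are still unclaimed.
    for e in elements:
        if e.name in mapping:
            continue
        free = [l for l in labels if l.name not in claimed]
        if free:
            best = min(free, key=lambda l: (l.x - e.x) ** 2 + (l.y - e.y) ** 2)
            mapping[e.name] = best.name
            claimed.add(best.name)
    return mapping


labels = [Box("Title", 10, 10), Box("Author", 10, 40), Box("Year", 200, 60)]
elements = [
    Box("title_input", 100, 10),
    Box("author_input", 100, 40),
    Box("year_select", 200, 80),
]
result = extract_labels(labels, elements)
print(result)
```

In this toy form, the "Year" label sits above its select box, so the left-of voter rejects it and the ensemble alone leaves `year_select` unlabeled. The reconciliation pass recovers the mapping because "Title" and "Author" have already been claimed.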