Feature-based object identification for web automation
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Web object identification plays an important role in research fields such as information extraction, web automation, and web form understanding for building meta-search engines. In contrast to other work, we approach this problem by analyzing spatial, visual, functional, and textual characteristics of web pages. We compute 49 unique features for every visible web page element and pass them to machine learning classifiers in order to identify similar elements on previously unexamined web pages. We evaluate our approach in several scenarios, analyzing the relevance of the chosen features and the classification rate of the applied classifiers. These scenarios focus on understanding search forms from the transportation domain, particularly those for flight, train, and bus connections. The results of the evaluation are very promising.
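The general idea of describing each visible element by a feature vector and classifying unseen elements against labeled ones can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Element` record, the eight features, the keyword lists, the labels, and the 1-nearest-neighbour rule are hypothetical stand-ins and are not the 49 features or the classifiers evaluated in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical, simplified stand-in for a rendered web page element."""
    tag: str
    x: float          # left edge in pixels
    y: float          # top edge in pixels
    width: float
    height: float
    font_size: float
    text: str         # visible text or nearby label


# Illustrative keyword sets (assumed, not taken from the paper).
LOCATION_WORDS = {"from", "to", "departing", "departure", "arrival"}
ACTION_WORDS = {"search", "find"}


def features(el, page_width=1280.0, page_height=800.0):
    """Map an element to a small numeric vector covering four feature groups."""
    words = set(el.text.lower().split())
    return [
        el.x / page_width,                                    # spatial
        el.y / page_height,                                   # spatial
        el.width * el.height / (page_width * page_height),    # visual: relative area
        el.font_size / 32.0,                                  # visual: scaled font size
        1.0 if el.tag in ("input", "select") else 0.0,        # functional: takes input
        1.0 if el.tag == "button" else 0.0,                   # functional: triggers action
        1.0 if words & LOCATION_WORDS else 0.0,               # textual
        1.0 if words & ACTION_WORDS else 0.0,                 # textual
    ]


def classify(el, training):
    """Return the label of the nearest labeled element (Euclidean distance)."""
    target = features(el)
    nearest = min(training, key=lambda pair: math.dist(features(pair[0]), target))
    return nearest[1]


# Labeled elements from an already examined page (made-up values).
training = [
    (Element("input", 100, 200, 300, 30, 14, "From"), "location_field"),
    (Element("input", 100, 250, 300, 30, 14, "To"), "location_field"),
    (Element("button", 100, 400, 120, 40, 16, "Search flights"), "submit_button"),
    (Element("a", 20, 10, 80, 20, 12, "Home"), "other"),
]

# An element from a page that was not part of the labeled data.
unseen = Element("input", 80, 180, 280, 28, 13, "Departing from")
print(classify(unseen, training))  # location_field
```

A nearest-neighbour rule is used here only because it needs no external library; any classifier that accepts numeric vectors could take its place.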