This paper describes a hidden Markov model (HMM) based approach to search interface segmentation. Automatic processing of search interfaces is essential for accessing the invisible contents of the deep Web, and it entails automatic segmentation, i.e., the task of grouping related components of an interface together. A human can easily discern the logical relationships among interface components, but for a machine this is difficult. We propose a segmentation approach that leverages the probabilistic nature of the interface design process: a designer chooses components according to the underlying database query requirements and organizes them into suitable patterns. We simulate this process by creating an "artificial designer" in the form of a two-layered HMM. Once trained, the HMM acquires the implicit design knowledge required for segmentation. We empirically study the effectiveness of the approach across several representative domains of the deep Web. In terms of segmentation accuracy, the HMM-based approach outperforms an existing state-of-the-art approach by at least 10% in most cases. Furthermore, our cross-domain investigation shows that a single HMM trained on data with varied and frequent design patterns can accurately segment interfaces from multiple domains.
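The abstract does not specify the model's states, observations, or parameters. The following is a rough illustration of how HMM decoding can drive segmentation. It uses a single-layer HMM, a deliberate simplification of the paper's two-layered model. The states (`label`, `operator`, `value`), the observation alphabet of component types (`text`, `select`, `textbox`), and all probabilities are hypothetical and chosen by hand. The sketch decodes the most likely tag sequence with the Viterbi algorithm, then starts a new segment at each `label` tag.

```python
import math

# HYPOTHETICAL hidden states: roles a designer might give to interface components.
STATES = ["label", "operator", "value"]

# HYPOTHETICAL hand-set probabilities; a real model would learn them from labeled interfaces.
START = {"label": 0.8, "operator": 0.1, "value": 0.1}
TRANS = {
    "label":    {"label": 0.1, "operator": 0.3, "value": 0.6},
    "operator": {"label": 0.1, "operator": 0.1, "value": 0.8},
    "value":    {"label": 0.7, "operator": 0.1, "value": 0.2},
}
# Emission probabilities over HYPOTHETICAL coarse component types.
EMIT = {
    "label":    {"text": 0.9, "select": 0.05, "textbox": 0.05},
    "operator": {"text": 0.1, "select": 0.8, "textbox": 0.1},
    "value":    {"text": 0.1, "select": 0.3, "textbox": 0.6},
}


def viterbi(obs):
    """Return the most likely hidden-state sequence for obs (log-space Viterbi)."""
    # best[s]: log-probability of the best path ending in state s at the current step.
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        ptr, nxt = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            nxt[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
        back.append(ptr)
        best = nxt
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=best.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segment(components, tags):
    """Group components into segments, opening a new segment at each 'label' tag."""
    segments = []
    for comp, tag in zip(components, tags):
        if tag == "label" or not segments:
            segments.append([])
        segments[-1].append(comp)
    return segments


# HYPOTHETICAL form: a title search box and a price field with an operator menu.
components = ["Title", "title_box", "Price", "price_op", "price_box"]
types = ["text", "textbox", "text", "select", "textbox"]

tags = viterbi(types)
print(tags)
print(segment(components, tags))
```

With these made-up parameters the decoded tags are `label, value, label, operator, value`. This yields two segments: `["Title", "title_box"]` and `["Price", "price_op", "price_box"]`. Decoding in log space avoids numeric underflow on long component sequences.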