Extracting data from Web pages using wrappers is a fundamental problem arising in a wide variety of applications of great practical interest. In this paper, we propose a novel schema-guided approach to wrapper generation. We provide a user-friendly interface that allows users to define the schema of the data to be extracted and to specify mappings from an HTML page to the target schema. Based on these mappings, the system automatically generates an extraction rule to extract data from the page. Our approach can significantly reduce the human effort involved in wrapper generation. The user never has to deal with the internal extraction rule or even be familiar with the details of HTML.
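
The workflow described above (define a schema, map page elements to schema fields, derive a rule, apply it) can be illustrated with a minimal sketch. It is not the system's actual rule language or interface, which the abstract does not specify. It assumes well-formed markup that Python's standard `xml.etree.ElementTree` can parse, stands in a simple child-index path for the extraction rule, and simulates the user's mappings by picking elements from one example record. All function and field names (`generate_rule`, `extract`, `title`, `price`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (an assumption for this sketch).
PAGE = """
<html><body>
  <table>
    <tr><td>Database Systems</td><td>42.00</td></tr>
    <tr><td>Web Data Mining</td><td>35.50</td></tr>
  </table>
</body></html>
"""

def path_to(root, target):
    """Return the list of child indexes leading from root to target, or None."""
    if root is target:
        return []
    for i, child in enumerate(root):
        sub = path_to(child, target)
        if sub is not None:
            return [i] + sub
    return None

def generate_rule(record, mappings):
    """Turn user mappings (schema field -> example element) into index paths."""
    return {field: path_to(record, elem) for field, elem in mappings.items()}

def follow(node, path):
    """Walk a child-index path starting at node."""
    for i in path:
        node = node[i]
    return node

def extract(records, rule):
    """Apply the generated rule to every record on the page."""
    return [{f: follow(r, p).text for f, p in rule.items()} for r in records]

root = ET.fromstring(PAGE.strip())
records = root.findall(".//tr")
example = records[0]

# Simulates the user mapping two cells of one record to schema fields.
mappings = {"title": example[0], "price": example[1]}

rule = generate_rule(example, mappings)
rows = extract(records, rule)
print(rule)
print(rows)
```

In this sketch the user only identifies example elements for each schema field. The index paths that make up the rule are derived and applied without the user seeing them, which mirrors the claim that users need not handle the internal extraction rule.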