With the current explosion of information on the World Wide Web (WWW), a wealth of information on many different subjects has become available online. Many sources contain information that can be classified as semi-structured. At present, however, the only way to access this information is by browsing individual pages; web documents cannot be queried in a database-like fashion based on their underlying structure. Database-like querying of semi-structured WWW sources becomes possible once wrappers are built around them. We present an approach for semi-automatically generating such wrappers. The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page. From this structure, the system generates a wrapper that facilitates querying the source and possibly integrating it with other sources. We demonstrate the ease with which wrappers can be built for a number of Internet sources in different domains using our implemented wrapper-generation toolkit.
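The key idea can be illustrated with a minimal sketch. It is not the toolkit described here: it assumes that heading tags alone signal nesting, and the tag-to-depth table, the class and function names, and the sample page are all hypothetical. The sketch uses formatting tags to hypothesize a section hierarchy and then exposes that hierarchy through a small query function, a toy stand-in for a generated wrapper.

```python
from html.parser import HTMLParser

# Assumption for illustration: these formatting tags mark section titles,
# and a smaller number means a higher level in the hypothesized hierarchy.
HEADING_DEPTH = {"h1": 1, "h2": 2, "h3": 3, "b": 4}


class StructureGuesser(HTMLParser):
    """Hypothesizes a section tree from formatting tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = {"title": "ROOT", "depth": 0, "text": [], "children": []}
        self.stack = [self.root]      # path from the root to the open section
        self.current_heading = None   # tag name while inside a heading
        self.heading_buf = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_DEPTH and self.current_heading is None:
            self.current_heading = tag
            self.heading_buf = []

    def handle_endtag(self, tag):
        if tag != self.current_heading:
            return
        depth = HEADING_DEPTH[tag]
        node = {"title": "".join(self.heading_buf).strip(),
                "depth": depth, "text": [], "children": []}
        # Close every open section at the same or a deeper level.
        while self.stack[-1]["depth"] >= depth:
            self.stack.pop()
        self.stack[-1]["children"].append(node)
        self.stack.append(node)
        self.current_heading = None

    def handle_data(self, data):
        if self.current_heading is not None:
            self.heading_buf.append(data)
        elif data.strip():
            # Body text belongs to the innermost open section.
            self.stack[-1]["text"].append(data.strip())


def build_wrapper(html):
    """Returns a query function over the hypothesized structure of one page."""
    parser = StructureGuesser()
    parser.feed(html)
    parser.close()
    root = parser.root

    def query(*path):
        node = root
        for title in path:
            node = next((c for c in node["children"] if c["title"] == title), None)
            if node is None:
                return None
        return " ".join(node["text"])

    return query


# Made-up sample page; the content is fictional.
PAGE = """
<h1>Examplia</h1>
<h2>Geography</h2>
<p>Area: 100 sq km</p>
<h2>People</h2>
<p>Population: 1 million</p>
"""

query = build_wrapper(PAGE)
print(query("Examplia", "People"))
```

A real source would need more cues than heading tags (font changes, indentation, separators) and a way for the user to correct a wrong hypothesis, which is where the semi-automatic part of the approach comes in.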