With the tremendous amount of information that becomes available on the Web every day, the ability to develop information agents quickly has become crucial. A vital component of any Web-based information agent is a set of wrappers that extract the relevant data from semistructured information sources. Our novel approach to wrapper induction is based on the idea of hierarchical information extraction, which turns the hard problem of extracting data from an arbitrarily complex document into a series of simpler extraction tasks. We introduce an inductive algorithm, STALKER, that generates high-accuracy extraction rules from user-labeled training examples. Labeling the training data is the major bottleneck in using wrapper induction techniques, and our experimental results show that STALKER requires up to two orders of magnitude fewer examples than other algorithms. Furthermore, STALKER can wrap information sources that existing inductive techniques could not wrap.
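The idea of hierarchical extraction can be illustrated with a small sketch. It is not the STALKER algorithm: the rules below are written by hand rather than induced from labeled examples. The rule format (skip to a sequence of landmark strings, then read up to an end landmark), the function names, and the sample page are all assumptions made for illustration. The sketch only shows how one extraction from a nested document decomposes into simpler steps: first the name and the address list, then each list item, then the fields inside each item.

```python
def skip_to(text, landmarks, start=0):
    """Consume each landmark in order; return the index just past the last one, or None."""
    pos = start
    for landmark in landmarks:
        found = text.find(landmark, pos)
        if found == -1:
            return None
        pos = found + len(landmark)
    return pos


def extract(text, start_landmarks, end_landmark=None):
    """Return the text between the start landmarks and the end landmark.

    An empty start list means "begin at the start of the text";
    end_landmark=None means "read to the end of the text".
    """
    begin = skip_to(text, start_landmarks)
    if begin is None:
        return None
    if end_landmark is None:
        return text[begin:]
    end = text.find(end_landmark, begin)
    return None if end == -1 else text[begin:end]


def extract_list(text, item_start, item_end):
    """Split a list region into its items by applying the item rule repeatedly."""
    items, pos = [], 0
    while True:
        found = text.find(item_start, pos)
        if found == -1:
            break
        begin = found + len(item_start)
        end = text.find(item_end, begin)
        if end == -1:
            break
        items.append(text[begin:end])
        pos = end + len(item_end)
    return items


# Hypothetical sample page, invented for this sketch.
PAGE = (
    "<h1>Name: <b>Yala</b></h1><p>Cuisine: Thai</p>"
    "<ul><li>4000 Colfax, <i>Phoenix</i>, (602) 508-1570</li>"
    "<li>523 Vernon, <i>Las Vegas</i>, (702) 578-2293</li></ul>"
)


def wrap_restaurant(page):
    """Level 1: name and address list. Level 2: list items. Level 3: item fields."""
    name = extract(page, ["Name:", "<b>"], "</b>")
    address_block = extract(page, ["<ul>"], "</ul>") or ""
    addresses = []
    for item in extract_list(address_block, "<li>", "</li>"):
        addresses.append({
            "street": extract(item, [], ","),
            "city": extract(item, ["<i>"], "</i>"),
            "phone": extract(item, ["</i>, "]),
        })
    return {"name": name, "addresses": addresses}


if __name__ == "__main__":
    print(wrap_restaurant(PAGE))
```

Each rule in this sketch only has to be correct within the fragment handed to it by its parent, which is why the per-level rules can stay short even when the whole page is complex.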