Fast text searching: allowing errors
Communications of the ACM
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A hierarchical approach to wrapper induction
Proceedings of the third annual conference on Autonomous Agents
Record-boundary discovery in Web documents
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Generating finite-state transducers for semi-structured data extraction from the Web
Information Systems - Special issue on semistructured data
IEPAD: information extraction based on pattern discovery
Proceedings of the 10th international conference on World Wide Web
Building efficient and effective metasearch engines
ACM Computing Surveys (CSUR)
A brief survey of web data extraction tools
ACM SIGMOD Record
Proceedings of the 27th International Conference on Very Large Data Bases
Visual Web Information Extraction with Lixto
Proceedings of the 27th International Conference on Very Large Data Bases
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Visual Based Content Understanding towards Web Adaptation
AH '02 Proceedings of the Second International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems
Data extraction and label assignment for web databases
WWW '03 Proceedings of the 12th international conference on World Wide Web
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
HTML Page Analysis Based on Visual Cues
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
A Fully Automated Object Extraction System for the World Wide Web
ICDCS '01 Proceedings of the The 21st International Conference on Distributed Computing Systems
Towards Automatic Incorporation of Search Engines into a Large-Scale Metasearch Engine
WI '03 Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
ViPER: augmenting automatic information extraction with visual perceptions
Proceedings of the 14th ACM international conference on Information and knowledge management
Simultaneous record detection and attribute labeling in web data extraction
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
A Survey of Web Information Extraction Systems
IEEE Transactions on Knowledge and Data Engineering
Automatic extraction of dynamic record sections from search engine result pages
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
The portrait of a common HTML web page
Proceedings of the 2006 ACM symposium on Document engineering
Structured Data Extraction from the Web Based on Partial Tree Alignment
IEEE Transactions on Knowledge and Data Engineering
Towards domain-independent information extraction from web tables
Proceedings of the 16th international conference on World Wide Web
Adaptive record extraction from web pages
Proceedings of the 16th international conference on World Wide Web
AllInOneNews: development and evaluation of a large-scale news metasearch engine
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MySearchView: a customized metasearch engine generator
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Extraction of flat and nested data records from web pages
AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analystics - Volume 61
Mining templates from search result records of search engines
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Joint optimization of wrapper generation and template detection
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Webpage understanding: an integrated approach
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Context-aware wrapping: synchronized data extraction
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Pictor: an interactive system for importing data from a website
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
The Journal of Machine Learning Research
WRAPPER INFERENCE FOR AMBIGUOUS WEB PAGES
Applied Artificial Intelligence
Experiences in crawling deep web in the context of local search
Proceedings of the 2nd international workshop on Geographic information retrieval
SESQ: A Model-Driven Method for Building Object Level Vertical Search Engines
ER '08 Proceedings of the 27th International Conference on Conceptual Modeling
Extracting data records from the web using tag path clustering
Proceedings of the 18th international conference on World wide web
ODE: Ontology-assisted data extraction
ACM Transactions on Database Systems (TODS)
A survey of Web clustering engines
ACM Computing Surveys (CSUR)
Can we learn a template-independent wrapper for news article extraction from a single training site?
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Table extraction using spatial reasoning on the CSS2 visual box model
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Automatic wrapper generation using tree matching and partial tree alignment
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Site-Wide Wrapper Induction for Life Science Deep Web Databases
DILS '09 Proceedings of the 6th International Workshop on Data Integration in the Life Sciences
Crawling and Extracting Process Data from the Web
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Template-independent news extraction based on visual consistency
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Entropy-Based Visual Tree Evaluation on Block Extraction
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Efficient record-level wrapper induction
Proceedings of the 18th ACM conference on Information and knowledge management
Information extraction for search engines using fast heuristic techniques
Data & Knowledge Engineering
Visual extraction of information from web pages
Journal of Visual Languages and Computing
BIS'07 Proceedings of the 10th international conference on Business information systems
Using structured tokens to identify webpages for data extraction
APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
WMS-extracting multiple sections data records from search engine results pages
Proceedings of the 2010 ACM Symposium on Applied Computing
Mining subtrees with frequent occurrence of similar subtrees
DS'07 Proceedings of the 10th international conference on Discovery science
Configurable meta-search in the job domain
International Journal of Web Engineering and Technology
Automatic extraction of web data records containing user-generated content
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A unified approach for extracting multiple news attributes from news pages
PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Automatically extracting web data records
AMT'10 Proceedings of the 6th international conference on Active media technology
Foundations and Trends in Information Retrieval
Adaptable wrapper generation for web page format change
ACOS'06 Proceedings of the 5th WSEAS international conference on Applied computer science
Incremental structured web database crawling via history versions
WISE'10 Proceedings of the 11th international conference on Web information systems engineering
Towards a spatial instance learning method for deep web pages
ICDM'11 Proceedings of the 11th international conference on Advances in data mining: applications and theoretical aspects
An indent shape based approach for web lists mining
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Towards a unified solution: data record region detection and segmentation
Proceedings of the 20th ACM international conference on Information and knowledge management
Extract knowledge from semi-structured websites for search task simplification
Proceedings of the 20th ACM international conference on Information and knowledge management
SILA: a spatial instance learning approach for deep webpages
Proceedings of the 20th ACM international conference on Information and knowledge management
Extracting data records from query result pages based on visual features
BNCOD'11 Proceedings of the 28th British national conference on Advances in databases
Hybrid method for automated news content extraction from the web
WISE'06 Proceedings of the 7th international conference on Web Information Systems
NET – a system for extracting web data from flat and nested data records
WISE'05 Proceedings of the 6th international conference on Web Information Systems Engineering
Automated extraction of hit numbers from search result pages
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
RecipeCrawler: collecting recipe data from WWW incrementally
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
CCWrapper: adaptive predefined schema guided web extraction
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Configurable meta-search for integrating web public access catalogs
ICADL'05 Proceedings of the 8th international conference on Asian Digital Libraries: implementing strategies and sharing experiences
AMBER: turning annotations into knowledge
Proceedings of the 21st international conference companion on World Wide Web
Data extraction for search engine using safe matching
AI'11 Proceedings of the 24th international conference on Advances in Artificial Intelligence
Extracting multiple news attributes based on visual features
Journal of Intelligent Information Systems
Automatically extracting user reviews from forum sites
Computers & Mathematics with Applications
Clustering visually similar web page elements for structured web data extraction
ICWE'12 Proceedings of the 12th international conference on Web Engineering
TEX: An efficient and effective unsupervised Web information extractor
Knowledge-Based Systems
Multiple sections extraction using visual cue
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part V
Towards web-scale structured web data extraction
Proceedings of the sixth ACM international conference on Web search and data mining
Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System
International Journal of Data Warehousing and Mining
A general theory of spatial relations to support a graphical tool for visual information extraction
Journal of Visual Languages and Computing
SearchResultFinder: federated search made easy
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Visually extracting data records from the deep web
Proceedings of the 22nd international conference on World Wide Web companion
Robust detection of semi-structured web records using a DOM structure-knowledge-driven model
ACM Transactions on the Web (TWEB)
A learning classifier-based approach to aligning data items and labels
BNCOD'13 Proceedings of the 29th British National conference on Big Data
Scalable and noise tolerant web knowledge extraction for search task simplification
Decision Support Systems
Leveraging spatial join for robust tuple extraction from web pages
Information Sciences: an International Journal
Hi-index | 0.00 |
When a query is submitted to a search engine, the search engine returns a dynamically generated result page containing the result records, each of which usually consists of a link to and/or snippet of a retrieved Web page. In addition, such a result page often also contains information irrelevant to the query, such as information related to the hosting site of the search engine and advertisements. In this paper, we present a technique for automatically producing wrappers that can be used to extract search result records from dynamically generated result pages returned by search engines. Automatic search result record extraction is very important for many applications that need to interact with search engines such as automatic construction and maintenance of metasearch engines and deep Web crawling. The novel aspect of the proposed technique is that it utilizes both the visual content features on the result page as displayed on a browser and the HTML tag structures of the HTML source file of the result page. Experimental results indicate that this technique can achieve very high extraction accuracy.