A fully automated wrapper for information extraction from Web pages is presented. Such systems are motivated by the emerging need to go beyond the concept of "human browsing." The World Wide Web is today the main "all kinds of information" repository and has so far been very successful in disseminating information to humans. Automating the process of information retrieval makes that information usable by targeted applications as well. The key idea of our system is to exploit the format of a Web page to discover its underlying structure and then to infer and extract pieces of information from the page. The system first identifies the section of the Web page that contains the information to be extracted and then extracts it using clustering techniques and other statistical tools. The system, STAVIES, operates without human intervention and requires no training. Its main innovation and contribution is a signal-wise treatment of the tag structural hierarchy, combined with hierarchical clustering techniques to segment Web pages. This treatment is significant because it abstracts away from the raw tag-manipulating approach. Experimental results and comparisons with other state-of-the-art systems are presented and discussed in the paper; they indicate the high performance of the proposed algorithm.
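To make the idea of treating the tag hierarchy as a signal more concrete, the following is a minimal illustrative sketch, not the STAVIES algorithm itself. It assumes a simplified setting: each text node is tagged with its nesting depth, the distance between consecutive text nodes is taken to be their path length through the shallowest depth reached between them, and segments are formed by splitting wherever that distance exceeds a threshold. Splitting at a fixed threshold is equivalent to cutting a contiguity-constrained single-linkage dendrogram at one level. The names `TagDepthSignal`, `segment`, `max_gap`, and `PAGE` are hypothetical, and the sketch assumes well-formed HTML with explicit closing tags.

```python
from html.parser import HTMLParser


class TagDepthSignal(HTMLParser):
    """Record text nodes, their nesting depth, and the distance to the previous text node."""

    # Elements that never have a closing tag; they must not change the depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = 0     # current nesting depth
        self.valley = 0    # shallowest depth reached since the last text node
        self.texts = []    # non-empty text nodes in document order
        self.depths = []   # depth of each text node (the "signal")
        self.gaps = []     # gaps[i] = distance between texts[i] and texts[i + 1]

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth = max(0, self.depth - 1)
            self.valley = min(self.valley, self.depth)

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.texts:
            # Steps up from the previous node to the valley, then down to this node.
            up = self.depths[-1] - self.valley
            down = self.depth - self.valley
            self.gaps.append(up + down)
        self.texts.append(text)
        self.depths.append(self.depth)
        self.valley = self.depth


def segment(html, max_gap=2):
    """Group consecutive text nodes; start a new group when the gap exceeds max_gap."""
    parser = TagDepthSignal()
    parser.feed(html)
    parser.close()
    if not parser.texts:
        return []
    groups = [[parser.texts[0]]]
    for text, gap in zip(parser.texts[1:], parser.gaps):
        if gap > max_gap:
            groups.append([text])
        else:
            groups[-1].append(text)
    return groups


# Hypothetical sample page with two repeated records.
PAGE = """<html><body>
<h1>Catalog</h1>
<div class="item"><b>Widget</b><span>$5</span></div>
<div class="item"><b>Gadget</b><span>$7</span></div>
</body></html>"""

print(segment(PAGE))
```

On the sample page, the default threshold separates the heading from the two item records, while raising `max_gap` merges segments step by step, which is the hierarchical aspect of the grouping. A real system would also need to handle malformed markup and choose the cut level automatically, which this sketch does not attempt.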