International Journal of Organizational and Collective Intelligence
Retrieving information from the Internet is a difficult task, as demonstrated by the lack of real-time tools able to extract information from webpages. The main cause is that most webpages on the Internet are implemented in plain (X)HTML, a language that lacks structured semantic information. For this reason, much of the effort in this area has been directed toward techniques for URL extraction. This field has produced good results, which modern search engines implement. In contrast, extracting information from a single webpage has produced poor results or very limited tools. In this work we define a novel technique for information extraction from single webpages or collections of interconnected webpages. The technique uses DOM distances to retrieve information. Because it relies on DOM distances, the technique can work with any webpage and can thus retrieve information online. Our implementation and experiments demonstrate the usefulness of the technique.
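
The abstract does not define how a DOM distance is measured. As an illustration only, the following minimal sketch assumes the simplest reading: the distance between two nodes is the number of parent/child edges on the path that joins them through their lowest common ancestor. The names `Node`, `TreeBuilder`, `dom_distance`, and `nodes_within` are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the parser must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal DOM node (hypothetical helper, not the paper's data structure)."""

    def __init__(self, tag, parent=None, attrs=None):
        self.tag = tag
        self.parent = parent
        self.attrs = dict(attrs or [])
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current, attrs)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def iter_nodes(node):
    """Yield the node and all its descendants in document order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def find_by_id(root, element_id):
    for node in iter_nodes(root):
        if node.attrs.get("id") == element_id:
            return node
    return None


def ancestors(node):
    """Return the chain [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def dom_distance(a, b):
    """Assumed metric: edges from a up to the lowest common ancestor, then down to b."""
    steps_from_a = {id(n): i for i, n in enumerate(ancestors(a))}
    for steps_from_b, n in enumerate(ancestors(b)):
        if id(n) in steps_from_a:
            return steps_from_a[id(n)] + steps_from_b
    raise ValueError("nodes belong to different trees")


def nodes_within(root, anchor, max_distance):
    """Collect every node whose assumed DOM distance to anchor is at most max_distance."""
    return [n for n in iter_nodes(root) if dom_distance(anchor, n) <= max_distance]
```

Under this assumption, a retrieval step could start from a node that matches the query and keep only the nodes within a chosen distance of it, for example `nodes_within(root, find_by_id(root, "p1"), 2)`. The threshold and the way the starting node is chosen are left open here, since the abstract does not specify them.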