Extracting XML data from the web
Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services
Record extraction from large document collections has attracted growing research interest in recent years. However, two problems remain. First, when the extraction target is a large document collection, the process is usually very expensive. Second, the extracted records may not match the topic the user is interested in. To address these problems, this paper proposes a method that efficiently extracts only those records whose topics agree with the user's interest. To improve the efficiency of the information extraction system, our method identifies the documents from which useful records are likely to be extracted. We use user feedback on extraction results to find topic-related documents and records. Our experiments show that our system achieves high extraction accuracy across different extraction targets.