Because Web resources are formatted in diverse ways for human viewing, information extraction from them is not accurate enough, and the information extracted by traditional techniques is inconvenient for users to query. This paper proposes WebKER, a wrapper-driven system that extracts knowledge from Chinese Web pages based on domain ontologies. Wrappers are first learned through suffix arrays. A novel approach based on HowNet is then proposed to automatically align the raw data extracted by the wrappers. Knowledge is next generated and described with Resource Description Framework (RDF) statements and, after merging, is added to the Knowledge Base (KB). A prototype of WebKER has been implemented. The experiments report the performance of the system and compare querying information stored in the KB with querying information extracted by traditional techniques; the results indicate the superiority of our system. Evaluations of the outstanding wrapper and of the method for merging knowledge are also presented.
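The abstract does not spell out how wrappers are learned from suffix arrays, so the following is only a minimal sketch of the general idea: a suffix array built over a page's tag sequence exposes repeated tag patterns, which are natural candidates for record templates. The function names, the token format (`#text` standing in for any text node), and the naive sorting-based construction are all assumptions made for illustration, not the WebKER implementation.

```python
from typing import List, Sequence, Tuple


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Naive construction: sort suffix start positions by the suffix itself.

    Fine for a short illustration; a real system would use a faster
    construction algorithm.
    """
    return sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))


def common_prefix_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading tokens shared by the two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_repeated_pattern(tokens: Sequence[str]) -> Tuple[str, ...]:
    """Return the longest token run that occurs at least twice.

    Suffixes sharing a long prefix end up adjacent in the suffix array,
    so it is enough to compare neighbouring entries.
    """
    sa = build_suffix_array(tokens)
    best: Tuple[str, ...] = ()
    for prev, cur in zip(sa, sa[1:]):
        k = common_prefix_length(tokens[prev:], tokens[cur:])
        if k > len(best):
            best = tuple(tokens[cur:cur + k])
    return best


# Hypothetical tag sequence for a page listing two records.
page_tokens = [
    "<ul>",
    "<li>", "<b>", "#text", "</b>", "</li>",
    "<li>", "<b>", "#text", "</b>", "</li>",
    "</ul>",
]
print(longest_repeated_pattern(page_tokens))
```

In this toy input the repeated run is the per-record markup, which is the kind of pattern a wrapper could then use to locate the data fields inside each record.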