This paper proposes a web information extraction system that automatically extracts predefined information from web documents (i.e., HTML documents) and integrates the extracted information. The system recognizes entities that lack labels using a probabilistic entity recognition method, and it semi-automatically extends the existing domain knowledge with the extracted data. It also extracts sub-linked information, that is, information on pages linked from the basic page, and integrates similar results extracted from heterogeneous sources. In experiments, the global precision over seven domain sites is 93.5%. Using the sub-linked information together with probabilistic entity recognition improves precision significantly over a system that uses only the domain knowledge. Because the system can be applied flexibly across domains, it can also extract a wider variety of information precisely. The system can thus improve user satisfaction and contribute to the revitalization of e-business.