News information extraction based on adaptive weighting using unsupervised Bayesian algorithm

Authors:
Shilin Huang; Xiaolin Zheng; Xiaowei Wang; Deren Chen
Affiliations:
College of Computer Science and Technology, Zhejiang University, Hangzhou, China (all authors)
Venue:
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Year:
2011


Abstract

Information extraction is an important part of web information retrieval. News extraction is difficult because news pages have no representative keywords that mark where an article begins and ends, so the news title and body are hard to identify automatically. We address this problem with a Bayesian algorithm that uses an adaptive weighting factor. We divide a news page into text fragments and represent each fragment with a set of content features and layout features. The adaptive weighting factor adjusts the features so that they fit different pages. Experiments show that, on the task of news information extraction, our method achieves higher precision than the original algorithm without a weighting factor.
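
The abstract does not specify the model, so the following is only a minimal sketch of how a per-page weighting factor could enter a naive Bayes score over text fragments. Everything in it is an assumption made for illustration: the feature names (`long_text`, `has_punctuation`, `in_center`), the probability tables, and the weighting rule (a feature's normalized variance across the page's fragments) are hypothetical and are not taken from the paper. The probabilities are also fixed by hand here, whereas the paper describes an unsupervised algorithm.

```python
import math

# Hypothetical likelihoods P(feature = 1 | class); values are illustrative only.
LIKELIHOOD = {
    "body": {"long_text": 0.9, "has_punctuation": 0.8, "in_center": 0.7},
    "other": {"long_text": 0.1, "has_punctuation": 0.5, "in_center": 0.3},
}
# Hypothetical class priors.
PRIOR = {"body": 0.3, "other": 0.7}


def page_weights(fragments):
    """Assumed weighting rule (not from the paper): weight = 4 * p * (1 - p),
    where p is the share of fragments on this page that have the feature.
    A feature that is constant across the page gets weight 0."""
    weights = {}
    for name in LIKELIHOOD["body"]:
        p = sum(f[name] for f in fragments) / len(fragments)
        weights[name] = 4.0 * p * (1.0 - p)
    return weights


def log_score(fragment, label, weights):
    """Weighted naive Bayes: log prior plus weighted log likelihoods."""
    score = math.log(PRIOR[label])
    for name, w in weights.items():
        p = LIKELIHOOD[label][name]
        score += w * math.log(p if fragment[name] else 1.0 - p)
    return score


def classify(fragments):
    """Label each fragment with the class that has the highest weighted score."""
    weights = page_weights(fragments)
    return [
        max(PRIOR, key=lambda label: log_score(f, label, weights))
        for f in fragments
    ]


# One toy page: each fragment is a dict of binary content/layout features.
page = [
    {"long_text": 1, "has_punctuation": 1, "in_center": 1},
    {"long_text": 0, "has_punctuation": 1, "in_center": 0},
    {"long_text": 0, "has_punctuation": 1, "in_center": 0},
]
labels = classify(page)
```

On this toy page every fragment has `has_punctuation`, so that feature receives weight 0 and does not affect the decision, while the two features that vary across fragments drive the classification. Setting all weights to 1 recovers the unweighted naive Bayes score.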