Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems
RoadRunner: automatic data extraction from data-intensive web sites
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Hierarchical Wrapper Induction for Semistructured Information Sources
Autonomous Agents and Multi-Agent Systems
Discovering informative content blocks from Web documents
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Extracting unstructured data from template generated web documents
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Web data extraction based on partial tree alignment
WWW '05 Proceedings of the 14th international conference on World Wide Web
ViPER: augmenting automatic information extraction with visual perceptions
Proceedings of the 14th ACM international conference on Information and knowledge management
Template detection for large scale search engines
Proceedings of the 2006 ACM symposium on Applied computing
Adapting Web information extraction knowledge via mining site-invariant and site-dependent features
ACM Transactions on Internet Technology (TOIT)
Extracting Web Data Using Instance-Based Learning
World Wide Web
Mining templates from search result records of search engines
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Joint optimization of wrapper generation and template detection
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Data & Knowledge Engineering
Extracting article text from the web with maximum subsequence segmentation
Proceedings of the 18th international conference on World wide web
Extracting data records from the web using tag path clustering
Proceedings of the 18th international conference on World wide web
News article extraction with template-independent wrapper
Proceedings of the 18th international conference on World wide web
Can we learn a template-independent wrapper for news article extraction from a single training site?
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Mining employment market via text block detection and adaptive cross-domain information extraction
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Web Data Extraction Based on Label Library
CSIE '09 Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering - Volume 05
Knowledge Discovery Enhanced with Semantic and Social Information
Knowledge Discovery Enhanced with Semantic and Social Information
Coupled semi-supervised learning for information extraction
Proceedings of the third ACM international conference on Web search and data mining
Extracting content structure for web pages based on visual representation
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications
Hi-index | 0.00 |
Information extraction is important in web information retrieval. In case of news information extraction, because news information does not have representative keywords pointing out its beginning and ending, it is difficult to specify the news title and body automatically. Our approach is based on an adaptive weighting factor using Bayesian algorithm to solve this problem. We divided a news page into text fragments, and represented them with a set of content features and layout features. We used an adaptive weighting factor to make features fit in different pages. Experiments show that our method results in a higher precision than the original algorithm without a weighting factor on the task of news information extraction.