We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases retrieval precision for queries that would otherwise return irrelevant results. We believe that reducing the number of irrelevant results encourages users to return to a given site to search. Our experimental results on several different web sites and on the whole cnnfn collection demonstrate the feasibility of our approach.