Previous work shows that a web page can be partitioned into multiple segments, or blocks, and that the blocks in a page are often not equally important. It has also been shown that distinguishing noisy or unimportant blocks from the rest of a page can facilitate web mining, search, and accessibility. However, no uniform approach or model has been presented for measuring the importance of different segments in web pages. Through a user study, we found that people do have a consistent view of the importance of blocks in web pages. In this paper, we investigate how to build a model that automatically assigns importance values to the blocks in a web page. We formulate block importance estimation as a learning problem. First, we use a vision-based page segmentation algorithm to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Based on these features, learning algorithms are used to train a model that assigns importance to the different segments in the web page. In our experiments, the best model achieves a Micro-F1 of 79% and a Micro-Accuracy of 85.9%, which is quite close to a person's judgment.
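The pipeline described above (segment the page, build a feature vector per block, train a model) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Block` fields, the normalisation, the integer importance levels, and the nearest-centroid learner are all assumptions chosen for brevity, and the segmentation step is taken as already done.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Sequence, Tuple


@dataclass
class Block:
    """One page segment. Field names are hypothetical, not the paper's feature set."""
    x: float        # left edge in pixels
    y: float        # top edge in pixels
    width: float
    height: float
    num_images: int
    num_links: int


def feature_vector(block: Block, page_width: float, page_height: float) -> List[float]:
    """Spatial features normalised by page size, followed by raw content counts."""
    return [
        block.x / page_width,
        block.y / page_height,
        (block.width * block.height) / (page_width * page_height),
        float(block.num_images),
        float(block.num_links),
    ]


def train_centroids(samples: Sequence[Tuple[List[float], int]]) -> Dict[int, List[float]]:
    """Average the feature vectors of each importance level.

    Nearest-centroid is a stand-in learner used only for illustration.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], vec: List[float]) -> int:
    """Return the importance level whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Toy data: a 1000x2000 page with a main-content block (level 3) and a footer (level 1).
PAGE_W, PAGE_H = 1000.0, 2000.0
training = [
    (feature_vector(Block(200, 200, 600, 1200, 2, 5), PAGE_W, PAGE_H), 3),
    (feature_vector(Block(0, 1900, 1000, 100, 0, 20), PAGE_W, PAGE_H), 1),
]
model = train_centroids(training)
unseen = feature_vector(Block(180, 250, 620, 1000, 1, 4), PAGE_W, PAGE_H)
print(predict(model, unseen))
```

In this sketch the raw link and image counts dominate the distance because they are left unscaled; a real system would likely standardise every feature and use a stronger learner.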