Enhanced topic distillation using text, markup tags, and hyperlinks
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model
IEEE Transactions on Knowledge and Data Engineering
Page-level template detection via isotonic smoothing
Proceedings of the 16th international conference on World Wide Web
A graph-theoretic approach to webpage segmentation
Proceedings of the 17th international conference on World Wide Web
A densitometric approach to web page segmentation
Proceedings of the 17th ACM conference on Information and knowledge management
Webpage understanding: beyond page-level search
ACM SIGMOD Record
An evidential approach to query interface matching on the deep Web
Information Systems
Extracting content structure for web pages based on visual representation
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications
Visual structure-based web page clustering and retrieval
Proceedings of the 19th international conference on World wide web
A segmentation method for web page analysis using shrinking and dividing
International Journal of Parallel, Emergent and Distributed Systems - Network and parallel computing
A site oriented method for segmenting web pages
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Hi-index | 0.00 |
World Wide Web has become the largest source of information. Consequently web based information retrieval, information extraction; automatic page adaptation and querying deep-web are gaining importance. The need for information retrieval applications is increasing. To address the problems of the ever expanding information over the internet, traditional information retrieval techniques have been applied. Such techniques are sometimes time consuming, and laborious, and the results obtained may be unsatisfactory. This study is an attempt to query web pages like MedlinePlus medical encyclopedia by segmenting the web pages. It summarizes the existing approaches for web page segmentation from the perspective of "structure realization for improved querying" on the web. It proposes a new algorithm VisHue for web page segmentation based on visual cues and heuristics and further uses the hierarchical structure generated by it to develop the Query by Segment or Tag (QBT) query interface. This interface is close to the end-user and exploits the relationships among the various content groups within a web page. Such an improved query-interface enables the user to perform in-depth querying. It is a step beyond the page-level search.