Web pages often cover multiple topics and vary widely in length, and both factors significantly degrade web search performance. In this paper, we explore the use of page segmentation algorithms to partition web pages into blocks, and we investigate how block-level evidence can improve retrieval performance in the web context. Because of the special characteristics of web pages, different page segmentation methods affect web search performance differently. We compare four types of methods: fixed-length page segmentation, DOM-based page segmentation, vision-based page segmentation, and a combined method that integrates both semantic and fixed-length properties. We perform experiments on block-level query expansion and block-level retrieval. Among the four approaches, the combined method achieves the best web search performance. Our experimental results also show that such semantic partitioning of web pages deals effectively with the problems of multiple drifting topics and mixed lengths, and thus has great potential to boost the performance of current web search engines.
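
To make the idea of block-level evidence concrete, the following is a minimal Python sketch of the simplest of the four methods, fixed-length segmentation, combined with a page-level score. It is an illustration under stated assumptions, not the method evaluated in the paper: the abstract does not give the ranking functions, so the toy term-overlap score, the window size, the interpolation weight `alpha`, and all function names here are hypothetical.

```python
from collections import Counter


def fixed_length_blocks(text, window=200):
    """Split text into consecutive, non-overlapping blocks of `window` words.

    The window size is an illustrative assumption, not a value from the paper.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    words = text.lower().split()
    blocks = [words[i:i + window] for i in range(0, len(words), window)]
    return blocks or [[]]  # an empty page still yields one (empty) block


def term_overlap_score(query_terms, tokens):
    """Toy relevance score: query-term occurrences divided by block length.

    A stand-in for a real ranking function, used only to keep the sketch short.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in query_terms) / len(tokens)


def combined_score(query, text, window=200, alpha=0.5):
    """Interpolate the whole-page score with the score of the best block.

    alpha = 1.0 uses page-level evidence only; alpha = 0.0 uses only the
    best-scoring block. Both the interpolation and alpha are assumptions.
    """
    query_terms = query.lower().split()
    blocks = fixed_length_blocks(text, window)
    all_tokens = [w for block in blocks for w in block]
    page_score = term_overlap_score(query_terms, all_tokens)
    best_block_score = max(term_overlap_score(query_terms, b) for b in blocks)
    return alpha * page_score + (1 - alpha) * best_block_score
```

In this sketch, a long page whose relevant text is concentrated in a single block scores higher than it would on page-level evidence alone, which is the intuition behind using blocks to counter mixed lengths and drifting topics. DOM-based or vision-based segmentation would replace `fixed_length_blocks` with a function that derives blocks from the page structure or layout.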