Techniques for improving web retrieval effectiveness

Authors:
Eui-Kyu Park;Dong-Yul Ra;Myung-Gil Jang
Affiliations:
Computer Science Department, Yonsei University, Wonju, Kangwon 220-710, Korea;Computer Science Department, Yonsei University, Wonju, Kangwon 220-710, Korea;Speech/Language Information Research Department, ETRI, Yuseong-gu, Daejeon 305-350, Korea
Venue:
Information Processing and Management: an International Journal
Year:
2005


Abstract

This paper presents several schemes for improving retrieval effectiveness in the named page finding task of web information retrieval (Overview of the TREC-2002 web track. In: Proceedings of the Eleventh Text Retrieval Conference TREC-2002, NIST Special Publication #500-251, 2003). The schemes were applied on top of a basic information retrieval model as additional mechanisms. Using the titles of web pages was found to be effective. The anchor texts of incoming links were confirmed to be beneficial, as suggested in other work. Sentence-query similarity, a new type of evidence proposed here, proved to be the most useful information to exploit. Stratifying and re-ranking the retrieval list by the maximum number of index terms shared between a sentence and the query significantly improved performance. To demonstrate these findings, a large-scale web information retrieval system was developed and used for experimentation.
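
The stratify-and-re-rank idea can be illustrated with a minimal sketch. The abstract does not give the details of the authors' procedure, so everything below is an assumption made for illustration: the helper names (tokenize, split_sentences, max_sentence_overlap, stratify_and_rerank), the regex tokenizer standing in for real index-term extraction, the naive sentence splitter, and the choice to keep the base model's order inside each stratum.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for real index-term extraction."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive split on '.', '!' and '?' (an assumption, not the paper's splitter)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def max_sentence_overlap(query, document):
    """Largest number of distinct query terms found in any single sentence."""
    query_terms = set(tokenize(query))
    return max(
        (len(query_terms & set(tokenize(s))) for s in split_sentences(document)),
        default=0,
    )


def stratify_and_rerank(query, ranked_docs):
    """Re-rank (doc_id, text) pairs that are already ordered by the base model.

    Documents fall into strata keyed by their maximum sentence-query overlap.
    Strata with larger overlap come first; because sorted() is stable, the
    base ranking is preserved inside each stratum.
    """
    return sorted(
        ranked_docs,
        key=lambda doc: -max_sentence_overlap(query, doc[1]),
    )
```

Calling stratify_and_rerank(query, ranked_docs) on a base-model result list moves documents containing a sentence that covers many query terms ahead of documents whose matching terms are scattered across sentences. How the real system weights or combines this evidence with titles and anchor texts is not stated in the abstract.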