Using position, fonts and cited references to retrieve scientific documents

Authors:
Yen-Liang Chen;Li-Chen Cheng;Yun-Ling Cheng
Affiliations:
Department of Information Management, National Central University, Taiwan, R.O.C.;Department of Information Management, National Central University, Taiwan, R.O.C.;Department of Information Management, National Central University, Taiwan, R.O.C.
Venue:
Journal of Information Science
Year:
2007



Abstract

As more and more documents become available on the Internet, finding documents that fit users' needs in databases containing millions of documents is becoming increasingly important. Because scientific documents are structured texts, they have useful features that can be exploited to improve retrieval performance. In this work, we investigate three such features: fonts, position and cited references. Although past research has used each of these features individually to improve document searching, no existing work has examined how to integrate all three to improve retrieval performance. This work first investigates the relationships among them and then uses these three features to design a novel retrieval method based on the discovered relationships. Extensive experiments with real scientific documents demonstrate its effectiveness. Our empirical results show that using the location (position) factor alone achieves the same performance as considering the location and font factors simultaneously. We also observed that citation similarity is useful only when that similarity is high. Based on these two findings, we developed a method that combines the content vector and the reference vector conditionally, and the resulting integrated approach does improve search performance.
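The abstract describes the combination only at a high level. The following is a minimal sketch of one way a position-weighted content vector and a cited-reference vector could be combined conditionally, assuming a query-by-document setting in which the query is itself a scientific paper with sections and references. The section weights, the threshold of 0.5, the blending factor alpha = 0.7, the whitespace tokenizer and all function names are hypothetical illustrations, not the authors' actual formulas or parameters. Font information is left out because the reported results indicate that location alone performs as well as location plus font.

```python
import math
from collections import Counter
from typing import Dict, Iterable

# Hypothetical position weights (assumption, not from the paper): a term in the
# title counts more than one in the abstract, which counts more than body text.
POSITION_WEIGHTS: Dict[str, float] = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def content_vector(sections: Dict[str, str]) -> Dict[str, float]:
    """Term vector where each occurrence is weighted by the section it appears in."""
    vec: Counter = Counter()
    for section, text in sections.items():
        weight = POSITION_WEIGHTS.get(section, 1.0)  # unknown sections get weight 1
        for term in text.lower().split():  # naive whitespace tokenization
            vec[term] += weight
    return dict(vec)


def reference_vector(references: Iterable[str]) -> Dict[str, float]:
    """Binary vector over identifiers of the cited references."""
    return {ref: 1.0 for ref in references}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(q_content: Dict[str, float], d_content: Dict[str, float],
                   q_refs: Dict[str, float], d_refs: Dict[str, float],
                   threshold: float = 0.5, alpha: float = 0.7) -> float:
    """Use citation similarity only when it is high; otherwise rank by content alone.

    threshold and alpha are hypothetical tuning parameters.
    """
    content_sim = cosine(q_content, d_content)
    ref_sim = cosine(q_refs, d_refs)
    if ref_sim >= threshold:
        # Citation overlap is strong enough to be trusted: blend both signals.
        return alpha * content_sim + (1.0 - alpha) * ref_sim
    return content_sim
```

To rank a collection, one would compute `combined_score` between the query paper and every candidate and sort in descending order. Gating the reference vector behind a threshold, rather than always blending it in, mirrors the observation that citation similarity helps only when it is high; low, noisy citation overlap then cannot drag down an otherwise strong content match.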