Expert Systems with Applications: An International Journal
As more and more documents become available on the Internet, finding the documents that fit a user's needs in databases containing millions of documents is becoming increasingly important. Because a scientific document is a structured text, it has features that can be used to improve retrieval performance. In this work, we investigate three such features: fonts, position, and cited references. Past research has used each of these features individually to improve document searching, but no existing research discusses how to integrate all three. This work first investigates the relationships among them and then uses the discovered relationships to design a novel retrieval method based on the three features. Extensive experiments on real scientific documents demonstrate the method's effectiveness. Our empirical results show that using the position (location) factor alone achieves the same performance as considering the position and font factors simultaneously. We also observed that citation similarity is useful only when the similarity is high. Based on these two findings, we developed a method that combines the content vector and the reference vector conditionally, and this integrated approach does improve search performance.
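The conditional combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything specific here is an assumption: the function names, the use of cosine similarity on sparse dictionaries, the `threshold` value of 0.5, and the `ref_weight` value of 0.3 are all hypothetical and chosen only to show the idea that citation similarity contributes to the score only when it is high.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(content_a, content_b, refs_a, refs_b,
                        threshold=0.5, ref_weight=0.3):
    """Blend content and citation similarity only when citations agree strongly.

    threshold and ref_weight are illustrative assumptions, not values
    taken from the paper.
    """
    content_sim = cosine(content_a, content_b)
    citation_sim = cosine(refs_a, refs_b)
    if citation_sim >= threshold:
        # High citation overlap: let the reference vector adjust the score.
        return (1.0 - ref_weight) * content_sim + ref_weight * citation_sim
    # Low citation overlap: fall back to content similarity alone.
    return content_sim
```

In this sketch, a content vector maps terms to weights (which could already reflect position-based weighting) and a reference vector maps cited-work identifiers to weights. Two documents with disjoint reference lists are scored on content alone, while two documents that share most of their references have their score adjusted by the citation similarity.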