Academic search engines and digital libraries provide convenient online search and access facilities for scientific publications. However, most existing systems do not include books in their collections, although several books are freely available online. Academic books differ from papers in length, content, and structure. We argue that accounting for academic books is important for understanding and assessing scientific impact. We introduce an open-book search engine that extracts and indexes metadata, contents, and bibliography from online PDF book documents. To the best of our knowledge, no previous work has systematically studied how to build a search engine for books. We propose a hybrid approach for extracting the title and authors of a book that combines results from CiteSeer, a rule-based extractor, and an SVM-based extractor, leveraging web knowledge. For "table of contents" recognition, we propose rules that exploit multiple regularities in numbering and ordering. In addition, we study bibliography extraction and citation parsing for a large dataset of books. Finally, we use the multiple fields available in books to rank books in response to search queries. Our system can effectively extract metadata and contents from large collections of online books and provides efficient book search and retrieval facilities.
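As a rough illustration of what rules based on numbering and ordering might look like for "table of contents" recognition, here is a minimal Python sketch. It is not the system's actual rule set, which the abstract does not spell out. The regular expression, the function names `parse_toc_candidates` and `looks_like_toc`, and the `min_entries` threshold are all assumptions made for this example.

```python
import re

# Assumed entry shape: optional section number, a title, dot leaders or
# spaces, then a trailing page number. This pattern is hypothetical.
ENTRY = re.compile(
    r"^\s*(?P<num>\d+(?:\.\d+)*)?\.?\s*(?P<title>.+?)[\s.]+(?P<page>\d+)\s*$"
)

def parse_toc_candidates(lines):
    """Return (section_number, title, page) for lines shaped like TOC entries."""
    entries = []
    for line in lines:
        m = ENTRY.match(line)
        if m:
            entries.append(
                (m.group("num"), m.group("title").strip(), int(m.group("page")))
            )
    return entries

def looks_like_toc(entries, min_entries=3):
    """Ordering rule (assumed): enough entries, page numbers never decrease."""
    if len(entries) < min_entries:
        return False
    pages = [page for _, _, page in entries]
    return all(a <= b for a, b in zip(pages, pages[1:]))

sample = [
    "Contents",
    "1 Introduction ........ 1",
    "1.1 Motivation ..... 3",
    "2. Methods . . . 17",
    "Some prose without a page",
]
entries = parse_toc_candidates(sample)
print(entries, looks_like_toc(entries))
```

A real extractor would likely combine further signals, such as the consistency of the section numbers themselves or layout cues from the PDF; this sketch checks only the trailing page numbers.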