In this demo paper, we present Docear's PDF Inspector (DPI). DPI extracts titles from academic PDF files by applying a simple heuristic: the largest text on the first page of a PDF is assumed to be the title. This heuristic achieves accuracies of around 70% and outperforms the tools ParsCit and SciPlore Xtract in both run time and accuracy. In addition, DPI is written in Java, runs on any major operating system, and is released under the free, open-source GPL 2+ license at http://www.docear.org.
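
The heuristic described above can be illustrated with a minimal sketch. This is not DPI's actual code (DPI is written in Java); it is a Python illustration that assumes the PDF has already been parsed into text runs with a known font size. The `TextSpan` structure, the `guess_title` function, and the `tolerance` parameter are hypothetical names introduced here, and joining all runs that share the largest font size is an assumption about how a multi-line title might be handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextSpan:
    """A run of text set in a single font size (hypothetical structure)."""
    text: str
    font_size: float
    page: int  # 1-based page number


def guess_title(spans: List[TextSpan], tolerance: float = 0.1) -> Optional[str]:
    """Return the largest text on the first page as the presumed title.

    Spans are expected in reading order. Runs whose font size is within
    `tolerance` of the maximum are joined, so a title broken across
    several lines is returned as one string (an assumption, not
    something the abstract specifies).
    """
    # Only the first page is considered; empty runs are ignored.
    first_page = [s for s in spans if s.page == 1 and s.text.strip()]
    if not first_page:
        return None

    largest = max(s.font_size for s in first_page)
    parts = [
        s.text.strip()
        for s in first_page
        if largest - s.font_size <= tolerance
    ]
    return " ".join(parts)


if __name__ == "__main__":
    example = [
        TextSpan("Proceedings of an example conference", 9.0, 1),
        TextSpan("An Example Paper Title:", 18.0, 1),
        TextSpan("With a Second Line", 18.0, 1),
        TextSpan("Abstract", 12.0, 1),
        TextSpan("LARGE HEADING ON PAGE TWO", 30.0, 2),
    ]
    print(guess_title(example))
    # prints "An Example Paper Title: With a Second Line"
```

Large text on later pages is ignored because only first-page runs are examined. A heuristic of this kind fails when something other than the title is set in the largest font on the first page, such as a journal name or a banner.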