A metadata and annotation extractor from PDF document for semantic web

Authors:
Archana Shukla
Affiliations:
Institute of Technology, Allahabad, Uttar Pradesh, India
Venue:
Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
Year:
2010

Abstract

Research scholars undertake a literature survey to identify a problem they would like to address and its possible solutions. As part of this activity, they download research papers from the Internet, read them, and write comments, observations, explanations, or questions, either on a separate sheet of paper or on the paper itself. They use these notes and observations to firm up their understanding of the research domain and to define their research problems. These notes and observations are a valuable knowledge asset for research. My work is motivated by a desire to capture this knowledge and make it available to the community of research scholars so that they can benefit from it. In this paper, I present an editor that facilitates authoring annotations on PDF documents. I have designed a DTD (Document Type Definition) for annotation documents. The DTD defines elements for the identity of the annotation's author, the identity of the paper being annotated, the type of annotation, the comment, and the date and time (Date_time). The type field is an enumeration that takes one of the values "note", "comment", "insert", "help", or "paragraph". The value "insert" indicates that the annotation is attached to another annotation rather than to the original PDF document. The tool provides a user-friendly interface to query the annotations on a PDF document, to classify documents by the number of comments they carry, and to examine the relationships between annotations. It also extracts metadata from PDF documents, including the title, author, keywords, summary, and date_time. The tool is implemented in Java using the PDFBox API.
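The abstract names the fields of the annotation DTD but does not reproduce the DTD itself. The Java sketch below shows one possible shape for such a DTD and validates an annotation document against it with the JDK's built-in `javax.xml.parsers` API. The element names (`author`, `paper`, `comment`, `date_time`) and the class name `AnnotationDtd` are hypothetical and are not the author's actual DTD. The type is modelled as an attribute because a DTD can restrict only attribute values, not element text, to an enumerated list.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class AnnotationDtd {
    // Hypothetical DTD modelled on the fields named in the abstract.
    // Enumerations are only possible for attributes, so "type" is an attribute here.
    static final String DOCTYPE =
            "<!DOCTYPE annotation [\n"
            + "  <!ELEMENT annotation (author, paper, comment, date_time)>\n"
            + "  <!ATTLIST annotation type (note|comment|insert|help|paragraph) #REQUIRED>\n"
            + "  <!ELEMENT author (#PCDATA)>\n"
            + "  <!ELEMENT paper (#PCDATA)>\n"
            + "  <!ELEMENT comment (#PCDATA)>\n"
            + "  <!ELEMENT date_time (#PCDATA)>\n"
            + "]>\n";

    // Escape characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build one annotation document that carries its DTD as an internal subset.
    static String toXml(String type, String author, String paper, String comment, String dateTime) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + DOCTYPE
                + "<annotation type=\"" + escape(type) + "\">"
                + "<author>" + escape(author) + "</author>"
                + "<paper>" + escape(paper) + "</paper>"
                + "<comment>" + escape(comment) + "</comment>"
                + "<date_time>" + escape(dateTime) + "</date_time>"
                + "</annotation>";
    }

    // Parse with DTD validation switched on; any validity error marks the document invalid.
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // The default handler only prints validity errors, so rethrow them instead.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = toXml("comment", "A. Scholar", "paper-17", "Is Lemma 2 needed?", "2010-01-15T10:30");
        String bad = toXml("summary", "A. Scholar", "paper-17", "Not an allowed type", "2010-01-15T10:31");
        System.out.println(isValid(ok));   // true
        System.out.println(isValid(bad));  // false: "summary" is not in the enumeration
    }
}
```

Because the DTD is embedded as an internal subset, each annotation file can be validated on its own without fetching anything. A real tool might instead point every annotation document at one shared `.dtd` file through a `SYSTEM` identifier.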
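Because an "insert" annotation is written on another annotation, annotations can form threads, and these threads are what relationship queries operate on. The following Java sketch (Java 16 or later, for `record`) holds annotations in memory and supports three operations: querying them by paper, querying them by parent, and classifying papers by how many annotations they carry. The `parentId` field, the class names, and the threshold-based two-label classification are hypothetical illustrations. The abstract does not say how the tool stores annotations or how its classification works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// The five annotation types listed in the abstract.
enum AnnotationType { NOTE, COMMENT, INSERT, HELP, PARAGRAPH }

// Hypothetical in-memory form of one annotation document.
// parentId is set only for INSERT annotations, which annotate another annotation.
record Annotation(String id, String author, String paperId, AnnotationType type,
                  String comment, String dateTime, String parentId) {}

class AnnotationStore {
    private final List<Annotation> annotations = new ArrayList<>();

    void add(Annotation a) {
        // An INSERT annotation must name the annotation it comments on.
        if (a.type() == AnnotationType.INSERT && a.parentId() == null) {
            throw new IllegalArgumentException("INSERT annotation needs a parentId");
        }
        annotations.add(a);
    }

    // Query: every annotation recorded against one paper.
    List<Annotation> onPaper(String paperId) {
        return annotations.stream()
                .filter(a -> a.paperId().equals(paperId))
                .collect(Collectors.toList());
    }

    // Relationship query: annotations written on top of the given annotation.
    List<Annotation> repliesTo(String annotationId) {
        return annotations.stream()
                .filter(a -> annotationId.equals(a.parentId()))
                .collect(Collectors.toList());
    }

    // Classification: label each paper by its annotation count (threshold is illustrative).
    Map<String, String> classifyByCommentCount(long threshold) {
        Map<String, Long> counts = annotations.stream()
                .collect(Collectors.groupingBy(Annotation::paperId, TreeMap::new, Collectors.counting()));
        Map<String, String> labels = new TreeMap<>();
        counts.forEach((paper, n) ->
                labels.put(paper, n >= threshold ? "heavily annotated" : "lightly annotated"));
        return labels;
    }
}

class AnnotationStoreDemo {
    public static void main(String[] args) {
        AnnotationStore store = new AnnotationStore();
        store.add(new Annotation("a1", "alice", "paperA", AnnotationType.COMMENT, "Unclear proof", "2010-01-10T09:00", null));
        store.add(new Annotation("a2", "bob", "paperA", AnnotationType.INSERT, "See section 4", "2010-01-11T14:00", "a1"));
        store.add(new Annotation("a3", "alice", "paperB", AnnotationType.NOTE, "Related work", "2010-01-12T08:30", null));
        System.out.println(store.repliesTo("a1").size());      // 1
        System.out.println(store.classifyByCommentCount(2));   // {paperA=heavily annotated, paperB=lightly annotated}
    }
}
```

Keeping `parentId` on the annotation, rather than storing child lists on the parent, means an existing annotation file never has to be rewritten when someone annotates it.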