Converting PDF to HTML approach based on text detection

  • Authors:
  • Deliang Jiang; Xiaohu Yang

  • Affiliations:
  • Zhejiang University, Hangzhou, China; Zhejiang University, Hangzhou, China

  • Venue:
  • Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human
  • Year:
  • 2009

Abstract

Converting a PDF document into an HTML document that preserves its layout is an important and interesting research problem. After conversion, the document can easily be browsed online and its information extracted. Building on the text extracted from the PDF document by the open-source tool PDFBox, this paper describes a method that detects the layout of a PDF document and effectively converts it into an HTML page.
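The abstract does not spell out the layout-detection algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It assumes the extraction step (PDFBox, for instance, reports per-character positions through its `TextPosition` class) has already produced word-level fragments whose coordinates are measured in points from the top-left corner of the page. The sketch groups fragments into lines by vertical proximity and emits absolutely positioned HTML so the page keeps its layout. The `TextFragment` type, the 2-point line tolerance, and the US Letter page size are hypothetical choices.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextFragment:
    """Hypothetical word-level fragment from a PDF text extractor.

    Assumption: x and y are in points, measured from the page's top-left
    corner, and y marks the top of the text run.
    """
    x: float
    y: float
    font_size: float
    text: str


def group_into_lines(fragments, tolerance=2.0):
    """Group fragments whose y values lie within `tolerance` points of a
    line's first fragment; return lines top to bottom, each left to right."""
    lines = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) <= tolerance:
            lines[-1].append(frag)  # same visual line
        else:
            lines.append([frag])    # start a new line
    return [sorted(line, key=lambda f: f.x) for line in lines]


def to_html(fragments, page_width=612, page_height=792):
    """Render each detected line as an absolutely positioned <div> inside a
    page-sized container (default size: US Letter in points, an assumption)."""
    parts = [
        f'<div style="position:relative;width:{page_width}pt;height:{page_height}pt">'
    ]
    for line in group_into_lines(fragments):
        first = line[0]
        # Assumption: fragments are whole words, so join them with spaces.
        text = " ".join(f.text for f in line)
        parts.append(
            f'<div style="position:absolute;left:{first.x}pt;top:{first.y}pt;'
            f'font-size:{first.font_size}pt">{escape(text)}</div>'
        )
    parts.append("</div>")
    return "\n".join(parts)
```

For example, fragments "Hello" at (72, 72) and "world" at (120, 72.5) fall on one line, while "Second line" at y = 90 starts another. Absolute positioning is the simplest way to reproduce a fixed page layout. Real converters typically go further, detecting paragraphs, columns, and tables so that the HTML reflows instead of merely mirroring the page.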