Scanned and OCRed data sets have large file sizes when facsimile images are included, which makes storing large data sets and providing online access to them costly. Manually analyzing such data is cumbersome because of long download and processing times. It may therefore be advantageous to reconstruct the scanned documents as image-free documents that nevertheless closely resemble the originals. We performed this reconstruction for a data set of Dutch parliamentary proceedings, with positive results: the reconstructed documents needed only 1.5% of the original storage space while resembling the originals to a high degree. We describe the reconstruction process and evaluate its costs, benefits, and quality.