Preserving Digital Information

Summary of a 2007 book

ISBN 3-540-37886-3

© 2006-7, H. M. Gladney
HMG Consulting

3 May 2007

Almost all new information is created with digital tools.  Much of it is distributed primarily in digital form, partly because analog representations lose information contained in the digital sources.  This recent shift, part of the Information Revolution, creates demand for new functionality, including means for reliably preserving authentic versions of authors’ works: versions that will remain intelligible or otherwise usable many years from now.

It will surprise no one that we can solve a technical challenge with further technology.  Preserving Digital Information describes a complete technical solution for preserving any digital information whatsoever.[1]

What are the challenges in preserving a digital work, and how can we handle them?  We package each work of interest into a bundle containing several representations and critical provenance information, addressing the key requirements summarized in the following table.

| We must ensure that: | How we can accomplish this: |
| --- | --- |
| Some copy survives as long as wanted | Replicating each document package in independent sites, and including the document’s original form in the package |
| And any user can find a copy | Using widely available search tools |
| With sufficient authenticity evidence | Binding each document with its provenance information and sealing the package with cryptographic signatures |
| And can use it as its producer intended | Using ISO standards to represent simple objects and durably intelligible encoding for other objects,[2] saving in as many renderings as might help future users |
| With sufficient contextual information | With reliable references using universally unique digital object identifiers |
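
The binding and sealing steps in the table can be illustrated with a minimal sketch.  It is not the book's TDO format: the field names, the functions seal_package and verify_package, and the use of a keyed HMAC as a stand-in for a public-key signature are all assumptions made so that the example runs with the Python standard library alone.

```python
import hashlib
import hmac
import json
import uuid


def _canonical_bytes(package):
    # Serialize deterministically so the same package always yields the same seal.
    return json.dumps(package, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_package(representations, provenance, context_refs, key):
    """Bind renderings, provenance, and context references, then seal the bundle.

    Hypothetical layout for illustration only. HMAC-SHA256 stands in for the
    cryptographic signature; a real archive would use a public-key signature.
    """
    package = {
        "id": uuid.uuid4().urn,          # universally unique identifier
        "provenance": provenance,        # who produced the work, and when
        "context_refs": context_refs,    # identifiers of related objects
        "representations": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in representations.items()
        },
    }
    seal = hmac.new(key, _canonical_bytes(package), hashlib.sha256).hexdigest()
    return {"package": package, "seal": seal}


def verify_package(sealed, representations, key):
    """Return True only if the seal and every rendering digest still match."""
    package = sealed["package"]
    expected = hmac.new(key, _canonical_bytes(package), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sealed["seal"]):
        return False
    for name, digest in package["representations"].items():
        data = representations.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != digest:
            return False
    return True
```

Because the seal covers the provenance and the digest of every rendering together, changing any one of them after sealing causes verification to fail.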

The book describes how to realize these features in the structure of a Trustworthy Digital Object (TDO) (see the figure below) and in procedures for generating and inspecting TDOs.  It justifies the claim that the solution is complete and economical in precise language based on early 20th-century epistemology.[3]
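
One inspection step can be sketched under stated assumptions: checking that copies held at independent sites still match a digest recorded when the object was sealed.  The function name intact_replicas and the site labels are hypothetical, and a SHA-256 digest is assumed here in place of whatever integrity evidence a real TDO carries.

```python
import hashlib


def intact_replicas(replicas, expected_sha256):
    """Return the sorted names of sites whose copy matches the recorded digest.

    `replicas` maps a site name to the bytes retrieved from that site.
    """
    return sorted(
        site
        for site, data in replicas.items()
        if hashlib.sha256(data).hexdigest() == expected_sha256
    )
```

As long as one site is returned, an intact copy survives and can be used to repair the others.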

The answer to “How can we preserve digital information?” is, of course, different from the answer to “How should we manage digital archives?”  The archival services needed are available in many current digital library offerings.[4]

Fig. 32 from Preserving Digital Information:
cryptographically sealed TDO with reliable references to essential context.

 



[1]       A more complete synopsis and table of contents is accessible at http://home.pacbell.net/hgladney/PDIf.pdf.  A slide show summary is available at http://home.pacbell.net/hgladney/PDIslides.pdf. 

[2]       Using a simple Universal Virtual Computer based on the Church-Turing thesis.

[3]       For instance, we must avoid confusions about “dynamic objects” and about “the original”, and need objectively testable definitions for “authentic” and for relationships among versions of a work.

[4]       Modest and obvious software extensions might be needed for some content management packages.