Digital Document Quarterly

Perspectives on Trustworthy Information

Volume 5, Number 4, 4Q2006

 

 

 

HMG Consulting

Saratoga, CA 95070

©  2006, H.M. Gladney

 

ISSN: 1547-8610

Editorial Comment

DDQ 5(4) offers criticisms of apparently common scholarly practice—criticisms broader and more tenuous than have appeared in prior DDQ numbers.  These views suggest aspects that readers might reflect on.  To the extent that they seem inappropriate, I would be most interested in refutations, either as private correspondence or in the form of short arguments that DDQ could publish in later numbers.

Some of the content, particularly that about inattention across professional boundaries, is stimulated by missing evidence that I would expect to be present, as in the case of the dog’s barking in Arthur Conan Doyle’s The Adventure of Silver Blaze.  (What was significant to Sherlock Holmes was that the dog had not barked.) 

It seems to me that C.P. Snow’s Two Cultures difficulties are still with us,[1] and have impeded digital preservation progress.  From one side of the divide, I offer a perspective of differences of approach that have hampered productive collaboration.  My participation in a 1996 panel discussion stimulated this line of thinking.  Readers will see the adverse influence of The Two Cultures gap throughout the current DDQ number.

Digital Preservation

On Designated Communities

In December, TechWorld reported that the European Union has funded a multi-country digital preservation project called PLANETS (Preservation and Long-term Access through NETworked Services), and that a participants’ team has assembled itself.[2]  The DPE (Digital Preservation Europe) website reports a November PLANETS partners’ meeting.

Among the topics discussed in this meeting, the importance and role of a collection’s “Designated Community” received attention that puzzles me.  This is not because anything said is surprising or controversial, but rather because it has long been normal practice for each library to identify such a community as part of its mission statement.  I would have been interested in explicit distinctions between traditional library practices and aspects that are new and challenging for digital collections, but none were emphasized in the meeting report.

NDIIPP at Mid-Point

“One of the chief tasks of NDIIPP is to identify and provide for all the barriers to progress in digital preservation. The most salient are those caused by the rapid changes in technology.  Frustrations are shared by industry and collecting institutions alike over the multiplicity of formats, rapid technological changes, and hardware and software obsolescence that plague the new information technologies.”[3]

Recent reports remind us that the [U.S.] National Digital Information Infrastructure and Preservation Program (NDIIPP)[4] has reached its midpoint, suggesting that critical evaluations are appropriate, with a view to mid-point course changes.[5]  The next D-Lib Magazine number will contain my critique of technical aspects of NDIIPP reports.[6]  The abstract of Digital Preservation in a National Context: Questions and Views of an Outsider reads:

“This article draws attention to technical opportunities which, if pursued, would significantly accelerate National Digital Information Infrastructure Preservation Program (NDIIPP) progress towards objectives called for by the U.S. Congress.  It also identifies concerns about apparent content scope limitations of the NDIIPP plan.

“A solution is known in principle for every difficult technical problem of digital preservation, including all those identified in NDIIPP publications.  They and other works correctly assert that non-technical preservation challenges are greater than technical ones, but do not discuss using technology to reduce non-technical obstacles.  Available technical choices show that some apparent preservation challenges are not obstacles after all.

“If document representations and network protocols are standardized, then each archive can autonomously adapt itself to its own institutional environment.  Thinking about what end users will want led my colleagues and me to approach the challenge differently than most other authors.   We focus on information contributors and readers instead of on the work of repository employees.  We design document representations instead of new repository methodology.  We treat each repository as a “black box” whose internals can be adapted to local needs instead of discussing sharable repository.”

Compared to the pace of R&D progress expected in the private sector, at least in the Silicon Valley environment that I am familiar with, progress in the technical components of the NDIIPP project is disappointingly slow.[7]

Three management shortfalls seem to contribute to what disappoints me.  (1) NDIIPP has not effectively exploited the skills of software engineers.  (2) NDIIPP has not established productive collaboration with IT enterprises.  There is almost no private sector work on digital preservation.  (3) The NDIIPP digital content scope has not included documents of practical interest such as public infrastructure engineering records, health care records, legal records, and many other record classes that citizens value for their daily lives and futures.

East of England Digital Preservation Project

A recent report describes a pilot project investigating the issues and costs of potential regional digital repositories.  Taking as its starting point the anticipated needs of local authorities, the report looks in detail at the processes and costs involved in preserving and managing digital records of the types routinely dealt with by local authority records managers and archivists, including privately deposited material.

Many of the challenges have to do with ingestion of proffered record collections whose preservation has not been anticipated, whose current formats are problematical, and whose metadata are seriously incomplete.  Communication between archive personnel and collection owners unfamiliar with the technology and jargon of digital collections is another difficulty needing attention.

Not mentioned, but worth thinking about, is the extent to which these challenges are transitory effects of the novelty of digital records—effects that will vanish when our children take over in 20 years or so.[8]

A Misleading Analogy: Paper and Digital Preservation

“[A]s we approach the end of the twentieth century, we find ourselves confronting … a vast void of knowledge filled by myth and speculation.  Information in digital form—the evidence of the world we live in—is more fragile than the fragments of papyrus found buried with the Pharaohs.    [T]o achieve the kind of information density that is common today, we must depend on machines that rapidly reach obsolescence to create information and then make it readable and intelligible.” [9]

“[D]igital objects such as electronic journals are not only mutable but can also be modified or transformed without generating any evidence of change.  It is the mutable nature of digital information objects that represents one of the principal obstacles to the creation of archives for their long-term storage and preservation.” [10]

Pessimism about digital preservation is sometimes accompanied by comparison of the durability of paper to that of digital information.  That printed works are inherently immutable is a professional myth.

The myth is repeated by James Billington, the Librarian of Congress, in a September 2006 Atlantic Monthly article.[11]  This is surprising, since earlier in the year Deanna Marcum, an Associate Librarian of Congress, emphasized that “Only a fraction of what the ancient world committed to papyrus has come down to us.”[12]   Even though nobody seriously proposes saving heritage materials forever on today’s digital media, the comparison has been made often enough to warrant cautioning readers that the analogy is misleading.

Paper is mutable: easily burned, easily torn, easily cut, and easily overwritten.  However, four facts about information on paper are reliable guides to a digital preservation solution.  (1) We are usually more interested in inscribed content patterns than in paper artifacts for themselves.  (2) We protect printed information with immense infrastructure that includes widely dispersed libraries with redundant holdings.  (3) It took us many years to learn how to preserve reliably on paper.  (4) Changes to information on paper can be detected, often easily.

Digital data has an advantage over most other artifacts: bit-string patterns do not decay.  We know how to make any bit-string as useful perpetually as it is today.[13]  Even if better methods were to be invented, if we save original bit-strings together with convenient transformed versions, we could create replacement versions of today’s OAIS AIPs (Archival Information Packages).
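As an illustration only, the idea of saving an original bit-string together with a transformed version, and of making later changes detectable, can be sketched in a few lines of Python.  The package layout and the function names below are hypothetical, and the use of a SHA-256 digest is an assumption introduced for this example; it is not a method described in this article or in the OAIS model.

```python
import hashlib


def digest(bits: bytes) -> str:
    """Return the SHA-256 digest of a bit-string as hexadecimal text."""
    return hashlib.sha256(bits).hexdigest()


def make_package(original: bytes, transformed: bytes) -> dict:
    """Bundle an original bit-string with a convenient transformed version.

    The dictionary layout is hypothetical, chosen only for this sketch.
    The digest of the original is recorded so that later changes to the
    original can be detected.
    """
    return {
        "original": original,
        "transformed": transformed,
        "original_sha256": digest(original),
    }


def is_unchanged(package: dict) -> bool:
    """Report whether the stored original still matches its recorded digest."""
    return digest(package["original"]) == package["original_sha256"]


# Build a package, check it, then simulate an undocumented modification.
package = make_package(b"text as first deposited", b"TEXT AS FIRST DEPOSITED")
intact = is_unchanged(package)

package["original"] = package["original"] + b" (altered)"
change_detected = not is_unchanged(package)
```

A digest kept beside the data can itself be altered, so a working archive would need to protect the recorded value by some further means.  The sketch shows only that a change to a bit-string need not pass without evidence.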

What we expect today for saved information is much more demanding than ever before, including at least ease of reading, ease of finding and very rapid access to portions of a vast information corpus, extremely high quality and fidelity that sometimes should include evidence of authenticity, and quality of references/linking.  Why these and other factors make the papyrus-to-digital comparison misleading is analyzed in a forthcoming publication.[14]

Digital Preservation of a Different Sort

Ray Kurzweil, author of The Singularity is Near, suggests that computers will enable people to live forever.  He predicts that non-biological intelligence will allow humans to overcome illness and aging in just 25 years, and that scientists will develop machines surpassing human intelligence.  He says, "We won't experience 100 years of technological advance in the 21st century; we will witness … about 1,000 times greater than what was accomplished in the 20th century."

Inattention across the Boundaries of Professional Disciplines

“The bed-rock of research in this area is to understand in more detail the sociology of preserving and sharing information.  This will include understanding better disciplinary differences, and in particular those requirements that are fundamental versus those that are primarily historical.  For a cultural change to take place, it is important to involve key stakeholders and resource providers and for them to drive this process.”[15]

The digital preservation literature contains repeated calls for cross-disciplinary cooperation.  However, inattention across professional boundaries is a sad tradition, evident once again.  Each academic community behaves as if what is not represented in its own literature does not exist.

The most amazing twentieth-century development is the unprecedented success of science and technology.  This has been fostered by scientific methodology that includes lively constructive criticism and problem partitioning, with each contributor being confident that aspects he does not address will be handled by others.  Such partitioning is rooted in philosophical analysis starting with Leibniz and Descartes and represented by today’s analytical philosophy.  Getting partitioning right is not easy; false starts are resolved by the self-evident utility of successful partitioning.  A great merit is that work, once done, need not be repeated (except sometimes to validate experimental results and applied logic).  I believe that this scientific methodology should be used more extensively by information scientists than seems to be the case.

For my writings on digital libraries and digital preservation, I have inspected over 600 articles written by librarians, archivists, and university information scientists[16]—an informal group sometimes called the cultural heritage community.[17]  This literature has surprisingly few citations to ACM and IEEE articles by software engineers.  This is unfortunate, because the ignored literature contains solutions to technical issues grappled with in the articles alluded to.  Such inattention has permitted, and continues to permit, wastage of public funds.[18]

The literature from the digital heritage community rarely considers the business climate that influences the tools available to it.[19]  The technology that creates today’s excellent access to information for more people than ever before is mostly created by private enterprise, whose rules of engagement emphasize responsiveness to markets.[20]  Unfortunately, industry is unlikely to see cultural heritage repositories as promising customers.  They are simply too few and small, with digital collections smaller than business collections for the foreseeable future.

There is a mismatch—a semantic dissonance—between the language and expectations of cultural heritage community spokespersons and technology vendors.   The current emphasis for technology products seems to be on system components, whereas cultural repositories want customizable “solutions”.

Technology vendors’ work on “solutions” is mostly in the custom contract business, which they call “services” and which is an immense business sector.  Insights and design successes in this area are not published, but rather treated as marketplace advantages that companies nurture, hone, and propagate internally.  This phenomenon contributes to another cultural mismatch: academic librarians seem emotionally and practically unprepared to use outside services.  Their institutions are not financially prepared for outsourcing work, even though they do not seem to have sufficient internal skills to build the middleware components of repository services.[21]

A Two Cultures Model: Stylistic Differences in R&D

My analysis of NDIIPP technical progress brought the Two Cultures rift to my attention more strongly than ever before.  The misunderstandings and intolerance which C.P. Snow described continue to be widespread, and to hamper progress for efficient and effective digital preservation.

The tension is evidenced by differences in writing style between what I have read addressing digital repositories, most of which comes from authors with liberal arts backgrounds, and the physical science and engineering literature with which I have worked from my undergraduate days.  I find the information science literature difficult in that its articles rarely differentiate their novel elements from ideas already published.  In the most influential technical periodicals this difficulty is precluded by expert referees’ demands for clear identification of what is new and for thorough citation of prior literature.

A likely contributing factor is the last 50 years’ increase in the number of university faculty members and “publish or perish” expectations.  The number of new ideas does not seem to have matched the rush of publications.  Critics of scientific literature have pointed to “slice and dice” behavior in which each piece of research is parceled into as many small articles as possible.[22]  I have the impression that information scientists meet the economic imperative by inattention to prior work, repeating what can be found elsewhere.  A consequence has been a large increase in the number of periodicals (and financial pressure on academic libraries).  At least in the sciences and engineering, I believe that most of the new periodicals can be ignored with little risk.[23]

In Wm. Lefurgy’s 2005 NDIIPP presentation, he reminded an ARL audience that there was “still no ‘silver bullet’ solution to digital preservation.”  This repeats earlier authors’ assertions that there would be no single digital preservation solution—a “straw man” assertion.[24]  No good engineer would ever talk about a potential “single solution,” because the phrase has no objective meaning.  (The distinction between simple and compound is entirely subjective, having to do with a speaker’s choice of the level of detail for discussion.)

These impressions from the literature and from personal interactions with members of the cultural heritage community are summarized in the following table of stylistic differences.

| Aspect | Cultural Heritage Community | Content Management: Scientific, Engineering, and Medical Communities |
|---|---|---|
| Collegial | Values consensus more highly than criticism and debate | Values criticism and debate as methodology for progress |
| Working relationships | Emphasizes collegial and institutional collaboration and synergy | Emphasizes independent thought and competition |
| Breadth and depth | Emphasizes global discussions of topic at hand | Emphasizes “in depth” investigation of key topical aspects |
| Didactic | Combines research reporting with advice for newcomers to the topic | Separates research articles from textbooks and teaching materials |
| Subjective / Objective divide | Happy to confront subjective matters of opinion squarely | Focuses on objective topics that can be empirically tested[25] |
| Philosophical basis | Continental philosophy | Analytical philosophy |
|  | Cassirer’s “expressive perception”[26] | Carnap’s “purely structural descriptions”[27] |
| Problem attack | Emphasizes relationships among distinct components | Emphasizes partitioning and approximation, with later corrections |
| Typical reaction to a practical challenge | Recommends organizational or personal behavior; often normative | Builds tools and makes them available for user criticism; iteratively refines these |
| Mathematical models | Rarely employs mathematics except for elementary statistics | Uses mathematical models to articulate physical laws and engineering designs |

Key standards and conventions