Digital Document Quarterly
Perspectives on Trustworthy Information
Volume 5, Number 4, 4Q2006
HMG Consulting
© 2006, H.M. Gladney
ISSN: 1547-8610
DDQ 5(4) offers criticisms of apparently
common scholarly practice—criticisms broader and more tenuous than have
appeared in prior DDQ numbers. These
views suggest aspects that readers might reflect on. To the extent that they seem inappropriate, I
would be most interested in refutations, either as private correspondence or in
the form of short arguments that DDQ could publish in later numbers.
Some of the content, particularly that about
inattention across professional boundaries, is prompted by evidence that I
would expect to be present but is missing, as with the dog in Arthur Conan
Doyle’s The Adventure of Silver Blaze. (What
was significant to Sherlock Holmes was that the dog had not barked.)
It seems to me that C.P. Snow’s Two
Cultures difficulties are still
with us,[1] and have impeded digital preservation
progress. From one side of the divide, I
offer a perspective on differences of approach that have hampered productive
collaboration. My participation in a 1996 panel discussion stimulated
this line of thinking. Readers will see
the adverse influence of the Two Cultures
gap throughout the current DDQ number.
In December, TechWorld reported that the European Union has funded a multi-country
digital preservation project called PLANETS (Preservation
and Long-term Access through NETworked Services), and that a
team of participants has been assembled.[2] The DPE (Digital Preservation Europe) website
reports a November PLANETS partners’ meeting.
Among the topics discussed in this meeting,
the importance and role of a collection’s “Designated Community” received
attention that puzzles me. This is not
because anything said is surprising or controversial, but rather because it has
long been normal practice for each library to identify such a community as part
of its mission statement. I would have
been interested in explicit distinctions between traditional library practices
and aspects that are new and challenging for digital collections, but the
meeting report emphasized none.
“One of the chief tasks of NDIIPP is to
identify and provide for all the barriers to progress in digital preservation.
The most salient are those caused by the rapid changes in technology. Frustrations are shared by industry and
collecting institutions alike over the multiplicity of formats, rapid
technological changes, and hardware and software obsolescence that plague the
new information technologies.” [3]
Recent reports
remind us that the [
“This article draws attention to technical opportunities which, if pursued, would significantly accelerate National Digital Information Infrastructure Preservation Program (NDIIPP) progress towards objectives called for by the U.S. Congress. It also identifies concerns about apparent content scope limitations of the NDIIPP plan.
“A solution is known in principle for every difficult technical problem of digital preservation, including all those identified in NDIIPP publications. They and other works correctly assert that non-technical preservation challenges are greater than technical ones, but do not discuss using technology to reduce non-technical obstacles. Available technical choices show that some apparent preservation challenges are not obstacles after all.
“If document representations and network protocols are standardized, then each archive can autonomously adapt itself to its own institutional environment. Thinking about what end users will want led my colleagues and me to approach the challenge differently than most other authors. We focus on information contributors and readers instead of on the work of repository employees. We design document representations instead of new repository methodology. We treat each repository as a “black box” whose internals can be adapted to local needs instead of discussing sharable repository.”
Compared to the pace of R&D progress
expected in the private sector, at least in the
Three management shortfalls seem to contribute
to what disappoints me. (1) NDIIPP has
not effectively exploited the skills of
software engineers. (2) NDIIPP has not
established productive collaboration with IT enterprises. There is almost no private sector work on
digital preservation. (3) The NDIIPP
digital content scope has not included documents of practical interest such as
public infrastructure engineering records, health care records, legal records,
and many other record classes that citizens value for their daily lives and futures.
East of
A recent report describes a pilot project investigating the issues and costs of potential regional digital repositories. Taking as its starting point the anticipated needs of local authorities, the report looks in detail at the processes and costs involved in preserving and managing digital records of the types routinely dealt with by local authority records managers and archivists, including privately deposited material.
Many of the challenges have to do with ingestion of proffered record collections whose preservation has not been anticipated, whose current formats are problematical, and whose metadata are seriously incomplete. Communication between archive personnel and collection owners unfamiliar with the technology and jargon of digital collections is another difficulty needing attention.
Not mentioned, but worth thinking about, is the extent to which these challenges are transitory effects of the novelty of digital records—effects that will vanish when our children take over in 20 years or so.[8]
A Misleading Analogy: Paper and Digital Preservation
“[A]s we approach the end of the twentieth
century, we find ourselves confronting … a vast void of knowledge filled by
myth and speculation. Information in
digital form—the evidence of the world we live in—is more fragile than the
fragments of papyrus found buried with the Pharaohs. … [T]o
achieve the kind of information density that is common today, we must depend on
machines that rapidly reach obsolescence to create information and then make it
readable and intelligible.” [9]
“[D]igital objects such as electronic
journals are not only mutable but can also be modified or transformed without
generating any evidence of change. It is
the mutable nature of digital information objects that represents one of the
principal obstacles to the creation of archives for their long-term storage and
preservation.” [10]
Pessimism about digital preservation is
sometimes accompanied by comparison of the durability of paper to that of digital
information. That printed works are
inherently immutable is a professional myth.
The myth is repeated by James Billington,
the Librarian of Congress, in a September 2006 Atlantic Monthly article.[11] This is surprising, since earlier in the year Deanna
Marcum, an Associate Librarian of Congress, emphasized that “Only a fraction of what the ancient world
committed to papyrus has come down to us.”[12] Even
though nobody seriously proposes saving heritage materials forever on today’s digital
media, the comparison has been made often enough to warrant cautioning readers
that the analogy is misleading.
Paper is mutable—easily burned, easily torn,
easily cut, and easily overwritten. However,
four facts about information on paper are reliable guides to a digital
preservation solution. (1) We are
usually more interested in inscribed content patterns than in paper artifacts
for themselves. (2) We protect printed
information with immense infrastructure that includes widely dispersed libraries
with redundant holdings. (3) It took us
many years to learn how to preserve reliably on paper. And, (4) changes to information on paper can
be detected, often easily.
Digital data has an advantage over most
other artifacts: bit-string patterns do not decay. We know how to make any bit-string as useful
perpetually as it is today.[13] Even if better methods were to be invented, if
we save original bit-strings together with convenient transformed versions, we
could create replacement versions of today’s OAIS AIPs (Archival
Information Packages).
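The two claims above, that changes to digital information can be detected and that original bit-strings can be kept alongside transformed versions, can be illustrated with a small sketch. It assumes SHA-256 digests as the change-detection technique; the essay names no specific method, and this is not an OAIS AIP implementation. The names `PreservationPackage`, `sha256_hex`, `add_derived`, and `verify` are hypothetical, chosen only for illustration. A digest recorded at ingest lets any later silent modification of the stored bits be detected, which answers the worry that digital objects change “without generating any evidence of change.”

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a bit-string as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class PreservationPackage:
    """Hypothetical, simplified stand-in for an archival package.

    Keeps the original bit-string plus any derived (transformed)
    versions, each paired with the digest recorded when it was added.
    """
    original: bytes
    original_digest: str = ""
    derived: dict = field(default_factory=dict)  # name -> (bytes, digest)

    def __post_init__(self) -> None:
        # Record the fixity value of the original at ingest time.
        self.original_digest = sha256_hex(self.original)

    def add_derived(self, name: str, data: bytes) -> None:
        # Derived versions are stored alongside the original, never in place of it.
        self.derived[name] = (data, sha256_hex(data))

    def verify(self) -> dict:
        """Recompute every digest and report which versions still match."""
        report = {"original": sha256_hex(self.original) == self.original_digest}
        for name, (data, digest) in self.derived.items():
            report[name] = sha256_hex(data) == digest
        return report


if __name__ == "__main__":
    pkg = PreservationPackage(b"Annual report, 2006")
    pkg.add_derived("utf16", "Annual report, 2006".encode("utf-16"))
    print(pkg.verify())  # {'original': True, 'utf16': True}

    # Simulate a silent modification of the stored original.
    pkg.original = b"Annual report, 2007"
    print(pkg.verify())  # {'original': False, 'utf16': True}
```

Keeping the untouched original next to each transformed version means that, if better transformation methods are invented later, new versions can be regenerated from the original rather than from an earlier derivative.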
What we expect today for saved information
is much more demanding than ever before, including at least ease of reading, ease
of finding and very rapid access to portions of a vast information corpus,
extremely high quality and fidelity that sometimes should include evidence of
authenticity, and quality of references/linking. Why these and other factors make the
papyrus-to-digital analogy misleading is analyzed in a forthcoming
publication.[14]
Digital Preservation of a Different Sort
Ray Kurzweil,
author of The Singularity is Near, suggests that computers will enable people to live forever. He predicts that non-biological
intelligence will allow humans to overcome illness and aging in just 25 years,
and that scientists will develop machines surpassing human intelligence. He says, "We won't experience 100 years
of technological advance in the 21st century; we will witness … about 1,000
times greater than what was accomplished in the 20th century."
“The bed-rock of research in this area is to understand in more detail the sociology of preserving and sharing information. This will include understanding better disciplinary differences, and in particular those requirements that are fundamental versus those that are primarily historical. For a cultural change to take place, it is important to involve key stakeholders and resource providers and for them to drive this process.”[15]
The digital preservation
literature contains repeated calls for cross-disciplinary cooperation. However, inattention across professional boundaries
is a sad tradition, evident once again.
Each academic community behaves as if what is not represented in its own
literature does not exist.
The most amazing twentieth-century
development is the unprecedented success of science and technology. This has been fostered by scientific methodology
that includes lively constructive criticism and problem partitioning, with each
contributor being confident that aspects he does not address will be handled by
others. Such partitioning is rooted in
philosophical analysis starting with Leibniz and Descartes and represented by
today’s analytical philosophy. Getting
partitioning right is not easy; false starts are resolved by the self-evident
utility of successful partitioning. A
great merit is that work, once done, need not be repeated (except sometimes to
validate experimental results and applied logic). I believe that this scientific methodology
should be used more extensively by information scientists than seems to be the
case.
For my writings on
digital libraries and digital preservation, I have inspected over 600 articles written by librarians, archivists, and
university information scientists[16]—an informal group sometimes called the
cultural heritage community.[17] This
literature has surprisingly few citations to ACM and IEEE articles by software
engineers. This is unfortunate, because
the neglected literature contains solutions to technical issues that those
articles grapple with. Such inattention
has permitted, and continues to permit, wastage of public funds.[18]
The literature from
the digital heritage community rarely considers the business climate that
influences the tools available to it.[19] The
technology that creates today’s excellent access to information for more people
than ever before is mostly created by private enterprise, whose rules of
engagement emphasize responsiveness to markets.[20] Unfortunately,
industry is unlikely to see cultural heritage repositories as promising customers. They are simply too few and too small, and their digital
collections will remain smaller than business collections for the foreseeable future.
There is a
mismatch—a semantic dissonance—between the language and expectations of
cultural heritage community spokespersons and technology vendors. The current
emphasis for technology products seems to be on system components, whereas
cultural repositories want customizable “solutions”.
Technology vendors’ work on “solutions” is mostly in the custom contract
business, which they call “services” and which is an immense business
sector. Insights and design successes in
this area are not published, but rather treated as marketplace advantages that
companies nurture, hone, and propagate internally. This phenomenon contributes to another
cultural mismatch: academic librarians seem emotionally and practically unprepared
to use outside services. Their
institutions are not financially prepared for outsourcing work, even though
they do not seem to have sufficient internal skills to build the middleware
components of repository services.[21]
My analysis of NDIIPP technical progress brought
the Two Cultures rift to my attention
more strongly than ever before. The
misunderstandings and intolerance which C.P. Snow described continue to be
widespread, and to hamper progress toward efficient and effective digital
preservation.
The tension is evidenced by differences in
writing style between what I have read addressing digital repositories, most of
which comes from authors with liberal arts backgrounds, and the physical
science and engineering literature with which I have worked from my
undergraduate days. I find the information
science literature difficult in that its articles rarely differentiate their novel
elements from ideas already published. In
the most influential technical periodicals this difficulty is precluded by
expert referees’ demands for clear identification of what is new and for
thorough citation of prior literature.
A likely contributing factor is the last 50
years’ increase in the number of university faculty members and “publish or
perish” expectations. The number of new
ideas does not seem to have matched the rush of publications. Critics of scientific literature have pointed
to “slice and dice” behavior in which each piece of research is parceled into
as many small articles as possible.[22] I have the impression that information
scientists meet the economic imperative by inattention to prior work, repeating
what can be found elsewhere. A
consequence has been a large increase in the number of periodicals (and financial
pressure on academic libraries). At
least in the sciences and engineering, I believe that most of the new
periodicals can be ignored with little risk.[23]
In his 2005 NDIIPP presentation, Wm. Lefurgy reminded an ARL audience that there was
“still no ‘silver bullet’ solution to digital preservation.” This repeats earlier authors’ assertions that
there would be no single digital preservation solution—a “straw man” assertion.[24] No good engineer would ever talk about a
potential “single solution,” because the phrase has no objective meaning. (The distinction between simple and compound
is entirely subjective, having to do with a speaker’s choice of the level of detail
for discussion.)
These impressions from the literature and
from personal interactions with members of the cultural heritage community are
summarized in the following table of stylistic differences.
| Aspect | Cultural Heritage Community | Content Management: Scientific, Engineering, and Medical Communities |
|---|---|---|
| Collegial | Values consensus more highly than criticism and debate | Values criticism and debate as methodology for progress |
| Working relationships | Emphasizes collegial and institutional collaboration and synergy | Emphasizes independent thought and competition |
| Breadth and depth | Emphasizes global discussions of the topic at hand | Emphasizes “in depth” investigation of key topical aspects |
| Didactic | Combines research reporting with advice for newcomers to the topic | Separates research articles from textbooks and teaching materials |
| Subjective / Objective divide | Happy to confront subjective matters of opinion squarely | Focuses on objective topics that can be empirically tested[25] |
| Philosophical basis | Continental philosophy | Analytical philosophy |
| | Cassirer’s “expressive perception”[26] | Carnap’s “purely structural descriptions”[27] |
| Problem attack | Emphasizes relationships among distinct components | Emphasizes partitioning and approximation, with later corrections |
| Typical reaction to a practical challenge | Recommends organizational or personal behavior; often normative | Builds tools and makes them available for user criticism; iteratively refines these |
| Mathematical models | Rarely employs mathematics except for elementary statistics | Uses mathematical models to articulate physical laws and engineering designs |
Key standards and conventions |