Digital Document Quarterly
Perspectives on Trustworthy Information
Volume 3, Number 4, 4Q2004
HMG Consulting
© 2004, H.M. Gladney ISSN: 1547-8610
In most DDQ numbers, I have emphasized objective over subjective comments, identified the latter as such, and suggested objective reasons for their occurrence. In the current number, I have not always followed this style, so as not to obscure some opinions with justifications that can be lengthy.
Readers’ opinions would be welcome, particularly contrary opinions accompanied by objective hints of their merits.
I have seldom seen ‘creeping democracy’ used to describe what is happening in the availability, usage, and generation of information. Yet it strikes me as an apt descriptor of transforming opportunities for anyone who chooses to grasp them.
The number of people who read, execute, create, and update information is large and growing. It is larger than ever before both absolutely and as a population fraction. This is only partly because of the amazing decrease in information costs.[1] It is also because citizens are better educated and have more time for discretionary activities than ever before.
Resources for whose effective use many people have needed the help of specialists are increasingly accessible to almost anyone.[2] Technology and economics are changing the roles and methods of most knowledge workers and of all enterprises. End users’ dependencies on professional mediation will continue to decrease.
Even personal digital libraries will become practical within a decade.[3]
Consider a different digital repository structure[4]—different than the one receiving most published attention—enabled by Internet pervasiveness, by massive affordable storage, and by software for file sharing without central servers or databases, software for a small group’s digital library that is more attuned to its needs than a university library,[5] and software for automatic bit-string replication.[6] A group library package that combined these technologies with rule-guided management routines could be deployed over thousands of small computers to replace the repository services of archival institutions.[7] Each group would specify the access and replication rules for the digital objects that it provides. Little further human administration would be needed.[8]
Progress towards a digital commons seems economically inevitable, because of the growth of the pool of skilled and affluent volunteers,[9] and because of the nature of technology development—particularly of software development. What we mean by ‘the nature of software development’ can be understood with the assistance of a technology layering illustration. At any moment, deployed digital technologies consist of layers from the most basic and most general services to an application-specific layer. What software engineers do in support of knowledge workers can be characterized as: (1) identifying certain human actions as “merely clerical”; (2) choosing a frequently used subset for automation; (3) generalizing this to be broadly useful; and (4) implementing the generalization as a new software layer.
Although the broad areas of progress seem obvious, the specifics are difficult to predict. From a distant perspective the change process appears chaotic. Many attempts are unsuccessful, not so much because of technical flaws as because the implemented offerings do not appeal to large numbers of users. This is partly because better offerings appear at roughly the same time. The whole process is subject to a sort of Darwinian selection.[10] Continuing rapid progress is likely.
Progress towards a digital commons has been accelerating. Perhaps this is because key economic thresholds have been attained—an immense Internet and millions of WWW users, hard disk drives so inexpensive that it no longer much matters how much storage space is required, a good nucleus of publicly available information digitally represented, and so on. 2004 developments that probably introduce massive imminent changes include:
(1) Emerging competition in inexpensive search services, which are extending to desktop search and services tailored for particular communities (such as Google for scholars). Probably only a fraction of known search techniques have been exploited (the ACM Special Interest Group on Information Retrieval has been active for a quarter century, and its technical literature is huge).
(2) Massive collections of content not encumbered by intellectual property constraints, particularly the Internet Archive collection and Google’s recently announced book digitization project.
(3) Open Courseware offered by The Massachusetts Institute of Technology.
(4) Advocacy group activity; see the Center for the Digital Future and the Creative Commons.
(5) Inexpensive streaming media services for news and for music.[11]
(6) Beginnings of “grass roots” news services; see WikiNews. Dan Gillmor, technology writer for the San Jose Mercury News, recently published on the theme.[12]
(7) Increasing efforts to make running servers “as easy as turning on a faucet”.
Internet activities are not decreasing people’s enthusiasm for traditional libraries. Our town library is probably typical; serving about 30,000 people, it seems to be occupied by 50-100 patrons at any time, and twice that number after school hours. The SJ Mercury News published the following 2002-3 statistics for the
· 5.4 million patron visits—more than the combined attendance at San Francisco Giants’ and Oakland As’ baseball games.
· 13.5 million loans, nearly triple the number of 1994-95.
· 2.1 million holdings purchased since 1994-95.
· 400 computers with Internet access available to the public.
Free availability of digital content continues to be resisted by the music and film industries. Little of what they are fighting to protect seems attractive to me. I would prefer a different topical area to try such serious issues.[13]
The growing digital document flood exacerbates a reader’s challenge: separating wheat from chaff. We have not seen, but would welcome, tools that automatically create a quality measure for each tested document. Such tools would have to work with whatever document forms impinge on prospective readers and be responsive to each reader’s personal rules defining quality estimates. Although semantic judgement is mostly beyond what automatic tools can accomplish, much is possible by analysis of the content and accompanying metadata.[14]
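As a minimal sketch of what "each reader's personal rules" might look like, the following Python fragment combines a few weighted rules into one quality estimate. Everything in it is an assumption made for illustration: the `Document` class, the three sample rules, and the weighted-average formula are hypothetical and do not describe any existing tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical container for a document's text and its metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A rule maps a document to a score between 0.0 and 1.0.
Rule = Callable[[Document], float]


def has_named_author(doc: Document) -> float:
    # Metadata-based rule: is an author recorded at all?
    return 1.0 if doc.metadata.get("author") else 0.0


def has_citation_markers(doc: Document) -> float:
    # Content-based rule: crude check for bracketed citation markers.
    return 1.0 if "[" in doc.text and "]" in doc.text else 0.0


def reasonable_length(doc: Document) -> float:
    # Content-based rule: very short texts score proportionally lower.
    words = len(doc.text.split())
    return 1.0 if words >= 50 else words / 50


def quality_estimate(doc: Document, weighted_rules: List[Tuple[Rule, float]]) -> float:
    """Weighted average of rule scores; the weights are one reader's priorities."""
    total_weight = sum(weight for _, weight in weighted_rules)
    if total_weight == 0:
        return 0.0
    return sum(rule(doc) * weight for rule, weight in weighted_rules) / total_weight
```

A reader who cares most about attribution might pass `[(has_named_author, 2.0), (has_citation_markers, 1.0), (reasonable_length, 1.0)]`; another reader would supply different rules and weights without changing the scoring function.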
We need not speculate further. We believe imminent changes inevitable. Archival institutions would serve their interests best by confronting the possibilities squarely and participating in molding the future.
Much effort has been expended on metadata schema for describing library and archives records.[15] This effort is called into question by Dick Bulterman’s Is It Time for a Moratorium on Metadata? (IEEE Multimedia 5(12), Dec. 2004).
Bulterman’s point is that the effort is not matched by use of the schemes defined. Since I have been chastised by ACM referees for inadequately citing descriptive metadata literature, this article was particularly amusing. I believe that even non-specialists will find it both instructive and entertaining.
I hasten to remind readers that metadata might be needed for purposes beyond creating search indices, including information required for digital preservation.
Much digital content that claims copyright protection is also a candidate for long-term preservation. The content for copyright protection has been discussed by Nimmer[16]—if a work has been represented in tangible form, copyright protects the abstract pattern represented.
This much is enough in principle, but not in practice if the content or ownership of a copyright is contested in litigation. The content issue is the distinction between the pattern and accidental information that is part of the published instance; we are working towards an article analyzing this issue to suggest copyright owners’ measures.[17] The ownership issue is evidence—an audit trail that has been reliably protected against misrepresentation. This can be handled by metadata exploiting the following notions.[18]
(1) A digital representation models something other than itself. It models a pattern.
(2) Any model has both intentional features—the pattern—and accidental features. The copyright can be asserted to include the accidental features. The distinction between intentional and accidental is a matter of author's intent, which is undiscoverable by others except to the extent that the author articulates it, i.e., creates objective facts corresponding to what was subjective.[19]
(3) The pattern is eligible for copyright protection. It is protected in instances other than that in which it was fixed to establish the copyright claim. Copyright registration is not required.
(4) Schema are also models, and may themselves require further models that explain by reminding readers how the words are used.[20] In modern jargon, the explications of schema are called "reference models" or "ontologies".
(5) Syntactic intentions can be conveyed with XML. Roughly one hundred XML schema definitions have been agreed on, e.g., MathML for mathematics and XBRL for business reporting. More are being considered for standardization.
(6) Semantic intentions can be conveyed by a knowledge management language. A prominent contender is RDF (Resource Description Framework). Linear RDF syntax looks like XML—you must look closely to distinguish one from the other.
(7) RDF segments can be embedded in XML documents, as can any bitstream.
(8) XML is today's wrapper of choice for creating "complete" bundles.
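Items (7) and (8) can be illustrated with a minimal sketch that wraps an RDF segment and an arbitrary bitstream in one XML document. The `bundle`, `metadata`, and `payload` element names are hypothetical, chosen only for this example, and the single Dublin Core `creator` statement stands in for whatever semantic metadata an author would really supply. The bitstream is base64-encoded so that arbitrary bytes survive inside XML text.

```python
import base64
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_bundle(payload: bytes, creator: str, about: str) -> str:
    """Return an XML string holding an RDF description and an encoded payload."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)

    bundle = ET.Element("bundle")  # hypothetical wrapper element

    # Semantic metadata: one RDF description embedded in the XML document.
    metadata = ET.SubElement(bundle, "metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    ET.SubElement(description, f"{{{DC_NS}}}creator").text = creator

    # Arbitrary bitstream, base64-encoded so it is legal XML character data.
    payload_element = ET.SubElement(bundle, "payload", {"encoding": "base64"})
    payload_element.text = base64.b64encode(payload).decode("ascii")

    return ET.tostring(bundle, encoding="unicode")
```

A reader of the bundle would parse the XML, read the RDF description for the author's stated intentions, and decode the `payload` text to recover the original bytes.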
Presentation slides are available from an October 2004 workshop intended “to move forward archival and records management theory and find innovative ways to further develop fundamental principles of both disciplines.” Ken Thibodeau provided a current view of NARA digital archiving activities and thinking, and Seamus Ross provided the European cultural counterpart. Slides on similar topics are available from a U.K. forum in the same month.
Information about the DSpace digital repository can be had from the MIT Libraries. It seems to be geared for large educational institutions. In contrast, the reader interested in digital libraries for small groups might find Greenstone from New Zealand’s University of Waikato interesting.
The reader might find it instructive to compare high-level structural depictions of digital library software—a picture originating in our 1993 IBM Digital Library design[21] extended to accommodate LOCKSS-style replication, a recent DSpace picture, and other similar pictures—asking what their similarities and differences teach.
The [
An accompanying Framework document relates a range of standards and best practice guidelines in all aspects of record keeping. TNA intends to implement both specifications in 2005, and solicits comments. The documents are available via The National Archives website.
… putting effective large-scale systems to actually carry out digital preservation activities. This means attention to social, economic, legal, and organizational as well as technical aspects of the digital preservation problem. Over the years, there has been a lot of focus on a magic bullet technology for digital preservation. Personally, I don’t believe one exists. We’ve seen various proposals on magic bullets, e.g., inscribing information on nickel-based storage that can be read in 10,000 years. It doesn’t get us very far when talking in terms of complex interactions and enormous databases. Emulation isn’t a magic bullet either—though I think it’s a useful tool in the toolbox of digital preservation techniques and technologies. [Lynch, RLG DigiNews 8(4), Aug. 2004]
The passage seems to imply that, because no single technique has been offered to provide demonstrably sound and complete digital preservation methodology, no comprehensive solution exists. The latter is simply incorrect.
As suggested by “toolbox of digital … technologies”, normal technical methodology begins by partitioning a challenge into pieces whose solutions can easily be combined. Hopefully, a few pieces will suffice. Apart from work alluded to in DDQ, no complete toolbox has been proposed.
The digital preservation literature is mostly from authors associated with research libraries and archives. As prior numbers of DDQ have discussed, this literature seems to assume that a collection of appropriately-managed institutional repositories will solve the problem. Several years of discussion have not produced a viable suggestion for how that can be accomplished.
This institutional theme reappears in the October 2004 DPC Forum speakers’ comments. Anybody who believes the preservation solution is to be found in a method of managing digital repositories might profitably consider the following propositions and, for each that (s)he judges true, what it implies:
(1) Digital repository service is a different challenge than digital preservation.[22]
(2) Few computer scientists have been persuaded to think through digital preservation.
(3) The cultural heritage community has failed to work across disciplinary boundaries.[23] The "not invented here" syndrome is rife.
(4) Research libraries and archival institutions will not be leaders in determining their own digital futures unless they achieve significant prior changes in their internal attitudes, skills, and methods.
(5) The effective cost of deployed digital technology will continue to decrease exponentially, e.g., ~28% annually for persistent storage space.[24] Personnel costs will continue to increase.
(6) Implementing a digital preservation solution (such as TDO methodology) within existing information infrastructure can make non-technical problems (social and organizational) vanish and be seen not to have been problems at all![25] This is feasible without disrupting existing digital repositories. Much effort and money can be saved by eliminating certain current activities.
(7) LOCKSS (from Stanford) and Silverback/Tapestry
(from
(8) The boundary of a collection is a subjective choice. Any information collection will contain references to information that is not part of the collection.
(9) Correct rendering (for human consumption) of a collection member is likely to be dependent upon the correctness of other information objects, some of which might not be in the collection. Even if an object is protected so that its bitstream source is known to be authentic, changes in the objects on which its rendering depends might mislead its human user. For sensitive objects, this poses a security risk.
In early 2004, the NDIIPP managers requested comments on the Version 0.2 updates to the NDIIPP Technical Architecture (NTA hereafter—the Preliminary Architectural Proposal that was Appendix 9 of the 2003 NDIIPP Plan document).[26] Since I thought they would prefer a private discussion over a public one, on 19th May I wrote to Martha Anderson at the Library of Congress, with a copy to Laura Campbell. More than a dozen attempts to talk to one or the other have been ignored—not behavior appropriate for public officials spending your, and my, tax dollars.
Since I believe the criticisms in this letter should be acted on, but are being ignored, the next paragraphs reproduce the core of the May letter.[27]
The NTA documents are without basic formal qualities that software engineers expect. They are over a decade out of date. They attempt design. What little they specify has been incorporated in commercial software offerings since 1993, and Open Source software offerings since about 2000. Such offerings provide everything NTA calls for, and much more essential to repository institutions. Furthermore, the v.0.2 document is a step backwards; instead of pushing towards standards and conventions needed for inter-institutional collaboration, it retracts parts of what [the] 2003 Appendix 9 called for. Specifically:
(1) An NDIIPP objective is to help “the various stakeholders to be able to collaborate on long-term digital preservation.” This can be achieved only through digital objects sharing form for interchange between institutions and between each institution and individual clients (both content submitters and library readers). However the NTA documents give information interchange scant attention, focusing instead on repository structure that is important to each individual institution, but that institutions do not need to share.
They need to address format and protocol conventions that allow sharing without hampering each institution’s autonomy unduly. How to do this is known; specific details need to be worked out, but doing so is a routine software engineering exercise that includes negotiation of protocol and document representation standards.
(2) The NTA documents contain next to nothing addressing the needs of individual clients—the intended beneficiaries. They are almost silent about authors’ and readers’ needs.
(3) NTA is written as if no digital library technology existed. It ignores the extensive literature on digital preservation. It ignores progress on information interchange in the commercial world—progress that includes much that the cultural collection community will surely use to accomplish NDIIPP objectives. It even ignores standards development to which the Library is a major contributor (e.g., METS and MODS).
It is ironic that, representing the thinking of a professional community that works to preserve reference material for scholarship, NDIIPP publications seemingly make use of no prior work, as if no worthwhile prior work existed.
(4) NTA fails to distinguish between digital repository and digital preservation. The former topic is well developed, with software offerings that have been refined for about a decade. NTA should presume such offerings adequate for NDIIPP except for shortfalls that the plan specifically identifies. However, the NTA authors seem not to have looked at existing repository software.
(5) Avoiding the consequences of technological obsolescence and imperfect human (community) memory—digital preservation—can and should be treated as a focal topic. However, NTA is almost silent about the preservation challenge! Proposals exist, but are ignored by NTA. Also ignored is a year-old EU/NSF study of digital preservation research needed, even though some of its authors are among the NDIIPP advisors …. How can an NDIIPP proposal ignore digital preservation?
(6) The NTA documents suggest that no statement of (technical) requirements has been written by the NDIIPP team, even though more than three years have elapsed since NDIIPP was funded! Parts of the v.0.2 update begin to read as such a statement of requirements, but they provide only a tiny fraction of what is needed.
(7) Software layering called for in the NTA documents is simplistic. More elaborate layering is essential. It is also provided in all content management software offerings that I know. Layering is alluded to (without naming it) in section 3 (“Core Characteristics”) of the v.0.2 update in sentences that include “hopelessly bloated”. How to avoid bloating is known in routine software engineering practice. In early 2001, Deanna Marcum asked me to write an analysis of commercial know-how pertinent to NDIIPP. Delivered in August 2001, this report addressed layering and other NDIIPP technical needs better, in my opinion, than the NTA documents. I am disappointed that my work has been ignored with all the rest.
The foregoing list of NTA document weaknesses is incomplete, but should serve to convey why colleagues and I are deeply disappointed by the NDIIPP technical component. We cannot help but feel that the expertise of the software engineering community has not been effectively exploited. We do not know why this is the case, but feel that such omissions must be corrected if the NDIIPP is to be effective and to use public funds efficiently. [H.M. Gladney to NDIIPP managers, 19th May 2004]
As far as I know, the NDIIPP managers have neither refuted these criticisms nor acted to fix the alleged problems. Public comment by DDQ readers might help bring about desirable improvements.
A recent New York Times front page article, Even Digital Memories Can Fade (10 November 2004), erroneously asserted that “[t]he problem of preserving digital photos and other electronic records for future decades confounds even the experts.” The false assertion, “no one has figured out how to preserve these electronic materials for the next decade, much less for the ages” is echoed by over 400 Web search “hits” (as of 28th December), even though we have published a technical solution. The columnist, Ms. Katie Hafner, cited only three Washington Beltway insiders and three apparent amateurs in the topic. My letters, first to Hafner and then to the NYT editors, have been ignored, perhaps because discussion of a problem attracts readers more than that of a solution!
In addition to Raymond Lorie’s publications that began in 2001, a paper in ACM Trans. Info. Sys. in July, and half a dozen readily available preprints, three more papers have been submitted to periodicals with high refereeing standards.[28] Trustworthy 100-Year Digital Objects: Durable Encoding for When It's Too Late to Ask (joint with Raymond Lorie) has passed a first round of criticism by ACM Transactions on Information Systems referees; a slightly amended version was sent to the editor in October. The Koninklijke Bibliotheek (The Netherlands) has deployed a pilot of the virtual machine technology this article communicates.[29]
Preserving Digital Records: A Method Guided by Scientific Philosophy was submitted to Archivaria in November, after its editor rejected a prior submission because it was more technical than Archivaria readers were accustomed to. The new version is written for professional archivists and research librarians, and pays special attention to issues raised in prior Archivaria issues.
Principles for Digital Preservation, submitted to the Communications of the ACM (the periodical that reaches all ACM members) in November, has survived a first round of critical reviews.[30] This overview of TDO methodology had to conform to strict Comm. ACM limits: not more than 3000 words with 12 citations—not easy for a topic with many important details.[31]
DDQ has invited critiques of this work for about 18 months. No substantial technical problems have been identified to us.
The next step is to build a prototype of client workstation tools that ordinary users[32] will find convenient for packaging TDOs sealed with authenticity certificates, for extracting TDO payloads, and for inspecting evidence contained in TDO certificates and in the digital objects cited by the TDO.
There is little doubt that everything needed is technically feasible. Prototyping will have two objectives: (1) demonstrating that the technical complexities can be hidden so that ordinary users find the package convenient, and (2) serving as a step towards pilot installations. Since creating such a prototype will be more than a single person can do in a reasonable time, we have applied for funding for part of the work.
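The three client operations named above (packaging with a seal, inspecting the evidence, and extracting the payload) can be sketched in a few lines. This is an illustration under loud assumptions, not the TDO design: the package layout is hypothetical, and an HMAC-SHA256 tag computed with a shared secret key stands in for an authenticity certificate, which in practice would rest on public-key signatures.

```python
import base64
import hashlib
import hmac
import json


def _canonical_bytes(body: dict) -> bytes:
    # Serialize deterministically so the same body always yields the same tag.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(payload: bytes, metadata: dict, key: bytes) -> dict:
    """Package payload and metadata together with a tamper-evident tag."""
    body = {
        "metadata": metadata,
        "payload": base64.b64encode(payload).decode("ascii"),
    }
    tag = hmac.new(key, _canonical_bytes(body), hashlib.sha256).hexdigest()
    return {"body": body, "seal": tag}


def verify(package: dict, key: bytes) -> bool:
    """Inspect the evidence: does the seal still match the package contents?"""
    expected = hmac.new(key, _canonical_bytes(package["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, package["seal"])


def extract_payload(package: dict, key: bytes) -> bytes:
    """Return the original bytes, refusing to do so if the seal fails."""
    if not verify(package, key):
        raise ValueError("seal does not match package contents")
    return base64.b64decode(package["body"]["payload"])
```

Hiding this machinery behind a convenient interface, so that an ordinary user never sees keys or digests, is exactly the part such a sketch does not address.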
In retrospect, the Trustworthy Digital Objects papers nowhere state their scope and constraints as clearly and concisely as readers might want. The limitations include:
(1) TDO methodology addresses only the technical portions of digital preservation requirements.
(2) TDO methodology focuses on the most difficult anticipated cases[33] for which preservation might be wanted—file types for which perfect rendering is most difficult (probably computer programs) and records for which chicanery (record or provenance falsification) is tempting and can create immense risks for legitimate users. For relatively simple file types and for records not associated with large risks, other mechanisms than those we describe might be more economical.[34]
(3) TDO methodology begins with the observation that good repository software offerings have existed for some years. Some are almost adequate for their obvious role as part of any digital preservation solution, needing at most small extensions for long-term content.[35]
(4) In keeping with (3), TDO methodology deals only with methods for ensuring that bit-strings survive forever, that files remain useful forever, that eventual readers can test document authenticity, that ordinary people can create and can use durable records, and that the preservation toolkit can be installed and used without disrupting other software that people have chosen.
We emphasize that the TDO core is a representation and packaging methodology for digital objects, in contrast to most attempts towards a preservation solution.[36] We believe enabling long-lived repositories to be only a secondary objective—a means towards the proper objective, preserving digital objects.
The July - December 2004 issue of the DPC/PADI’s What's New in Digital Preservation bulletin reports recent work. It should be consulted by anyone working in the area.
BusinessWeek reports collaboration to catch phishers.[37] Big enterprises from Citibank to AOL will share data about ID-robbing cyberscams and boost government efforts to catch the malefactors.
Paul Horn, IBM Director of Research, has collaborated in a Physics Today article discussing physics research in support of information technology.[38] It suggests reasons why I am confident that we will enjoy at least another decade of amazing price/performance improvements in digital devices.
Any serious student of 20th-century philosophy should look at Alberto Coffa’s The Semantic Tradition from Kant to Carnap (Cambridge University Press, 1991). For those who have already read Wittgenstein’s work and logical empiricist tracts, it provides insight into the problems these authors confronted.
In contrast, Bryan Magee’s Talking Philosophy: Dialogues with Fifteen Leading Philosophers (Oxford UP and BBC, 1978) has broad appeal—particularly to readers not familiar with the topic, but wanting an educated layman’s understanding of the issues and why these are universally important. This book originates in 1976 BBC television dialogues between Bryan Magee and fifteen outstanding thinkers. It makes even the most difficult ideas accessible to the general reader.
The DDQ 1(1) extrapolation of desktop disk prices from figures published by the New York Times suggests that $200 would buy 300-400 gigabytes in 2004. Recent offers such as a 120GB Western Digital DMA/100 HDD for $70 validate the 2 ½ year-old estimate. We expect its semi-logarithmic graph will project well for at least two more years, and that in 2007 you will be able to buy a persistent terabyte for $200!
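The arithmetic behind this check can be written out as a short sketch. The $70 offer for 120 GB and the $200 budget come from the paragraph above; the 28% annual price decline is the figure quoted elsewhere in this number for persistent storage, and applying it as a constant rate over three years is an assumption of the sketch, not the original DDQ 1(1) calculation.

```python
def gb_per_budget(budget_usd: float, offer_gb: float, offer_usd: float) -> float:
    """Gigabytes a budget buys at the price per gigabyte of a quoted offer."""
    return budget_usd * offer_gb / offer_usd


def project_gb(gb_now: float, annual_price_decline: float, years: float) -> float:
    """Gigabytes the same budget buys later, if the price per gigabyte
    falls by a constant fraction each year."""
    return gb_now / (1.0 - annual_price_decline) ** years


gb_2004 = gb_per_budget(200, 120, 70)    # roughly 343 GB, inside the 300-400 GB range
gb_2007 = project_gb(gb_2004, 0.28, 3)   # a little over 900 GB, close to a terabyte

print(f"2004: about {gb_2004:.0f} GB for $200")
print(f"2007: about {gb_2007:.0f} GB for $200")
```

Under these assumptions, the 2007 figure falls slightly short of a full terabyte, so the prediction implies a price decline a little faster than 28% per year.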
Are you constantly short of screen space? Would you often like to invoke an application using an icon that is tucked behind an open window? Do you often hunt for Windows utilities? If so, you might like WINUSCON (Windows User Console), which collects application links into a tabbed pane that runs as an ordinary application. Download free from http://www.matirsoft.com.
I am trying out new software, TrustyFiles from RazorPop™, which promises a good single interface to multiple peer-to-peer file-sharing networks.
If you consult a supplier’s service technician, he is likely to ask about the PC configuration within which your problem occurred. Free tools provide complementary detailed information about the hardware and software of your PC. I’ve used BelArc Advisor for about two years; its output is elegant. The newer WinAudit includes specifics about the applications running when it was executed. You can view its audit report on screen as well as save it in text, web page, XML and spreadsheet formats.
I strongly recommend storing the outputs of these applications every few months, or when you know your PC configuration to have changed significantly, keeping the prior reports. This will cost only a few moments, and might save you significant time and aggravation if a reliability problem occurs.
Some corporations replace employees’ machines roughly every 4 years, because doing so is less expensive than maintaining old hardware and software. I recommend that home users do the same.
However, I just now recommend delaying any pending replacement until Intel’s new chipsets (e.g., Intel® 915G for mainstream users) become available in PCs at non-premium prices (perhaps in mid-2005). Any performance problems that you are experiencing (such as delays in opening text files for edit) probably originate more in input/output channels than in main processor speeds. The new chipsets address this, with features such as an 800 MHz front side bus (FSB), serial-ATA disk ports, and DDR2 533 MHz memory support. They also save PC space (and probably cost) by integrating a fast graphics processor, a 2 GHz external I/O bus, and 7.1 surround sound with Dolby® support onto the motherboard.
Advanced Micro Devices does not (yet) offer a direct counterpart, but will soon support the new PCI Express, serial ATA, and DDR2 standards.
Critique by and discussions with John Bennett, Tom and Sebastian Gladney, and John Swinden have helped create this DDQ number. Their help is gratefully acknowledged.
[1] Here, ‘cost’ includes the human time costs of learning how to use digital machinery and of exploiting it. In fact, these human time costs are today the primary economic constraint on uptake of digital information services.
[2] For instance, stenographers have almost vanished as a profession, not primarily to save wages, but rather because authors now have the tools to write better and more quickly without clerical assistance.
[3] I’ve long hoped to have time to explore Greenstone (note 5), but have not yet managed that. I’ve also just ordered Nick Tomaiuolo’s The Web Library: Building a World Class Personal Library with Free Web Resources, Cyberage Books, 2004.
[4] This scenario is intended neither as a prediction nor as an action recommendation, but merely to suggest that imminent structural changes might be radical. It is technically attractive, but not yet thought out sufficiently for confidence that it encounters no difficult business or technical obstacle.
[5]
[6] Reich, Vicky, and Rosenthal, David S.H. LOCKSS: A Permanent Web Publishing and Access System, D-Lib Magazine 7(6), 2001.
[7] What would be the effort to create such software? Probably 10 to 20 man-years of work to create an “industrial-strength” version. Why has it not yet been done? Perhaps there has (not yet) been sufficient economic motivation.
[8] See a structural depiction. The reader will recognize functional relationships with peer-to-peer services pioneered by peer-to-peer networks such as Kazaa and Gnutella. Replication management software can be set up to be extremely robust, providing access to nearly all network content even when a large fraction of the Internet is inaccessible to the client at hand. It can also provide each digital librarian with full autonomy to control the objects (s)he makes available.
[9] Here, ‘affluent’ means simply ‘having sufficient income from other sources to make it affordable to devote time, skill, and energy to public service.’
[10] Selection is made more rapid and decisive by ‘network effects’, in which each user benefits from other users’ existence.
[11] Classical music buffs will like WCPE, a North Carolina station, whose program selections are more varied than those of Beethoven.com. On the other side of the coin, I regret to report that my prior favorite, WQXR of the New York Times, seems to have developed constraints consequential on a commercial connection (AOL).
[12] Gillmor, Dan. We the Media, O’Reilly, 2004.
[13] Some readers might protest that one must fight the issue wherever it crops up, because we are on a “slippery slope” toward abrogation of freedom. I do not find that assertion persuasive without identification of other large business or political sectors that threaten imminent action similar to that of the entertainment industry.
[14] Spam e-mail filters suggest what might be possible.
[15] See, for instance, Lazinger, Susan S. Digital Preservation and Metadata: History, Theory, Practice, Libraries Unlimited, 2001. Also OCLC/RLG PREMIS WG. Implementing Preservation Repositories for Digital Materials: Current Practice and Emerging Trends in the Cultural Heritage Community, OCLC, 2004.
[16] David Nimmer,
[17] Joint work with J.L. Bennett and P. Lucas. Trustworthy 100-Year Digital Objects: What's Meant? Intentional and Accidental in Documents will be available in preprint form early in 2005.
[18] This analysis is informed by philosophical readings alluded to in prior DDQ numbers.
[19] This can be accomplished by identifying schemas that express pattern-preserving transformations. Such transformations might include ones to render the model for human comprehension.
[20] See early parts of Wittgenstein’s Philosophical Investigations.
[21] Gladney, H.M. A Storage Subsystem for Image and Records Management, IBM Systems Journal 32(3), 512-540, (1993).
[22] "Digital repository service" includes all aspects of creating, maintaining, and serving from a collection. This might include digital preservation measures, but also might not extend to deliberate action to achieve long-term preservation.
"Digital preservation" is about activities, design, and tools specific to long-term information safety. While this requires some form of repository, it need not involve large repositories. An argument sketched above suggests that a network of automatically-managed small repositories (e.g., in thousands of personal computers) would be more reliable and convenient.
[23] There is a hampering difference of social style: scientists and engineers value criticism as a tool; research librarians seem to hate it, valuing (the appearance of) consensus over almost every other expression of opinion about their business.
[24] The compounded effect is enormous, e.g., the price in five years will be about 20% of today’s price. Articles on preservation costs mostly fail to estimate the likely consequences of this and other big technology price changes.
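The compounding can be made concrete with a short calculation. This sketch assumes a constant year-over-year price factor, which is a simplification. The function names are hypothetical, and the only figure taken from the note is the "about 20% after five years" estimate.

```python
def implied_annual_factor(remaining_fraction: float, years: int) -> float:
    """Constant yearly price multiplier that leaves `remaining_fraction`
    of the starting price after `years` years (assumes a steady rate)."""
    if remaining_fraction <= 0.0 or years < 1:
        raise ValueError("need a positive fraction and at least one year")
    return remaining_fraction ** (1.0 / years)


def price_after(years: int, annual_factor: float, start: float = 1.0) -> float:
    """Price after compounding `annual_factor` for `years` years."""
    return start * annual_factor ** years


if __name__ == "__main__":
    factor = implied_annual_factor(0.20, 5)
    print(f"yearly multiplier: {factor:.4f}")
    print(f"yearly decline:    {(1.0 - factor) * 100:.1f}%")
    for year in range(1, 6):
        print(f"year {year}: {price_after(year, factor):.3f} of today's price")
```

On this assumption, a price that falls to about 20% in five years corresponds to a decline of roughly 27 to 28 percent per year.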
[25] Wittgenstein teaches that clarification is especially effective when it shows confusions “not to have been problems at all.”
[26] The topic was previously addressed in DDQ 2(3) "What’s Missing".
[27] Highlighting has been added.
[28] Preprints are available, either via the hot links below, or via http://home.pacbell.net/hgladney/hmgpubs.htm.
[29] Wijngaarden, Hilde van. Oltmans, Erik. Digital preservation in practice: the e-Depot at the Koninklijke Bibliotheek, Vine 34(1), 21-26, 2004. See also Digital Preservation and Permanent Access: The UVC for Images, Koninklijke Bibliotheek, 2004.
[30] I was amazed and pleased to receive the reviews 18 days after the submission. Referees’ critiques for other ACM periodicals, in my experience, are received only 3 to 6 months after submission.
[31] A common saw among software engineers and perhaps within other professional communities is, “The devil is in the details!”
[32] By ‘ordinary user’ we mean a service client who is not a computing specialist and who has not been specially trained to use the applications in question. (Unfortunately English does not yet have a term for this relationship to computer applications.)
[33] Generalizing a problem and focusing on its most difficult instances is a common scientific technique. If a solution is found, it has the widest possible applicability. Frequently, it is also a more elegant and simpler solution than the results of piecemeal attacks on special cases.
[34] When TDO implementations are refined, they are likely to be as easy and inexpensive to use as conceptually simpler methods. If this is achieved, the overall cost to users will be minimized by use of a single method for all kinds of information object.
[35] A preservation solution that requires unique content management software will not achieve widespread uptake. To avoid hampering information sharing, the preservation mechanism must work in all repositories. However, few content managers work well in both tiny and giant repositories. (The first successful content manager, IBM Digital Library, did accomplish this.)
[36] The latter bias is illustrated in the latest newsletter from the (U.K.-based) Digital Preservation Coalition. New British digital preservation funding is focused on “institutional management support, institutional repository infrastructure development, and digital preservation assessment tools.”
[37] phisher: a person who sends out legitimate-looking e-mails appearing to come from a legitimate source in an effort to get personal and financial information from the recipient.
[38] Theis, Thomas. Horn, Paul. Basic Research in the Information Technology Industry, Physics Today 56(7), 44-49, July 2003.