Digital Document Quarterly

Perspectives on Trustworthy Information

Volume 4, Number 3, 3Q2005

 

HMG Consulting

Saratoga, CA 95070

© 2005, H.M. Gladney[1]

ISSN: 1547-8610

 

Editorial

Discussions with DDQ’s prepublication critics and with some authors of work that DDQ touches on motivate the following restatement of the objectives and style that DDQ strives for:

·  DDQ topics tend toward questions and issues that have not received enough attention, and may therefore be controversial.  When the topic involves a specific project, I try to communicate with its principals before the column appears.

·  DDQ is a newsletter that includes expressions of opinion, unsupported by cited evidence, beyond what is usual in refereed scholarly articles.  Its brevity forces succinct assessments of complex topics for which careful analysis of the “ifs, ands, or buts” would otherwise be desirable.

·  DDQ saves readers time with many links to cited articles and selected Web pages.  Its citations are limited to articles selected critically from information science literature.  Its descriptions of these works are intended to help readers quickly decide whether or not to look at each cited work.

·  With colleagues, I share views about ideal objectives for long-term preservation technology.  Explicit objectives are valuable whether or not they are ever achieved.  They provide evaluation criteria for work in progress, as illustrated by the DSpace commentary below.

·  One ideal is a single long-term preservation approach that would work for every data type and for the most demanding risks, protecting even a document that is a tempting target for alteration to perpetrate a fraud.

·  DDQ focuses on the eventual end user’s perspective. [2]  It treats archival repository methodology as merely implementing infrastructure, because end users tend not to care how any repository achieves what it promises; they care only whether its promises are comprehensible, satisfactory in view of each user’s own interests and risks, and in fact achieved.

·  DDQ tends to focus on the most difficult data types (programs) and the most tempting targets for disruption and fraud, which we believe to be business, governmental, and medical records. [3]

Every DDQ number reflects in-depth discussions with and careful critiques by John Bennett, Peter Lucas, John Swinden, and my brother Tom.

Preserving Digital Records

The Meaning of ‘Digital Preservation’

DDQ has until now interpreted ‘long-term digital preservation’ as referring only to the challenges caused by rapid obsolescence of technology and by inaccessible (e.g., deceased) witnesses for information authenticity and meaning (correct interpretation).  Specifically, this has meant techniques addressing challenges beyond those already well handled by ‘digital library’, ‘content management’, and ‘database management’.  The distinction has been convenient because content management is relatively well understood and is represented by many successful hardware and software offerings,[4] whereas long-term preservation has, until recently, posed basic computer science questions.[5]

My musings were much helped by three recent D-Lib articles.[6]  As many writers emphasize, preservation will depend on end-to-end infrastructure that manages each worthwhile work from its birth until prevention of its premature death.  Institutional repository managers seem to be focused on collecting their individual institutions’ works, including scientific tabulations and original works that will not survive beyond the immediate interest of their authors unless they are managed better than occurs today.  Emphasizing this aspect might be relatively easy to sell to institutional executive management because it provides evidence for the public value of the institution.  This might also explain why inter-institutional initiatives are not progressing as vigorously as some commentators might wish.

The cited surveys and other articles that we depend on are difficult to interpret because the jargon of the cultural heritage community is not uniform.  For instance, Lynch found it worthwhile to repeat his 2003 definition: "a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.  It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution."

Participants might reasonably advocate delaying effort for ensuring that captured materials survive viably for future generations until institutions are capturing large numbers of original works from their creators into sustainable repositories.  There is little point to developing long-term preservation for content that vanishes early in its life cycle!

A counterargument is that resources currently engaged in creating preservation infrastructure are more than sufficient for dealing concurrently with all technical aspects.  If current projects have weaknesses, insufficient collaboration is surely among them, including collaboration across disciplinary boundaries as discussed in the Global Innovation Outlook cited below.

A practical implication is that the phrase “digital preservation” means different things in different contexts.  For librarians it means “infrastructure and institutional commitments needed to protect digitally represented information”; for technologists it means “mitigations for technological obsolescence and fading human recall, compensating for imperfections of the world”; and other contexts may have further definitions.  The careful reader of any “digital preservation” article needs to discover what meaning its author attaches to its key terms.[7]  Authors would serve their readers well by providing clear definitions until community consensus for key terms has been achieved.

Digital Preservation in Institutional Repositories

Institutional repositories seem to be focused on the ‘Content Management’ layer on the server side of the software stack depicted in DDQ 2(3).  At least in current institutional repository work, the information transfer unit is ‘a collection’.[8]  In contrast, in our own approach to long-term digital preservation the managed unit is ‘a work’.[9]

The software layer that creates a digital library abstraction by combining file systems, communication services, and database management systems is much the same in all content management offerings.  Offerings differ in their performance, reliability, and scalability characteristics.  This technology is otherwise relatively stable, except for vigorous competition in backup and recovery subsystems that scale to much greater capacities than the cultural heritage community will need for years to come.[10] 

My attention to the several meanings of 'digital preservation' was aroused by a correspondent’s enthusiasm for DSpace as being an open-source offering that provides more digital preservation support than the roughly 80 other open-source offerings[11] and 20 commercial content management packages—enthusiasm that I now regard as not justified by the facts.  DSpace does publicize itself as targeting digital preservation, but currently seems to provide little preservation support beyond what can be found in other content management offerings.[12]

Recall that, in contrast to librarians, archivists emphasize provenance information that reliably places each holding in its historical context.  I therefore searched DSpace documentation for references to provenance[13] and found little more than the facts that DSpace emphasizes the Dublin Core standard as its metadata convention and that Dublin Core includes a provenance field.  I found nothing about constraints on or recommendations for the content of this provenance field.[14] 

Dublin Core falls far short of what OAIS calls for in archival metadata.[15]  The DSpace team does suggest that it is working to help information producers prepare preservation-ready ingest submissions and that other metadata support exists,[16] but DDQ comment on that must wait until such software becomes available together with its user documentation.

Ten years ago, the development accomplished by DSpace (and perhaps by most other institutional repository projects) so far would have been advertised as ‘digital library’.  Five years ago, it might have been advertised as ‘content management’.  Today some projects refer to such work as ‘digital preservation’.  Fashions of the times!  I cannot help but wonder how much today’s fashion is a matter of following the money.  A grant proposal for ‘digital library’ investigation would likely be rejected by referees without their even reading the project description; the same proposal called ‘digital preservation’ would probably be carefully considered.

Requirements for an Institutional Repository

The Stanford University team that created LOCKSS has proposed an approach that could be exploited to create a detailed operational requirements analysis for institutional repositories.[17]  Of specific interest is its suggestion of a threats model as a starting point, and its tabulation of generic threats:

Generic Threat | Comments
Media and Hardware Failures | Some causes of failure are random bit errors and recording track blemishes, breakdown of embedded electronic components, burn-out, and misplaced off-line volumes.
Software Failure | All software components have bugs that might distort returned stored data.
Communication Channel Errors | Both failures to deliver data correctly (IP packet error rate of ~10^-7) and undetected errors (at a bit rate of ~10^-10).
Network Service Failures | Accessibility to information might be lost from failures in name resolution, misplaced directories, and administrative lapses.
Media & Hardware Obsolescence | Before media and hardware components fail they might become incompatible with other system components, possibly within a decade of being introduced.
Software Obsolescence | Format obsolescence is likely to prevent information decoding and rendering within a decade.
Operator Errors | Operator actions in handling any system component might introduce recoverable and/or irrecoverable errors to the bit strings being handled at the time of incidents.
Natural Disasters | Floods, fires, and earthquakes.
External Attacks | Deliberate information destruction or corruption by network attacks, terrorism, or war.
Internal Attacks | Misfeasance by employees and other insiders, for fraud, revenge, or amusement.
Economic and Organization Failures | Inability to pay for housing, utilities, communication channels, or system administrators. Organizations might disappear or change their missions so that preserved information suddenly is of no value, or so that destroying preserved information mitigates legal risks.

Starting with this, detailed requirements could be rapidly written as a tabulation that maps each generic threat into some number of specific threat types, each specific threat type into some number of threat descriptions, and each threat into some number of candidate mitigations, as suggested by Figure 1.  (Such an exercise would be a good senior thesis topic.  An experienced software engineer could complete it and publish the results in about 1 person-month’s work.)

Figure 1: Entity-Attribute-Relation model for threat-mitigation analysis
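The mapping described above (generic threat, to specific threat types, to threat descriptions, to candidate mitigations) can be sketched as a small data structure.  What follows is only an illustrative sketch in Python, based on the prose rather than on the figure itself; the class names, the tabulate helper, and the sample entries are all hypothetical and are not taken from the LOCKSS proposal.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Mitigation:
    description: str


@dataclass
class Threat:
    description: str
    mitigations: List[Mitigation] = field(default_factory=list)


@dataclass
class SpecificThreatType:
    name: str
    threats: List[Threat] = field(default_factory=list)


@dataclass
class GenericThreat:
    name: str
    specific_types: List[SpecificThreatType] = field(default_factory=list)


def tabulate(generic_threats: List[GenericThreat]) -> List[Tuple[str, str, str, str]]:
    """Flatten the hierarchy into (generic, specific, threat, mitigation) rows."""
    rows = []
    for generic in generic_threats:
        for specific in generic.specific_types:
            for threat in specific.threats:
                for mitigation in threat.mitigations:
                    rows.append((generic.name, specific.name,
                                 threat.description, mitigation.description))
    return rows


# Hypothetical sample entries, invented purely for illustration.
media = GenericThreat(
    "Media and Hardware Failures",
    [SpecificThreatType(
        "Random bit errors",
        [Threat("Stored bits change silently on a disk volume",
                [Mitigation("Keep replicas at independent sites"),
                 Mitigation("Audit stored copies with periodic checksums")])])],
)

rows = tabulate([media])
for row in rows:
    print(" | ".join(row))
```

Each printed row is one line of the requirements tabulation; a real analysis would populate the structure from all eleven generic threats listed above.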

It might be fruitful to include implications of the just-released U.K. Data Archive self-assessment of its OAIS and METS standards compliance.[18]

Financial Stress for Repositories

Current debates about digital preservation and institutional repositories might, in a decade or two, be considered unimportant episodes in the evolution of scholarly publishing and university structure.  Consider the following excerpts from Heath and Duffy: [19]

... to speculate about "the end of the university, an institution that has existed for a millennium."  That scenario is possible to imagine, ... [because] changes being induced by information technology ...  alter the fundamental relationship between people and knowledge. Thus the technology could profoundly reshape the activities of all institutions, such as the university, whose central function is the creation, preservation, integration, transmission, or application of knowledge.

...  As Mike Keller has observed, the innovations of technology have combined with fiscal pressures to move many libraries away from collection building.[20]  Embracing collaboration, and access to information at the time of need, libraries moved away from capital investment in the future.  Information, Keller said, became a commodity to be made available to the current university population with too little attention to the library's role as cultural custodian.  ...

In the last year, leaders at some of our major research libraries have begun to recognize that their obligations to build lasting cultural repositories were being placed at risk by overly concentrating on information-as-commodity and the "big deals" of commercial publishing.

Ten Years after the Seminal Report

A decade has elapsed since completion of the task force report that launched widespread attention to digital preservation.[21]  Assessments of progress towards what it recommends seem timely.[22]

The Major Findings of the Task Force were[23]

By analyzing the emerging digital environment …, we have aimed in this report to identify the most demanding preservation issues and to frame them for appropriate action, … describing what is necessary to protect the integrity of the cultural record.  Prompted to "envision possible end-states," we have reached several general conclusions that inform our view of viable options and next steps. In sum, we have concluded that:

·  The first line of defense against loss of valuable digital information rests with the creators, providers and owners of digital information.

This opinion is still widely held, with some authors asserting the need to persuade information producers to package their works suitably and submit them to institutional repositories.  In this, ‘suitably’ implies providing preservation metadata identified by OAIS[24] and specified by METS.[25]  This hope is unlikely to be realized before tools exist that make it easy and attractive for information producers to package their works in conformance with emerging standards.[26]

·  Long-term preservation of digital information on a scale adequate for the demands of future research and scholarship will require a deep infrastructure capable of supporting a distributed system of digital archives.

The educational community has started many institutional repository projects.[27]

·  A critical component of the digital archiving infrastructure is the existence of a sufficient number of trusted organizations capable of storing, migrating and providing access to digital collections.  A process of certification for digital archives is needed to create an overall climate of trust about the prospects of preserving digital information.

‘Trusted’ is more difficult to achieve than ‘trustworthy’.[28]  RLG requests comment on its August 2005 Audit Checklist for the Certification of Trusted Digital Repositories.  The current draft of this checklist is more subjective than would be ideal.[29]

·  Certified digital archives must have the right and duty to exercise an aggressive rescue function as a fail-safe mechanism for preserving valuable digital information that is in jeopardy of destruction, neglect or abandonment by its current custodian.

Within U.S. educational institutions, the emphasis is on motivating voluntary submission by faculty.

The Pilot Project Recommendations of the Task Force were

1.       Solicit proposals from existing and potential digital archives around the country and provide coordinating services for selected participants in a cooperative project designed to place information objects from the early digital age into trust for use by future generations. … Because the objects in this focal area are at such risk of loss, the project would also provide a useful means of exploring … fail-safe mechanisms for digital archives.

2.       Secure funding and sponsor an open competition for proposals to advance digital archives, particularly with respect to removing legal and economic barriers to preservation.

3.       Foster practical experiments or demonstration projects in the archival application of technologies and services, such as hardware and software emulation algorithms, transaction systems for property rights and authentication mechanisms, which promise to facilitate the preservation of the cultural record in digital form.   …  Moreover, there is growing need for evidence that digital archives can practically and effectively incorporate in their daily operations automated systems for emulating obsolete hardware and software, transacting intellectual property and using cryptographic and other mechanisms for creating trusted distribution channels for digital information.

To some extent NDIIPP-funded projects are addressing these recommendations.[30]  However, there is next to no public mention of fail-safe mechanisms.  Nor has there been much progress on removing legal and economic barriers apart from the controversial Google Print project[31] to digitize books—a project in collaboration with five academic libraries.  Nor do institutional repository initiatives seem to be working on emulation of obsolete technology[32] or cryptography that creates trustworthy digital distribution channels.[33]

4.       Engage actively in national policy efforts to design and develop the national information infrastructure to ensure that longevity of information is an explicit goal.  …  These policy decisions need to be informed with an understanding of the … complexity of digital preservation.

5.       Sponsor the preparation of a white paper on the legal and institutional foundations needed for the development of effective fail-safe mechanisms to support the aggressive rescue of endangered digital information.

These two recommendations still need to be addressed.  Effective repository consortia have not yet emerged.

6.       Organize representatives of professional societies from a variety of disciplines in a series of forums designed to elicit creative thinking about the means of creating and financing digital archives of specific bodies of information.

7.       Institute a dialogue among the appropriate organizations and individuals on the standards,[34] criteria and mechanisms needed to certify repositories of digital information as archives.

These recommendations are being acted on in educational and cultural heritage institutions.  The private sector communities that might be expected to be interested—clinical health care organizations and the pharmaceutical research industry, legal forums and law firms, entertainment and aircraft industries—are conspicuously silent.

8.       Identify an administrative point of contact for coordinating digital preservation initiatives in the United States with similar efforts abroad.

Several organizations in the United States and Europe are providing effective coordination.  The proceedings of many international conferences have been effectively disseminated.

The Best Practices Recommendations of the Task Force were

9.       Commission follow-on case studies of digital archiving to identify current best practices and to benchmark costs in the following areas:

a.      The design of systems that facilitate archiving at the creation stage.

Frequent comments about helping information producers feed preservation-ready works to repository ingest services have not been addressed with effective software.  Regarding digital archive costs, we read, “… reUSE partners are convinced that the administrative stages prior to ingest … and the ingest stage consume significant financial and staff resources.  … ongoing expenditures rise proportional to quality requirements for the digital objects and their metadata.” [35]

b.      Storage of massive quantities of culturally valuable digital information. … What can be learned from experience in [archives of census data, remote sensing satellite imagery, weather data, or commercial data] about the means and costs of ensuring the longevity of digital information?

c.       Requirements and standards for describing and managing digital information. …  A responsible digital archive must provide to its users what it knows about the provenance and context of its objects so that users can make informed decisions about the reliability and quality of the evidence before them.

Extensive relevant standards activities have yet to lead to effective implementations and usage.  The cultural heritage community seems to have made little effort to understand and exploit commercial and governmental content-management know-how that is represented by massive digital collections.

Objective requirements statements at the level of detail needed to guide the managers and technicians responsible for institutional repositories and virtual museums have not been prepared.[36]

d.      Migration paths for digital preservation of culturally valuable digital information

Because cumulative errors are not only possible, but even likely, transformative migration is not sufficiently reliable for long-term digital preservation.[37]
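The cumulative-error argument can be illustrated with simple arithmetic.  The sketch below, in Python, assumes that each transformative migration independently introduces an error with some fixed probability; both that independence assumption and the sample probability of 0.01 are hypothetical choices made only for illustration, not measurements from any repository.

```python
def survival_probability(p_error: float, migrations: int) -> float:
    """Probability that all of `migrations` successive migrations are error-free,
    assuming each one independently introduces an error with probability p_error."""
    return (1.0 - p_error) ** migrations


# Hypothetical per-migration error probability, chosen only for illustration.
p = 0.01
for n in (1, 5, 10, 20):
    at_least_one_error = 1.0 - survival_probability(p, n)
    print(f"{n:2d} migrations: chance of at least one error = {at_least_one_error:.3f}")
```

Under these assumptions the chance of an undamaged record shrinks with every migration, which is the sense in which errors accumulate over a long preservation period.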

Technology and Social Trends

In 2004, IBM conducted a worldwide study, the Global Innovation Outlook.   Its authors suggest that the 21st century will demand wider collaboration across disciplines and specialties than has been common.  Consistent themes emerged from diverse perspectives and conversations about a wide range of ideas:

·  the need for standard ways of exchanging information between members of each ecosystem (and across ecosystems);

·  the need for more open collaboration between ecosystem members and, at times, competitors;

·  the primacy of the individual as a focal point for innovation.

To avoid burnout, knowledge workers need to choose lifestyles: [38]

“Closely associated with choice is motivation.  For knowledge workers who align interests and passions with work, the work/ life debate becomes moot—hours do not feel worked when you are doing what you enjoy and choose to do.  To the extent that flexibility and motivation drive the knowledge workers of tomorrow, they will resemble today’s entrepreneurs, except that they will make varying decisions as to what components of endeavor make up their lives, and need not allow work to become their life’s defining feature.

“If an individual chooses a job that he or she really enjoys and that represents the values they believe in, the boundary between work and life may become meaningless.” Global Innovation Outlook

More than a decade ago, I began to appreciate my twice-daily 30-minute commute because it provided quiet for more careful thought than was possible in the usual office bustle.  Except for some professions, such as sales, the “always connected” technology of satellite telephony and wireless Internet access is more likely to hamper progress than to further it.  We do own a satellite telephone, but turn it on only for highway emergencies and for organizing rendezvous.  I do not expect to change this behavior.

News Reports

Copyright Legislation Changes Proposed in Canada

The proposed changes are summarized in an article from Lang Michener, a Canadian law firm.

Growing Complexity of Grid Environments

Grid environments, which are growing due to emergent, industrial-strength software, are also becoming more complex and problematic for users.  Debate has arisen among Grid developers over the future of industry-driven Grid environments.[39]

IEEE To Digitize All Its Technical Journals

IEEE has started to digitize all papers from its technology journals.  Added to the IEEE online collection in June were more than 12,000 papers and articles published in the Proceedings of the IEEE between 1963 and 1987. Papers dating back to the first number, published in 1913 as the Proceedings of the Institute of Radio Engineers (the name of an IEEE predecessor organization), will be online in early 2006.

Reading Recommendations

Official inquiry reports into governmental scandals are usually dreadful reading.  In contrast, Denise Bellamy’s report into City of Toronto influence peddling in a giant IT procurement has received high praise for its literary quality, which is attracting ordinary citizens to become readers.

Peter Galison’s Einstein's Clocks, Poincaré's Maps

In the period 1880 to 1910, philosophy, social policy, and physics interacted in ways that influence our daily lives so profoundly that overemphasis would be difficult.  The interactions were even deeper than I had thought until I read the first chapter of this book,[40] which deals with relativistic time measurement[41] and the political debates needed to create today’s standard time zones. 

Galison explains how, in the cases of Einstein and Poincaré, the practical dimension helped shape their understanding of the theoretical dimension, and, in turn, how they helped transform the world.  He summarizes key interactions between philosophers and physicists: Ernst Mach, Henri Poincaré, Moritz Schlick, Auguste Comte, Albert Einstein, David Hume, Richard Dedekind, and John Stuart Mill. [42]

Albert Einstein’s The Meaning of Relativity[43]

In this thin volume, Einstein concisely captures special and general relativity with notation refined from that of the original publications.  It should be comprehensible to anyone comfortable with undergraduate calculus.  Its back-cover description reads:

In 1921... Albert Einstein visited Princeton University [to deliver] four lectures describing his then controversial theory of relativity.  ...  As subsequent editions were brought out by the Press, Einstein included new material amplifying the theory.  A revised appendix "Relativistic theory of the non-symmetric field," added to the posthumous edition of 1956, was Einstein's last scientific paper.

Encyclopedia Britannica, the 1911 Edition

This famous edition is now available on-line via http://www.1911encyclopedia.org/.

Simon Winchester’s The Meaning of Everything[44]

The originators of the Oxford English Dictionary thought the project would take a decade and cost £9000.  It actually took over five decades and cost about £300,000.[45]  Readers might enjoy comparing its production methods, with hundreds of volunteers feeding index sheets that were sorted into a mailbox wall, with how the project might be accomplished today.  The significance of the work is suggested by:

English is not to be regarded in the same way as, say, French or Italian, … It is not a fixed language, the meaning of its words established, approved, and firmly set by some official committee charged with preserving its dignity and integrity.  The French have had their Académie Française … which has done precisely this (and with an … absolute want of humour) since 1634.  The Italians have also had their Accademia della Crusca in Florence since 1582—since long before … there was even a nation called Italy.  The task of both bodies was to preserve linguistic purity, to prevent the languages ruin by permitting inelegant importations, and to guide the public on just how to write and speak.  …  No such body has ever been set up in England, nor in any English-speaking country.[46]  …

For English … changes constantly … [I]ts words alter their senses and their meanings subtly, slowly, or speedily according to fashion and need.  Dictionaries that record and catalogue the language thus cannot ever be prescriptive; they must always be entirely descriptive, telling of the language as it is, not as it should be. …  [It needs to be] as full a record as … [teams working] in cramped rooms … [can] determine, of the entire assemblage of … the words used by the learned, the nobly born, the doctor, the dandy, and the divine and, most important of all, the words used by the common man of the street, the slum, the farm, and the field.[47]

Helpful Online Resources

The American Library Association provides a list of banned books: the 100 books most frequently challenged in 1990–2000.

Jan Szczepanski, a librarian at Sweden's Goteborg University, provides a list of scholarly, academic, intellectual, cultural, and peer-reviewed journals that are available in full text or accessible without cost.  He also provides a spreadsheet tabulating historic or retrodigitized open access journals.

The Internet Resources Newsletter[48] is a free monthly newsletter for academics, students, engineers, scientists and social scientists.  Published by staff members of the Heriot-Watt University Library, it aims to raise awareness of new sources of information on the Internet, particularly those relevant to research in engineering, science, and social science.

Information Technology's Dirty Words

A few seemingly innocent phrases raise hackles.[49]  Try saying any one of these in polite IT company, and someone will hand you a bar of soap to wash out your mouth:

·         Brittle: unreliable, easily broken, difficult to keep running; a wonderful choice when you want to dis a developer’s skills.

·         Quick and dirty: of a solution meant to solve a specific problem.  Perhaps quick, and surely dirty.

·         Interim: when said of an engineering solution, never as short a period as implied.

·         Legacy: said with a sneer to denigrate all your old, still-functioning hardware and software.

·         Opaque: if a process is opaque, you can’t see into it: It’s a black box.  And although black boxes may look great to end-users, they’re a nightmare to maintain.

·         Churn: change for the sake of change, with wasted motion. Techies use ‘churn’ to criticize management without saying “inept” or “clueless.”

·         Silo: of automation, incorporating the worst aspects of the prior list entries.  A silo is an unconnected application, hardware, data, or process trapped in its own world.

·         User: a word commonly used by drug dealers and IT personnel to describe their customers.  The belief is that, if the 'users' had any brains, they would be in IT like all of the really intelligent people!

Home Computing Technology

ZoneAlarm’s Security Suite

PC Magazine (6-Sept-05) calls ZoneAlarm Security Suite 6.0 superb!  You might want to look into it.

OpenOffice 2.0

Anyone interested in escaping Microsoft Office lock-in is likely to find the OpenOffice review in PC Magazine (6-Sept-05) helpful.  Bruce Byfield’s OpenOffice.org Writer vs. Microsoft Word is another helpful review.  EWeek has reviewed StarOffice 8, the fraternal twin of OpenOffice.

Google Earth

Google Earth uses U.S. public domain, chartered flight and Keyhole satellite images to allow customized use of Google Maps.  One option allows overlaying aerial images with town and street names.  See a descriptive Wiki.  Friends and I are impressed.  And the price is right—nil to home users.[50]

Broadband Internet Connectivity almost as Cheap as Dial-Up Service

It took me almost a year to notice that SBC now offers DSL service at $20/month on an annual contract, in place of the $50/month that I had been paying, and to find out how to effect the administrative change.[51]

Selected Utilities Recommended by PC Magazine

PC Magazine recommends utility programs annually.  Some that you might want to consider (and that DDQ has not already mentioned) follow.

Google Toolbar for IE provides Internet Explorer users with features similar to some of what Mozilla Firefox has included for about two years.

Dogpile Toolbar

1-click Answers

ASAP Utilities for Microsoft Excel

Price Watch

PC Memory | 512 MB PC3200 DDR | $30 | $60/Gbyte
Hard disk drive | Ultra DMA HDD 120 GB (parallel) | $27 | $0.23/Gbyte
Hard disk drive | Maxtor 200 GB SATA (serial) | $87 | $0.43/Gbyte
Desktop PC Bundle | Intel® Celeron 2.66 GHz processor, 17” CRT monitor, Canon ip1600 color inkjet printer, 128 MB DDR333 memory, 40 GB HDD, CD RW/DVD-ROM optical drive, Win/XP Home | $325 | each
Desktop PC Bundle (better) | Sony VAIO RB40, Sony SDM-HS75B 17” LCD monitor, Canon ip1600 color inkjet printer, 3.0 GHz processor, 512 MB DDR memory, 200 GB serial-ATA HDD, double layer DVD+R ROM optical drive, Win/XP Home | $883 | each
Desktop PC | Sony VAIO RB40 (same machine as above, without the bundled printer and display), including shipping | $690 | each
DVD writer | 16x Dual ±R / ±RW and double layer | $44 | each
Wireless Router | Linksys 108Mbps Wireless-G cable/DSL wireless router | $55 | each
PC Wireless Adapter | Linksys 108Mbps Wireless-G laptop PC or PCI adapter | $44 | each
Flat panel LCD display | 17” 0.264 mm pitch | $170 | each
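Readers who want to check the per-Gbyte figures for the memory and disk entries above, or apply the same comparison to other advertised prices, can do the division with a few lines of Python. This is only a minimal sketch: the function name price_per_gbyte and the item list are illustrative assumptions, with prices and capacities copied from the entries above.

```python
def price_per_gbyte(price_dollars: float, capacity_gbytes: float) -> float:
    """Return the cost in dollars of one Gbyte of capacity."""
    if capacity_gbytes <= 0:
        raise ValueError("capacity must be positive")
    return price_dollars / capacity_gbytes


# (description, advertised price in dollars, capacity in Gbytes)
# The 512 Mbyte memory module is entered as 0.5 Gbyte.
items = [
    ("PC memory, 512 Mbyte", 30.0, 0.5),
    ("Hard disk drive, 120 Gbyte (parallel)", 27.0, 120.0),
    ("Hard disk drive, 200 Gbyte (SATA)", 87.0, 200.0),
]

for description, price, capacity in items:
    unit_cost = price_per_gbyte(price, capacity)
    print(f"{description}: ${unit_cost:.3f}/Gbyte")
```

Printing three decimal places keeps borderline values such as the SATA drive's visible before they are rounded to cents.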

Michael’s Minutes (September 2004) suggests caution about advertised deals that seem to be too good to be true.  However, I do believe the goods and best prices published in DDQ (as above) to be reliable.  They are mostly from San Jose Mercury News full-page advertisements by Fry’s Electronics, a Silicon Valley vendor from which I have purchased most of my own equipment.  An acquaintance just purchased the Sony PC identified above from Amazon.



[1]     I am changing my principal e-mail address to .  Please use this for future e-mail.

[2]     See the DDQ 3(2) figure, Digital object paths from producer to consumer.  It is easier to state users’ desires than it is to write repository specifications.  See Gladney, H.M. Principles for Digital Preservation, Comm. ACM, to be published, 2005. 

[3]     Fraudulent change is relatively unlikely for records of research and culture, so few people (if any) in the cultural heritage community are currently paying attention to it.

[4]     Of course, content and database management continue to enjoy rapid technical development.  However, this development is directed at improvements in cost, performance, scalability, security, backup, and convenience, in contrast to providing a service that has not been available at all—as is the case for long-term preservation.

[5]     This is a researcher’s distinction that does not embrace everything to which curators must attend.  See a DDQ 1(2) figure, “Digital document preservation and cultural collection management overlap only incompletely”.

[6]     Lynch, Clifford A. and Joan K. Lippincott, Institutional Repository Deployment in the United States as of Early 2005, D-Lib Magazine 11(9), September 2005.  doi:10.1045/september2005-lynch

      Westrienen, Gerard van, and Clifford A. Lynch, Academic Institutional Repositories: Deployment Status in 13 Nations as of Mid 2005, D-Lib Magazine 11(9), September 2005.  doi:10.1045/september2005-westrienen

      Lavoie, Brian, Lynn Silipigni Connaway, Lorcan Dempsey, Anatomy of Aggregate Collections: The Example of Google Print for Libraries, D-Lib Magazine 11(9), September 2005.  doi:10.1045/september2005-lavoie  

[7]     Substantial risks of misunderstanding occur because of ambiguity of even the most basic terms, e.g., ‘record’.

[8]     This might be because an implementation of OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) facilitates copying collections.  See, for instance, Witten, Ian H., David Bainbridge, Chi-Yu Huang, Katherine J. Don, and Robert Tansley, StoneD: A Bridge between Greenstone and DSpace, D-Lib Magazine 11(9), September 2005.  doi:10.1045/september2005-witten  

[9]     The objective distinctions between ‘a collection’ and ‘a work’ are not in the main information content of each unit of transfer or management, but rather in access control rules that include constraints on updating provenance information that is essential evidence for each work’s authenticity.  See Gladney, H.M. Bennett, J.L. What Do We Mean by “Authentic”? What’s the Real McCoy? D-Lib Magazine 9(7), July 2003.

[10]    For instance, one can purchase a very reliable and maintainable 16 terabyte array of 400 Gbyte disks packaged in about 20” of standard 19” rack.  Available automatic backup management programs are impressive and improving.  Vendor competition is vigorous.  Such storage, control computers, and software are much more expensive than storage for personal computers.  However, it would be imprudent to use anything less reliable for storing archival collections.

[11]    Borghoff, U.M., Rödig, P., Scheffczyk, J., & Schmitz, L. (2003). Langzeitarchivierung: Methoden zur Erhaltung digitaler Dokumente. Heidelberg: dpunkt.verlag. ISBN 3-89864-245-3.

[12]    DSpace emerges as the most widely used package among respondents to the Lynch/Lippincott survey of U.S. institutional repositories.  (Citation below.)

[13]    Branschofsky, Margret and Daniel Chudnov, DSpace: Durable Digital Documents, JCDL’02, 2002, claims “robust provenance logging”.

[14]    In fact, DSpace System Documentation: Application Layer  contains  “The description.provenance field is hidden, as this contains private information about the submitter and workflow reviewers of the item.”

[15]    For that, see for instance McCallum, Sally H. Preservation Metadata Standards for Digital Resources: What we have and What we need, 71st IFLA General Conference, August 2005.

[16]    In contrast, see Lagoze, Carl et al.  Fedora: an Architecture for Complex Objects and their Relationships, to appear in Journal of Digital Libraries Special Issue on Complex Objects, 2005. 

[17]    Rosenthal, David S. H., Thomas S. Robertson, Tom Lipkis, Vicky Reich, Seth Morabito, Requirements for Digital Preservation Systems: A Bottom-Up Approach, 2005.

[18]    Beedham, Hillary,  Julie Missen, Matt Palmer, Raivo Ruusalepp, Assessment of UKDA and TNA Compliance with OAIS and METS Standards, JISC, 2005.  Available from http://www.data-archive.ac.uk/news/publications.asp.  I have not yet had time to read this 111-page assessment, as its availability was announced only shortly before this DDQ number was released.

[19]    Heath, Fred M.  Duffy, Jocelyn.  Collections of Record and Scholarly Communications: The Responsibilities of the Research Library in a Rapidly Evolving Digital World, J. Library Administration 42(2), 5-21, 2005.   ISSN 0193-0826

[20]    Michael A. Keller, Business Models, Not Economic Models for Research Libraries in the Transition to More Digitized Resources, (presented at the meeting of the National Digital Library Federation with the NSF/NASA/ARPA Digital Library Initiative projects, Stanford, December 1996), 112.  “… moving away from collection building” alludes to renting access to digital collections rather than buying copies that would continue to be available after payments ended.

[21]    Garrett, John. Waters, Donald. Andre, P.Q.C. Besser, H. Elkington, N. Gladney, H.M. Hedstrom, M. Hirtle, P.B. Hunter, K. Kelly, D. Kresh, Lesk, M. Levering, M.B. Lougee, W. Lynch, C. Mandel, C. Mooney, S.B. Okerson, A. Neal, J.G. Rosenblatt, S. Weibel, S. Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, commissioned by The Commission on Preservation and Access and The Research Libraries Group, May 1996.  (The last task force meeting occurred in the autumn of 1995.  Its carefully edited report was published only nine months later.)

[22]    The assessments that follow are subjective opinions summing up what I learned from more than a hundred recent articles, and include conjectures derived from what these articles might have reported, but did not.  The skeptical reader is encouraged to regard each assessment statement as a proposition whose validity might be checked by his/her own survey and analysis.

[23]    What follows in dark blue font is quoted verbatim from the Summary and Recommendations that begin on page 40 of the Task Force report.  The interspersed paragraphs in black font express personal assessments.

[24]    CCSDS 650.0-R-2, Reference Model for an Open Archival Information System (OAIS), Red Book, Issue 2, July 2001.  An overview of the development of OAIS is at http://ssdoo.gsfc.nasa.gov/nost/isoas/us/overview.html.  See also Sawyer, Donald et al., The Open Archival Information System (OAIS) Reference Model and its Usage, 2002, available at http://www.ccsds.org/documents/so2002/spaceops02_p_t5_39.pdf.

[25]    Library of Congress, Metadata Encoding and Transmission Standard.  See http://www.loc.gov/standards/mets/.

[26]    The number of articles advocating metadata practices might exceed the number of archival holdings providing metadata whose quality is close to what these articles call for.  See Bulterman, Dick C.A. Is It Time for a Moratorium on Metadata? IEEE Multimedia 5(12), Dec. 2004.

[27]    See two survey articles by Clifford Lynch and co-workers in the September 2005 number of D-Lib Magazine.  The Internet Archive is notable for its scale.

[28]    See Trust, Trusted, Trustworthy in DDQ 1(2).

[29]    Imagine the pilot’s checklist that would engender our confidence in a commercial flight.  It might include “has a mechanism in place for reviewing, updating, and developing comprehensive policies and procedures …”  However such statements alone would hardly reassure us.  We would want to be confident before takeoff that the pilot has almost surely checked a definitive list of gauges, settings, and hardware conditions against predefined constraints.

[30]    See Library of Congress and National Science Foundation Announce Research Awards of $3 Million to Advance Digital Preservation, press release, May 2005.

[31]    See the BusinessWeek article of June 22, 2005.

[32]    The National Library of the Netherlands (Koninklijke Bibliotheek) is the sole apparent exception.  See The Economist article, A new way to stop digital decay, June 2005.  The technology alluded to is described in Gladney, H.M. and Lorie, R.A. Trustworthy 100-Year Digital Objects: Durable Encoding for When It's Too Late to Ask, ACM Trans. Info. Sys. 23(3), 299-324, 2005.

[33]    We know how to accomplish this.  See Gladney, H.M. Trustworthy 100-Year Digital Objects: Evidence After Every Witness is Dead, ACM Trans. Info. Sys. 22(3), 406-436, July 2004.

[34]    The IEEE Standards Association (IEEE-SA) has launched a free, monthly e-newsletter, IEEE StandardsWire, providing the most current information about new and revised standards and the initiation of new standards work.  It  includes detailed information about newly available standards categorized by technical interest, and highlights best-selling standards and related products.

[35]    Aschenbrenner, Andreas and Max Kaiser.  2005.  White Paper on Digital Repositories, March 2005

[36]    The only exception known to me is an unpublished requirements statement prepared for the Computer History Museum.

[37]    The most difficult preservation challenge is posed by computer programs, in which a single bit error might send an astronaut not to the moon, but instead to the sun.  Absent a plausible argument that a way has been shown to avoid undiscoverable bit errors, migration is not good enough.  Work cited in endnote 32 does solve the problem by using a practical emulation scheme.

[38]    Global Innovation Outlook, p.60.  An article, The real reasons you’re working so hard … and what you can do about it, (BusinessWeek, October 3, 2005, pp.60-73) elaborates this theme.

[39]    Baker, Mark, Amy Apon, Clayton Ferner, and Jeff Brown, Emerging Grid Standards, Computer 38(4), 43-50, 2005.

[40]    Galison, Peter, 2003. Einstein’s Clocks, Poincaré’s Maps, Norton, 2003.  ISBN 0-393-32604-7

[41]    The discussion of relativity theory is conducted without mathematics that might have posed difficulty for some readers.

[42]    Ibid, pp.236-241.

[43]    ISBN 0-691-02352-2.

[44]    Winchester, Simon. The Meaning of Everything, Oxford U.P., 2004.  ISBN: 0-19-860702-4

[45]    While the project completion can be identified, its start date is fuzzy.  The figures quoted take as the start the date when Oxford University took over the project from its philological club sponsors.

[46]    Except in South Africa, which has [an] ‘English Academy' … promoting “the effective use of English as a dynamic language”.

[47]    Ibid. p.27-8.

[48]    ISSN: 1361-9381

[49]    Adapted from the InfoWorld August 15th editor’s letter by Steve Fox.

[50]    I presume that Google revenue will come from advertisers who want their locations to appear when a user opts to display hotel, restaurant, or other retail outlet locations.

[51]    This change was not publicized effectively to existing subscribers.  Advertisements promised it only to new subscribers, and in telephone calls to several different sales offices I was repeatedly told that it was