Digital Document Quarterly

Perspectives on Trustworthy Information

Volume 7, Number 2, 2Q2008

 

 

 


HMG Consulting

Saratoga, CA 95070

©  2008, H.M. Gladney

 

ISSN: 1547-8610

Open Access: OCLC and Google to Share Book Information

OCLC and Google have agreed to exchange book discovery data.  Google will link from Google Book Search to WorldCat, which will drive traffic to online library services.  Google will also share digitized book data.  WorldCat will represent OCLC member library collections and link books scanned by Google.  A user who finds a book in Google Book Search will be able to use WorldCat to find local library copies.

Archiving and Long-Term Digital Preservation (LDP)

Recent correspondence about archiving reminds me how difficult it is to communicate precisely.  Writing is more difficult than conversation because no listener can signal confusion that a speaker might promptly correct.  This challenge has been particularly evident in writing DDQ 7(2).  Even though what follows has been repeatedly edited with advisors’ help, I am not as confident as I would like to be that readers will infer what I intend.  The difficulty is even greater for documents in long-term storage (Figure 1).

Figure 1: Simplified version of a model used in Preserving Digital Information[1]

One can reduce the communication difficulty by providing careful definitions and contextual information.  However, this remedy creates its own hazards: lengthy explanations that try readers’ patience, blizzards of detail that obscure central points, and seeming pedantry.

Such difficulties hamper community attempts to design information sharing tools, a current emphasis in digital library literature.  Different authors use even well-known terms, such as “archiving”, differently.  Partly for this reason, I understand only imperfectly what the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (abbreviated “BRTF” below) includes within its “sustainability” scope, or precisely which questions this group intends to answer.

As background for what follows, some DDQ terms of reference need to be explained.  By “archiving”, DDQ means digital content management needed to ensure ready access to reliable records both immediately and in the distant future.  It is useful to partition this into topics which, in good information system design, are only lightly coupled:

(a)     Management prior to repository ingestion.  This portion of digital object management is more evident for bureaucratic records than it is for cultural and scholarly works.[2]  Bureaucratic records are typically generated, formatted, and managed to conform to pre-existing rules.  Controls are less formal for other data.  For concise reference, DDQ will allude to this portion as DocPrep;

(b)     Core digital library services, that is, the functionality defined by a two-year-old interface standard,[3] JSR 170.  DDQ will allude to this portion as DocSS (as suggested by Figure 2 in DDQ 5(2));

(c)     Near-term repository management, including all aspects of ingestion, curation, cataloging, access provision and business controls, and storage management—everything needed for content user services now and for roughly ten years.  Typically this implements higher level digital repository services that rely on one or more DocSS instances.  DDQ will allude to this portion as DocArch (see the second largest box in Figure 2 in DDQ 5(2), where it is labeled “Archival Store”);

(d)     Long-term digital preservation, which is taken to be all measures required and/or undertaken to mitigate digital object unreliability caused by ravages of time, including human misfeasance, fading human memory, and technological obsolescence.  DDQ has already called this LDP.

A common feature of these partitions is that each focuses on tools that handle target content directly.  An emerging software category addresses

(e)     Assisting human managers of repository institutions in planning their work, managing selection into collections, and signaling execution deadlines.  For examples, see EU Planets tools below.

DocPrep is important for bureaucratic records management, as in the U.S. government, but might be of little interest to scholarly and cultural repositories.  DocSS is mature, with many COTS and open source offerings, so that new R&D projects for this component would be sensible only for specific enhancements, such as performance, scaling, or reliability improvements.  DocArch implementations are likely to differ for different kinds of institutions; for instance, small colleges might have different needs than the University of California.  They might also require extensive parametric customization for institutional preferences and for coupling to document manipulation tools; however, DocArch ideas seem mature.

A Note on Partitioning into Software Modules

Among reasons for treating LDP as a distinct partition is the fact that it can be developed and connected to the other components without much changing their implementations or disrupting installations that use them.  The components (a) through (e) are high level partitions of archiving services.  Each of these should be composed of several smaller lightly coupled components.  Such partitioning into lightly coupled components is particularly helpful when the components have different maturities and different portability among installations.

What do we mean when we say that software modules are “lightly coupled”?  We mean that a programmer responsible for one module can change its implementation without impacting coupled modules and without consulting with the programmers responsible for these other modules.  The key is more or less formal agreements on the syntax and semantics of the interfaces that each module makes available to or uses from coupled modules.  So-called APIs (application programming interfaces) are an agreement form that is useful between acquainted programmers.  Interface standards, such as JSR 170 (Content Repository for Java™ technology API), are more formal interface specifications.  An interchange convention for sharing data objects via communications links, such as the protocol for Object Reuse and Exchange (ORE), is still another form.
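A minimal sketch of light coupling follows.  It is an illustration only: the names ContentStore, DictStore, ListStore, and archive_and_read_back are hypothetical and come from no standard, JSR 170 included.  The client function relies solely on the agreed interface, so either implementation can be swapped in without touching the client.

```python
from typing import Protocol


class ContentStore(Protocol):
    """Hypothetical interface agreed on by the owners of two modules."""

    def put(self, name: str, data: bytes) -> None: ...

    def get(self, name: str) -> bytes: ...


class DictStore:
    """One implementation: keeps objects in a dictionary."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, name: str, data: bytes) -> None:
        self._items[name] = data

    def get(self, name: str) -> bytes:
        return self._items[name]


class ListStore:
    """A replacement implementation with different internals."""

    def __init__(self) -> None:
        self._pairs = []

    def put(self, name: str, data: bytes) -> None:
        # Drop any earlier entry of the same name, then append the new one.
        self._pairs = [(n, d) for (n, d) in self._pairs if n != name]
        self._pairs.append((name, data))

    def get(self, name: str) -> bytes:
        for n, d in self._pairs:
            if n == name:
                return d
        raise KeyError(name)


def archive_and_read_back(store: ContentStore, name: str, data: bytes) -> bytes:
    """Client module: relies only on the agreed interface."""
    store.put(name, data)
    return store.get(name)
```

Replacing DictStore with ListStore requires no change to archive_and_read_back, which is the property the paragraph above describes.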

Content Partitioning in Preservation Discussions

Figure 2: Workflow for bureaucratic documents

 

Digital archiving literature seems to be partitioned: articles about bureaucratic record handling (Figure 2), articles about managing cultural and scholarly works (Figure 3), a beginning of articles about personal information,[4] and perhaps further partitions, with articles about one partition seldom citing those in the others.  For instance, there seems to be little practical connection between work on ERA at the National Archives and Records Administration (NARA) and that on NDIIPP at the Library of Congress.  To some extent, this is justified because formal rules and human roles are significantly different in the different partitions.  An unfortunate side effect is little attention to synergism that could reduce the cost of tools and enhance information sharing between partitions.


Figure 3: Workflow for cultural documents

Portico and Ithaka’s survey of about 1000 U.S. library directors identifies another partition, electronic periodicals.  At the same time, A Comparative Study of e-Journal Archiving Solutions has appeared.  It makes evident striking differences between electronic periodicals and other documents.  Treatment of electronic periodicals is dominated by intellectual property law considerations.  The authenticity of saved periodicals is unlikely to be a big issue because the material is not a tempting target for felonious modification and because any interesting periodical is likely to be saved by many autonomous libraries.  The topics discussed in the study suggest that today’s urgent issues for e-periodicals have more to do with near-term archiving (less than 50 years) than with long-term archiving (more than 100 years).

LDP literature is more difficult than it might otherwise be because different communities display different notions of worthwhile research.  If a computer scientist can describe how to satisfy a service requirement, he will say it is not a proper research topic.  In contrast, the U.S. NDIIPP plan reflects a common view that a research topic exists for any information management need unsupported by available software.[5]  In IBM Research corridors in the 1980s, the boundary between research and practical engineering was called “SMOP”: “a simple (or small) matter of programming.”  This did not necessarily mean that the task being discussed was either uncomplicated or inexpensive.  Instead it meant that computer scientists knew answers to its difficult questions, allowing most of the work to be passed to a software development team.  Patent law wording is apt; one cannot obtain protection for an artifact or process design “obvious to someone versed in the state of the art”.

How to Speed Up LDP Progress

[T]here has been relatively little discussion of how we can ensure that digital preservation activities survive beyond the current availability of soft-money funding; or the transition from a project's first-generation management to the second; or even how they might be supplied with sufficient resources to get underway at all.                                                                                                                           Lavoie[6]

The Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF) has been described by the Director of the NSF Office of Cyberinfrastructure as “the only group I know of that is chartered to help us understand the economic issues surrounding sustainable repositories … ”.  The BRTF web site declares one objective to be “a research agenda … [for] economic sustainability of digital information”.  As suggested by the Lavoie quotation, this will surely include recommendations on how repository institutions can be funded and also how their running expenses can be minimized.  Will it also be within the BRTF scope to suggest how research and development of LDP tools can be made more efficient and effective than is currently the case?

It seems to me that LDP progress would be accelerated if participants would engage in more sharing of reusable modules than I am aware of.  Certainly, they often refer to “modular architecture”.  By copy of this DDQ number I am asking readers to tell me about any open source LDP code they know of.  I will also write to the larger LDP projects to inquire.  DDQ 7(3) will publish the information I receive.

I believe that digital preservation research funded by taxpayers has been very wasteful, partly as a consequence of poor scholarship.  Authors seem to pay little attention to what is in the literature.  What needs to be said is perhaps controversial, but I am nevertheless considering it as a theme of DDQ 7(3).  The problem is illustrated by a JCDL 2008 paper.

When I first saw A Data Model and Architecture for Long-term Preservation,[7] I wondered if it described a special case of TDO methodology.[1]  Since this was not clear to me, and is still not entirely so, I e-mailed its authors that I could not see what novelty their paper conveyed and requested clarification.  After two weeks without an answer to this question, I annotated a copy with notes about apparent problems, prior work, and missed opportunities.  I sent this to the authors, repeating my question.  That netted a response mentioning end of term workload and reminding me of copyright limitations.  The authors have yet to react to the points communicated.

Why don’t I merely ignore this paper?  It’s an example of much wasteful work—wasteful because authors don’t build forward from prior work—even authors from prestigious institutions such as the University of California.  What’s just as disturbing is that JCDL referees fail to detect problems such as those illustrated by the example.  Because the problem seems to be widespread, DDQ 7(3) will analyze what I see, expanding on this and other examples.  To illustrate it, I am making my critique of the example available to anyone who requests it by e-mail.

On a positive note, JCDL 2008, in which the criticized paper was presented, contains several papers whose ideas might prove helpful for semi-automatic creation of metadata called for in the TDO architecture.[8]  Also note that the Bibliographical Center for Research is inviting prompt critical comment on its CDP Imaging Best Practices draft document.  (The announced 13th June deadline is “soft”.)

NARA’s Electronic Records Archives (ERA)

The National Archives and Records Administration (NARA) summarizes its public commitment by, “ERA will be a comprehensive, systematic, and dynamic means for preserving virtually any kind of electronic record, free from dependence on any specific hardware or software. …  ERA will support the National Archives mission by making it easy for the public and government officials to discover, use, and trust the records of our government”.[9]  Presumably this includes LDP as defined above.

NARA is overwhelmed by digital information, facing huge increases in both electronic records and classified records, according to Congressional testimony by National Security Archive director Tom Blanton.  Blanton summarizes his problem list with,

[T]he National Archives and Records Administration is a tiny agency with … overwhelming challenges.  NARA’s entire operation ($404 million …) is about equal to the cost of a single Marine One helicopter ($400 million) in the planned fleet of 28 … intended to serve the President and senior officials.

He recommends:

Congress should order NARA and the agencies to re-engineer agency relationships so they create archive-ready records, not just records that NARA has to re-process down the line.  The proposed bill H.R. 5811 would make a good start on this challenge, but we need to go further, …

Compare a recommendation in Economics and Engineering for Preserving Digital Content, quoted below.

Blanton emphasizes difficulties caused by classified records, which are peculiar to government data.  Of more interest to most DDQ readers might be NARA’s Electronic Records Archives (ERA) project,[10] whose largest expenditure is a Lockheed Martin (LM) contract for about $300M.

Reports of a May 14th U.S. Senate hearing and some private rumors led me to wonder whether the ERA project was experiencing serious difficulties.  So I drafted some harsh paragraphs for this DDQ number and shared them in a letter to Dr. Weinstein, the Archivist of the United States, inquiring whether they were appropriate.  I promptly received a very responsive letter from the Director of the ERA Program Office, Kenneth Thibodeau.  As well as answering my specific current concerns, it pointed me at NARA ERA documentation that (for unknown reasons) I had not previously found and the 2004 ERA RFP.  It also emphasized that COTS products figured prominently in the upcoming LM delivery.  It further explained the reason for a separate implementation for Presidential files.  Requirements of the Presidential Records Act differ from those of the Federal Records Act.  It will take me some time to absorb the two requirements sets.

I am still uneasy about how well ERA will meet its objectives, but have no evidence for this unease.  The original LM delivery commitment had been September 2007; actual delivery is expected this month (June 2008).  Since such delays are common for big software, this delay does not itself worry me.  We’ll see whether there is reason for DDQ to comment in some future number.

Limitations of OAIS

In view of the many archiving articles that cite OAIS as a sort of “Good Housekeeping seal of approval”, readers might be interested in a critical look at how OAIS is used.  Alexander Egger has written about shortcomings of the model.[11]

E-spionage Threats and “Trusted Digital Repositories” (TDR)

Enthusiasts for the TDR approach[12] might not believe repeated assertions that it depends on unrealistic assumptions.  One such assumption is that a stored object can be protected for decades or longer against felonious modification.  This is called into question by a BusinessWeek probe of attacks on America's most sensitive computing resources.[13]  Even strongly guarded information has exposures.  Another doubtful assumption is that improper modifications can reliably be detected by repository audits.

TDR enthusiasts might argue that they intend to manage only information that nobody will want to attack.  But how can they decide which information is an attractive target and which not?  Do they propose one method of archiving for cultural and scholarly documents and other, yet-to-be proposed methods for sensitive business, government, and private information such as their personal medical records?

I know only two ways to demonstrate information authenticity many decades after it was created.  One exploits public key cryptography.[14]  The other compares copies in autonomous dark archives with publicly accessible copies.  The dark archives must provide extraordinary protection for dark copies’ integrity.[15]  Is it prudent to consider either possibility as fail-safe?  I don’t!
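The copy-comparison approach can be sketched as follows.  This is only an illustration of the comparison step, under assumptions not stated in the text: each dark archive is taken to report a SHA-256 digest of its copy, and the function names digest and copies_agree are hypothetical.  The sketch says nothing about how the dark archives protect their copies or how the digests travel safely.

```python
import hashlib
from typing import Sequence


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of an object's bytes as hex text."""
    return hashlib.sha256(data).hexdigest()


def copies_agree(public_copy: bytes, dark_digests: Sequence[str]) -> bool:
    """True only if every dark archive reports the digest of the public copy.

    With no dark archive to consult, nothing can be demonstrated, so the
    result is False.
    """
    if not dark_digests:
        return False
    expected = digest(public_copy)
    return all(reported == expected for reported in dark_digests)
```

A single disagreeing archive makes the check fail; deciding which copy is then the authentic one is a separate question that this sketch does not address.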

Is the TDO method correct and complete as described?  I think so, but don't know.  Repeated invitations to challenge its methodology have induced no plausible criticism.  Is there some better method for validating object authenticity than the TDO method?  None has been proposed.

EU Preservation Program (Planets) Software

In January, the Planets project[16] announced a set of LDP tools to be made available, including:

·       The Planets Preservation Planning Tool (Plato), to help organizations move from requirements assessment to action planning;

·       Two emulators, Dioscuri for simulating a practical computer environment and a Universal Virtual Computer (UVC)[17] for environment independent information representation;

·       A Preservation Characterisation Registry to identify characteristics of digital materials that are candidates for LDP;

·       The XCEL significant property extraction tool working on text, image, sound and some other formats;

·       A testbed, which is a controlled software environment for digital preservation experiments; and

·       A Planets Interoperability Framework for integrating Planets tools and services into a preservation system.  This is extensible to integration of third party tools and services.

iRODS—a New Archival Repository Implementation

The San Diego Supercomputer Center has announced that it is developing a new archival repository implementation that it calls i Rule Oriented Data Systems (iRODS) with its own documentation website.

Cost of Long-term Digital Preservation (LDP)

Digital preservation literature has paid too little attention to content-addressed storage technology (CAS).  CAS platforms are disk–based, object–oriented storage systems designed for the long–term retention of data that is not intended to be changed.
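The idea behind content addressing can be illustrated with a toy in-memory store.  It is a sketch under assumptions of my own: the class name ContentAddressedStore is hypothetical, SHA-256 is chosen only for illustration, and commercial CAS platforms differ in many details.  An object's address is derived from its bytes, so unchanged content keeps the same address and any later alteration is detectable on retrieval.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory CAS: an object's address is the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store the object and return its content-derived address."""
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is kept once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        """Return the object, re-checking that it still matches its address."""
        data = self._objects[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object no longer matches its address")
        return data
```

Because the address is derived from the content, there is no operation that updates an object in place; storing changed bytes simply yields a new address.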

LDP cost considerations should include on-going data center costs associated with power and cooling.  An EPA report on data center energy usage observes that data storage devices contribute the highest power consumption growth rate and the highest overall power consumption.  The 13th slide of Richard Moore’s San Diego Supercomputer Center presentation[18] at an NDIIPP meeting summarizes this issue.

Questions about Sustainable LDP and Access

Lavoie’s The Fifth Blackbird[6] provides hints about the agenda and likely outcomes of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF).  When I first read it, my reaction was positive.  This optimism faded as I re-read it and discovered missing ideas.

·       The Fifth Blackbird portrays LDP economics as a funding problem, but suggests no solution.  The difficulties seem tiny compared to concerns at the beginning of the Great Depression.  Keynes’ 1930 reaction discussed knowledge as capital and emphasized cost-reducing technology.[19]  Couldn’t the BRTF seek technical ideas that make its funding concerns fade to insignificance?

·       Which of the archiving partitions will be the primary foci of the BRTF?  Which work components are viewed as most costly?  What is the relationship between cost issues and funding strategies?

·       The Fifth Blackbird strains to separate economic from technical issues, and consequently pays too little attention to technology’s potential for mitigating challenges.  Engineers, particularly those working in the for-profit sector, vigorously seek cost reduction; continued rapid progress will change content management immensely.  Innovations have often changed people’s roles, sometimes even eliminating professions.  When did you last talk to a stenographer?  Will the BRTF consider changes that reduce the need for human digital custodians, even changes threatening its constituencies’ careers?[20]

·       The Fifth Blackbird seems to assume that LDP is the same topic as archiving, and that the LDP solution is some variant of Trusted Digital Repositories methodology.  Its examples are mostly about what repository institutions do or might do.  Will the BRTF consider tools for repository clients, the information consumers and information producers who are the true customers for LDP?

·       The Fifth Blackbird pays little attention to social phenomena that make achieving LDP more difficult than need be—islands of automation and silos of conversation.  In fact, it seems to exemplify a form of inattention across professional boundaries—failing to suggest searching the commercial sector[21] for ideas and solution components that further its objectives.  Will the BRTF look outside the tiny community of research librarians, archivists, and information scientists?

An opportunity for cross-discipline collaboration is provided by the SNIA Long-Term Digital Information Retention and Preservation Technical Working Group (LT-DIRP TWG).  It is announced as being “aimed at defining a new logical format standard and best practices for information preservation and migration”.

I continue to be puzzled by many people apparently thinking of LDP as a difficult problem rather than as a readily achievable objective.[22]  Neither Lavoie’s article nor the BRTF website reduces my puzzlement.

British Higher Education Study Keeping Research Data Safe

This study has investigated medium to long term costs for preserving research data.  Its executive summary makes 10 recommendations.  The full report (PDF or MS Word) has chapters about Methodology, Benefits, Cost Framework, Activity Model and Resources Template, Case Studies, Issues for Universities, and Service Models.

This preliminary study recommends a deeper continuation, which I would welcome.  It might provide more detailed cost factorization, separating activities that can be managed to be distinct, and separating recurring costs from one-time costs.  Such study could help potential funders’ judgments of precisely what they are being asked to support and why money is needed.

In the economics section of a not-yet-formally published article,[22] I recommended:

Many facts—the number of digital objects, the number of authors, the speed of information creation and dissemination, the expectations of citizens, the cost trends of technology, relative skills of different communities, and so on—suggest shifting as much as possible of the responsibility from repository institutions to those who are served—information producers and information consumers.

This will be feasible only if creating preservation-ready information is an inexpensive addition to editing already required.  Preservation tools must be packaged within information producers’ tools.  Since producers already want their output to be valued, it should be possible to persuade them to do additional work that does not take much time and is easy.  As an incentive, prestigious repositories might limit what they accept for indexing and distribution to preservation-ready content.

In an e-mail, Neil Beagrie remarked that the report’s Dutch archives example confirms that repository costs would differ greatly if this became publishers’ practice, rather than preservation being left to archival institutions.

News

A giant of American physics, John A. Wheeler, died on April 13, 2008.  (I had the privilege of hearing him when I was a Princeton University graduate student.)  The Physics Today obituary includes:

Wheeler … in 1937 introduced the S-matrix, which became indispensable in particle physics.  He was a pioneer in the theory of nuclear fission, along with Niels Bohr and Enrico Fermi.  In 1939 he collaborated with Bohr on the liquid drop model of nuclear fission. 

Wheeler's graduate students include Richard Feynman, Kip Thorne, and Hugh Everett, some of the most distinguished physicists of the second half of the 20th-century.  Wheeler was renowned for his teaching as well as his research. …  Even after he had achieved fame, he continued to teach freshman physics, saying that the young minds were the most important.

The passing of another great, Jim Gray, was recognized on May 31 with reminiscences and technical lectures, the Tribute to Honor Jim Gray, on the University of California at Berkeley (UCB) campus.[23]  Honors include establishment of a Jim Gray Chair in the UCB Computer Science Department.  The literature on database transaction processing (the topic cited for Gray’s Turing Award) is a model for how science and engineering R&D should be executed.

Illustrative and particularly interesting for current LDP issues is a Gray paper on computing service error sources.[24]  Fully 42% of the failures it analyzed originated in human errors by system administrators!  As far as I know, this kind of risk has not been discussed for LDP by TDRs.  It should be included in challenges to this approach.

Bletchley Park, the home of Britain's secret code-breaking base during World War II, is likely to vanish.  Historians suggest that, without Bletchley Park, the Allies might have lost the war.  Notwithstanding major redevelopment, its director estimates that Bletchley Park's funds will be exhausted in three years.

On May 27, the New York Times celebrated the 50th anniversary of C.P. Snow’s The Two Cultures by writing about a Curriculum Designed to Unite Art and Science.

According to the 2nd June BusinessWeek, “Paperwork may involve less actual paper …, but the number of documents … has continued to rise … consum[ing] more than 9 billion hours a year.”

The 7th April BusinessWeek reports a Harris poll suggesting why the U.S. ranks 16th of 30 countries in high school students’ standardized science tests: their parents don't know much about science.  Three out of four admitted a weak understanding of science, but want their children to do better.  Eight in ten said science is not receiving the attention it deserves.

IBM is investigating tiny water pipes for cooling stacked arrays of microchips.

Progress in Solar Energy Systems

Recent articles suggest optimism for solar energy conversion.  Lonnie Johnson reports a solid-state heat engine with 60% efficiency—double the rate of any other solar process—and an NSF funded prototype.

An IBM solar method borrows from computer chip cooling.  It uses a lens to focus the sun’s rays on small photovoltaic (PV) cells.  Its trick is a liquid metal combining gallium and indium to connect the PV cell to a heat sink, avoiding the thermal resistance of other connection methods.

An IEEE assessment of a May 2008 conference of PV specialists summarizes as follows:

·       Silicon-based solar technology is now decoupled from the semiconductor industry and is achieving steady cost reductions, so that those following PV discern a kind of Moore’s law at work.

·       The industry has become confident in that evolutionary path, so that policymakers and planners have started to estimate dates when PV-generated electricity will be competitive.

·       And while the incremental path promises a commercial breakthrough within ten years, a second generation technology may be arriving, and might speed up commercialization of PV technology.

PV production has been increasing 50 percent per year, so that capacity doubles about every 18 months.  Estimates suggest costs below $2/W in seven years, competitive with wind electricity generation.
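The relation between annual growth and doubling time can be checked with a short calculation.  It assumes steady compound growth at a rate g per year, which is an idealization; the symbols g and T are introduced here only for illustration.

```latex
\[
T_{\text{double}} = \frac{\ln 2}{\ln(1+g)}, \qquad
g = 0.50 \;\Rightarrow\; T_{\text{double}} \approx \frac{0.693}{0.405}
\approx 1.7\ \text{years} \approx 20.5\ \text{months}
\]
\[
T_{\text{double}} = 1.5\ \text{years} \;\Rightarrow\; g = 2^{1/1.5} - 1 \approx 0.59
\]
```

Under this idealization, growth of exactly 50 percent per year doubles capacity in somewhat more than 18 months, and an 18-month doubling corresponds to roughly 59 percent annual growth, so the two quoted figures agree only approximately.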

The conference attendees were most excited by next-generation PV using copper indium diselenide and cadmium telluride.  One company claims a prototype producing electricity for $1/W.

Recommendations for Reading, Listening, and Viewing

Storage devices attract hackers because they concentrate the sensitive data that hackers want.  Encryption can protect stored data, but is impractical without interoperable key management.  Read about progress towards key-management infrastructure.[25]

If you had no undergraduate economics class, and want to repair the omission, start with R.L. Heilbroner, The Worldly Philosophers: the Lives, Times, and Ideas of the Great Economic Thinkers, 1953 and 1995.

An amusing Newsweek article reports physics analysis of baseball pitchers’ curves.

Charles Darwin's private papers have been published.  See the Complete Work of Charles Darwin Online.

Scientific American has published stories about what words and illusions can tell us about the brain.  Mark Changizi of RPI is interviewed about theories relating mental mechanisms and language evolution. A slide-show of illusions suggests why the brain interprets them as it does.

Sometimes data travels anomalously slowly.  Sometimes a program seems to run, but stalls behind the scenes.  ACM Queue’s Latency and Livelocks suggests why such problems occur.

John Maynard Keynes, Economic Possibilities for our Grandchildren

This upbeat Keynes essay treats knowledge as capital.[19]  I recommend it for anyone thinking about I/T’s role in addressing 21st-century economic problems.  Keynes concludes:

The modern age opened … with the accumulation of capital which began in the sixteenth century.  [and] a cumulative crescendo after the eighteenth, the great age of science and technical inventions began, which since the beginning of the nineteenth century has been in full flood—coal, steam, electricity, petrol, steel, rubber, cotton, the chemical industries, automatic machinery and the methods of mass production, wireless, printing, … and thousands of other things.

… assuming no important wars and no important increase in population, the economic problem may be … within sight of solution within a hundred years.  Thus for the first time since his creation man will be faced with his real, his permanent problem—how to use his freedom from pressing economic cares, how to occupy the leisure, which science and compound interest will have won for him, to live wisely …

Keynes’ assumptions proved too optimistic.[26]  World population might not stabilize until forced by universal poverty.  His assumption about war has been aptly addressed:

Once all the Germans were war-like and mean,
But that couldn't happen again.
We taught them a lesson in 1918,
And they've hardly bothered us since then.                                                        Tom Lehrer song, M.L.F.

Daniel Headrick: When Information Came of Age[27]

Headrick discusses information development from 1700 to 1850.  His preface starts:

… on the development of efficient information systems before the great push to mechanize information in the nineteenth century.  Some of the systems discussed—maps, dictionaries, botanical nomenclatures—had their origin in the distant past but were rationalized and improved in the period studied.  Others, such as statistics, graphs, and the telegraph, were truly new.

One [goal] is to define the idea of information and to show that the "information revolution" is not a recent phenomenon, … but has deep historical roots.  [Another] is to trace the origin [and flowering] of some important information systems to the eighteenth and early nineteenth centuries.

He provides fascinating, but little-known historical detail, such as:

The technology of engraving and printing was instrumental in the emergence of geological illustration.  Eighteenth-century illustrations had to be engraved on copper plates—a slow, costly process requiring skilled labor—and then printed separately from the text.  For that reason, even expensive books … contained few illustrations beyond the frontispiece.

In the 1830s, publishers of natural history books … switched from copperplate to woodcuts.  Wood engraving was cheap and could be incorporated into a page of text, but it could not show as much detail, and it wore out quickly.  Colored maps and illustrations could be obtained only by hand coloring each copy.

Lithography (etching on stone with acid) was faster and cheaper than copperplate and allowed tonal gradations and fine detail.  It was invented in 1798 Munich … and promoted in England from 1818 on.    [B]ooks … began offering far more illustrations than they had previously.                               Headrick, pp. 121-2

Anticipating the Oil Price Explosion

Nobody really thoughtful about the future will have been surprised by the inflation of oil and gasoline prices so prominently reported today.  You might want to read about remarkably precise predictions made fifty years ago.  The 2001 book, Hubbert’s Peak: The Impending World Oil Shortage,[28] begins with

Global oil production will probably reach a peak sometime during this decade.  After the peak, the world's production of crude oil will fall, never to rise again.  The world will not run out of energy, but developing alternative energy sources on a large scale will take at least 10 years.

In 1956, the geologist M. King Hubbert predicted that U.S. oil production would peak in the early 1970s.[29]  Almost everyone, inside and outside the oil industry, rejected Hubbert's analysis.  The controversy raged until 1970, when the U.S. production of crude oil started to fall.  Hubbert was right.

Around 1995, several analysts began applying Hubbert's method to world oil production.  [They] estimate that the peak year for world oil will be between 2004 and 2008.    None of our political leaders seem to be paying attention.  If the predictions are correct, there will be enormous effects on the world economy.

Yergin’s The Prize: The Epic Quest for Oil, Money, and Power[30] provides a more thorough analysis.  See also BusinessWeek 2005, Is There Plenty of Oil?, written when crude oil cost only $57/barrel.

The Film “Wit”

A reviewer has written, “Emma Thompson and Mike Nichols' adaptation of Margaret Edson's intellectual anti-intellectual play "Wit," which won the 1999 Pulitzer Prize, movingly explores a tough but emotionally homeless scholar's confrontation with a life-threatening illness.”  This 2001 film about death by ovarian cancer and chemotherapy is not fun, but it is full of wit expressed in remarkably precise language.

The film uses a John Donne poem, Death Be Not Proud, as its theme song.

Two Canadian Folk Singers

Almost unknown abroad, Canada’s Nancy White and Marie-Lynn Hammond each will please anyone who likes songs with satirical twists.  Each writes her own songs.  To start, I recommend Nancy White’s Stickers on Fruit and Marie-Lynn Hammond’s Black & White.

Practical Matters

McAfee reports identify dangerous Web sites and estimate the safety of search engines.  Newspapers contain a summary of the former report.

ThoughtMesh implements an unusual model for publishing and discovering scholarly papers online.  It provides tag-based navigation using keywords to connect excerpts of essays published on different Web sites.  See the Scholarship 2.0 blog for a list of features.

A PC World trouble-shooting tip: whenever Windows or another application hurls an error message, use that exact message as a Google search argument.  This will usually lead to pages detailing what's wrong and how to fix it.
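
For readers who script such lookups, the tip above can be sketched in a few lines of Python.  This is only an illustration, not part of the PC World advice: the function name exact_search_url and the sample error message are invented here, and the sketch assumes that wrapping the message in double quotes requests an exact-phrase match.

```python
from urllib.parse import quote_plus


def exact_search_url(error_message: str) -> str:
    """Build a Google search URL that treats the message as one exact phrase.

    The function name is hypothetical, chosen for this illustration.
    """
    # Collapse line breaks and repeated spaces so a multi-line
    # error dialog becomes a single phrase.
    phrase = " ".join(error_message.split())
    # Drop embedded double quotes; they would end the exact phrase early.
    phrase = phrase.replace('"', "")
    # Surrounding double quotes ask for an exact-phrase match (an assumption
    # about search syntax); quote_plus makes the text safe for a URL.
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


if __name__ == "__main__":
    # The error text below is made up for the example.
    print(exact_search_url("The procedure entry point could not be located"))
```

Pasting the printed address into a browser runs the search.  Trimming user names and file paths from the message before searching usually gives better matches.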

Another PC World tip: using the right utility avoids trouble, but often software tools exacerbate problems.  Be fussy about which utilities you install and use.

Phishing by Simulating U.S. Internal Revenue Service

An e-mail I received looked like an identity theft attempt.  It pointed to a Web page requesting information for a tax refund and asked for a debit card number.

The U.S. IRS guidance warns about fraud attempts via the Internet, saying:

E-mails claiming to come from tax-refunds@irs.gov, admin@irs.gov … told recipients [of eligibility] to receive a tax refund … and [directed them] to a Web site.  The site … displayed an interactive page similar to a genuine IRS one … [asking] for personal … information that … IRS does not require.

The Sept. 2007 Consumer Reports title page announces Stop ID thieves: 19 ways to protect yourself online and ratings of computer security software.  It reports, “Your chances of being a cybercrime victim are 1 in 4, our State of the Net survey shows.”  DDQ strongly recommends this number.

Linux Ubuntu

For several years I have experimented with shifting from MS Windows to Linux, partly because I do not like lock-in to the de facto Microsoft monopoly.  Linux Ubuntu 8.04 LTS[31] makes such a shift practical.  Being unpracticed with Linux incantations and also network management, I found Schroder’s Linux Networking Cookbook worth every penny of its $30 cost, even though I used only a few of its recipes.

Every Ubuntu review I have read is positive.  See a Linux Journal article and an eWeek article.  My installation went like a charm, automatically including most of my heavily used applications.  I chose the KDE front end over the default GNOME, partly because it allows several virtual desktops.

The Synaptic installation manager is a Linux high point.  There exist over 2000 application packages, all free.  To obtain any, search and select from a list and click to install.  This has invariably worked flawlessly.  For big packages and slow Internet connections, downloading will take hours.  However, the download runs concurrently with other work, without much slowing or otherwise impeding it.  Synaptic can also bring installed applications up-to-date almost automatically.

Another high point is Samba, the file sharing system.  I can "see" and edit files on my Win/NT system almost as if they were Linux files.  (It works the other way round also, but I've not used that.) 

I have yet to install virtualization to enable running Windows applications within Ubuntu.



[1]     H.M. Gladney, Preserving Digital Information, Springer Verlag, 2007.  All the DDQ 7(2) figures are adaptations from this book.

[2]     Gladney, loc. cit. endnote 1, §9.4.

[3]     Java Community Process, JSR 170: Content Repository for Java™ technology API, 2006.

[4]     Catherine C. Marshall, Rethinking Personal Digital Archiving, Parts 1 and 2, D-Lib Magazine 14(3/4), March/April 2008.

[5]     H.M. Gladney, Digital Preservation in a National Context: ... Views of an NDIIPP Outsider, D-Lib Magazine 13(1/2), Jan. 2007.

[6]     Brian Lavoie, The Fifth Blackbird: Some Thoughts on Economically Sustainable Digital Preservation, D-Lib Magazine 14(3/4), March 2008.

[7]     Gregg Janée, Justin Mathena, and James Frew, A Data Model and Architecture for Long-term Preservation, JCDL 2008.      

[8]     H.M. Gladney, loc. cit. endnote 1, §11.1.2.

[10]    Robert F. Sproull and Jon Eisenberg, Editors, Committee on Digital Archiving and the National Archives and Records Administration, National Research Council, Building an Electronic Records Archive at the National Archives and Records Administration, 2005, ISBN 0-309-09696-0.

[13]    Brian Grow, Keith Epstein, and Chi-Chu Tschang, The New E-spionage Threat, BusinessWeek, 33-41, April 21, 2008.

[14]    An example is cryptography use in the TDO proposal.  See H.M. Gladney, Trustworthy 100-Year Digital Objects: Evidence After Every Witness is Dead, ACM Trans. Office Info. Sys. 22(3), 406-436, July 2004.

[15]    Since this possibility has not been investigated carefully, it is suggested only tentatively.

[16]    Adam Farquhar and Helen Hockx-Yu, Planets: Integrated Services for Digital Preservation, Intl. J. Digital Curation 2(2), 2007.  See also Digital Repository Infrastructure Vision for European Research, draft, March 2007.

[17]    H.M. Gladney and R.A. Lorie, Trustworthy 100-Year Digital Objects: Durable Encoding for When It's Too Late to Ask, ACM Trans. Office Info. Sys. 23(3), 299-324, July 2005.

[19]    John Maynard Keynes, Economic Possibilities for Our Grandchildren, 1930 lecture reproduced in Keynes’ Essays in Persuasion, Norton, 1963, pp. 358-373.

[20]    Such an opportunity has been sketched in Economic Analysis …, loc. cit. endnote 22.  Lavoie ignored this, instead quoting the amusing, but trivial, anecdote about SMOP (A Simple Matter of Programming).

[21]    Information scientists seem not to have noticed the longevity initiatives of the Storage Networking Industry Association (SNIA).

[22]    H.M. Gladney, Economics and Engineering for Preserving Digital Content, preprint, December 2007.

[23]    Lecture typescripts are published in SIGMOD Record 37(2), June 2008.  See also Jim Gray’s publication list.

[24]    Jim Gray, Why Do Computers Stop and What Can Be Done About It? German ACM Conf. on Office Automation, Oct. 1985.

[25]    Luther Martin, Key Management Infrastructure for Protecting Stored Data, IEEE Computer 41(6), 103-4, June 2008.

[26]    R.L. Heilbroner, The Worldly Philosophers: The Lives, Times, and Ideas of the Great Economic Thinkers, Simon and Schuster, 1953 and 1999, Chapter IV, The Gloomy Presentiments of Parson Malthus and David Ricardo.

[27]    Daniel R. Headrick, When Information Came of Age: Technologies of Knowledge in the Age of Reason and Revolution, 1700-1850, Oxford U.P., 2000, ISBN 0-195-13597-0.

[28]    Kenneth S. Deffeyes, Hubbert’s Peak: The Impending World Oil Shortage, Princeton U.P., 2001, ISBN 0-691-09086-6.

[29]    M.K. Hubbert, Nuclear Energy and Fossil Fuels, Proc. Am. Petroleum Institute Drilling and Production Practice, Spring Meeting, 7-25, 1956.

[30]    Daniel Yergin, The Prize: The Epic Quest for Oil, Money, and Power, Simon and Schuster, 1991, ISBN 0-671-79932-0.

[31]    "LTS" means "Long Term Support": maintenance and help for at least three years.  The version number, "8.04", indicates the year and month of release to ordinary users.