Perspectives on Trustworthy Information
Volume 8, Number 1, 1Q2009
HMG Consulting
© 2009, H.M. Gladney ISSN: 1547-8610
A DPC Technology Watch Report provides a careful description of JPEG 2000, a proposed digital image preservation standard.[1]
Michael Day has summed up cultural heritage community views about trusting stored information.[2] This and related articles do not address some key sources of risk—sources of interest to CPAs and attorneys:
(1) End users—the people who, 100 years from now, might depend on sensitive archived documents—are being asked to trust collection custodians. How can they judge whether such trust is prudent?
(2) In the Trusted Digital Repositories approach,[3] the correct working of a repository is required.[4] How can an end user decide whether or not this has been achieved, given his limited pertinent skills and limited time and energy available for the task?
(3) What specifically is to be trusted? The only occurrences of “trusted to” in the TRAC documents[5] are in: “They are trusted to store these valuable materials. They are trusted to provide access to them in order to document and reveal history as well as to foster the growth of knowledge. They are trusted to preserve these items to the best of their ability for future generations.” There are no occurrences of “trusted by” in either report!
Trust for repositories is summed up in
The tensions are illustrated by how some authors attempt to transfer the “chain of custody” concept from information recorded on paper to digital information. Careful control of paper along lines developed over several centuries can provide credible authenticity evidence. But similarly reliable custodial protocols have not yet been described for digital records.
Just as this DDQ number was being completed, IEEE Computer published several papers about Trust Management—too late for careful comment in the current DDQ number.
A 2007
There are two distinct approaches to emulation. KB has focused on emulation of entire computing environments in order to provide the perpetual possibility of executing today's application programs,[7] making its Dioscuri pilot available. However, we find:
Rothenberg proposes an emulator specification … [for which] the effort involved would be unreasonable. Two examples … illustrate this objection: First, suppose that future generations are interested in just viewing a picture. Then [Rothenberg] emulation still requires [one] to preserve the whole software environment for creating and modifying the picture. Second, consider an email sent using Lotus Notes. Here, for future access the complete software system, which supports a load of other groupware tasks, would have been preserved, just for reading a simple plain text email. Worse, application software provides just one view of the data … without direct access to the text included. Therefore, it is impossible to transfer the raw data from the old system into a new system.
In addition, future development of emulation software just on the basis of a specification is considered extremely risky, … since the result cannot be tested by comparing it to the original hardware. [Borghoff page 214][8]
IBM is offering the alternative in a pilot for its Universal Virtual Computer solution.[9]
The abstract of my Critique of Architectures for Long-Term Digital Preservation follows:
Evolving technology and fading human memory threaten the long-term intelligibility of many kinds of documents. Furthermore, some records are susceptible to improper alterations that make them untrustworthy. Trusted Digital Repositories (TDRs) and Trustworthy Digital Objects (TDOs) seem to be the only broadly applicable digital preservation methodologies proposed. We argue that the TDR approach has shortfalls as a method for long-term digital preservation of sensitive information. Comparison of TDR and TDO methodologies suggests differentiating near-term preservation measures from what is needed for the long term.
TDO methodology addresses these needs, providing for making digital documents durably intelligible. It uses EDP standards for a few file formats and XML structures for text documents. For other information formats, intelligibility is assured by using a virtual computer. To protect sensitive information—content whose inappropriate alteration might mislead its readers, the integrity and authenticity of each TDO is made testable by embedded public-key cryptographic message digests and signatures. Key authenticity is protected recursively in a social hierarchy. The proper focus for long-term preservation technology is signed packages that each combine a record collection with its metadata and that also bind context—Trustworthy Digital Objects.[10]
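The integrity test mentioned in the abstract can be illustrated with a message digest alone. What follows is a minimal sketch, not the TDO design itself: the package layout (a content blob plus a metadata dictionary), the function names, and the choice of SHA-256 are all assumptions made for illustration. The public-key signature over the digest, which the TDO methodology also calls for, is omitted because it would need a cryptography library outside the Python standard library.

```python
import hashlib
import json


def package_digest(content: bytes, metadata: dict) -> str:
    """Return a SHA-256 digest over content plus canonicalized metadata.

    Hypothetical helper: the packaging rules here are illustrative only.
    """
    # Serialize metadata deterministically so equal metadata hashes equally.
    meta_bytes = json.dumps(
        metadata, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    h = hashlib.sha256()
    # Length-prefix each part so the content/metadata boundary is unambiguous.
    for part in (content, meta_bytes):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def is_unaltered(content: bytes, metadata: dict, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded at packaging."""
    return package_digest(content, metadata) == recorded_digest


# Example: any change to the content or its metadata changes the digest.
recorded = package_digest(b"archived record", {"title": "Deed", "year": 2009})
```

A reader who holds `recorded` from a trusted source can call `is_unaltered` on a retrieved copy. Trusting the recorded digest itself is the part that signatures and a key hierarchy are meant to address.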
Notice other LDP authors' near-silence in the last year. Absent truly new ideas or challenges from others, I intend to quit writing on LDP except for dealing with referees’ issues with Critique and two other papers.[11]
Perhaps everything that can be said about methods for LDP has been said, so that only implementation is missing. A possible exception that came to my attention after Critique was submitted is a proposal to use multivalent architecture.[12]
In February, a pan-European “fresh start for lost file formats” was announced. This €4.02M Keeping Emulation Environments Portable (KEEP) project aims to create a universal emulator set: software that can recognise, play and open all types of computer file from the 1970s onwards. As well as basic text documents it will also let people play computer games that technology has left behind.
The project leaders assert that “the number of unreadable documents in archives is [growing].”
A brief KEEP technical description is available, together with an abstract of technology it might use.
TextGrid intends to create a grid for collaborative editing, annotation, analysis and publication of specialist texts for emerging e-Humanities. It intends to integrate technologies for analyzing texts with dictionaries, lexica, secondary literature and other tools. Its intended CommunityGrid will provide for integrating initiatives worldwide. The project asserts that:
[P]ast and current initiatives for digitising and accessioning texts already accrued a considerable data volume, which exceeds multiple terabytes. Grids are capable of handling these data volumes. Also the dispersal of the community as well as the scattering of resources and tools call for establishing a CommunityGrid … for connecting the experts and integrating the initiatives worldwide.
A San Jose Mercury News article starts by quoting Aristotle that a metaphor “is the one thing that cannot be learned from others”. Obviously Aristotle was mistaken, since we frequently repeat metaphors that we did not ourselves invent.
A
In early March I posted the following blog inquiry:
In recent years several giant [book] digitizations… have been mounted,[14] some by commercial enterprises and some by academic institutions. My relatively cursory Google search has not led me to a Web service from which one can mount a search that inspects a subset of these to determine whether or not a book (or a scholarly article) is available in digital form.
I wonder, has the kind of service been programmed and made available? … If not, perhaps some list member will take it on.) If you know of an example, I would be grateful for a pointer to it. I suspect that other members of this distribution list would be as interested as I am.
Roy Tennant reacted with, “OCLC and others are working hard to get records for books being digitized by Google, the Open Content Alliance, [OAIster records,] and others into WorldCat”, but that “past cataloging practices can make it rather difficult to determine whether a URL … points to the full text of an item or to some lesser portion”. Peter Noerr suggested that “nobody has built connectors (to handle the standards based and non-standard interfaces) … for ‘anybody anywhere in the world’ coverage”. Andrew Hankinson suggested, “[Y]ou are describing [a] ‘Holy Grail’ for digital librarians”.
Subsequent inspection of 22 articles[15] reveals focus on service to undergraduate students uncomfortable with literature search. As important as that might be for the college librarians who wrote these articles, it calls for aspects not needed by mature scholars—aspects that had not been on my mind when I inquired.
Clearly I need to clarify what I’m seeking and what might help other scholars. It follows naturally from the existence of the massive digitization projects tabulated in DDQ 7(1).[16] What I want is a “quick and dirty” PC tool whose limitations I understand and that combines searches over major sources such as those provided by Carnegie Mellon, Amazon, Google, Microsoft, … and the research libraries they are partnering with. I would use this as a starting point for more careful searching if and when I wanted that. The tool would allow a user to select the resources searched, and might include in its output a guess about the completeness of any object it identifies, a guess made from the sources’ own descriptions of what they provide. It would have a front end similar to that of the SJSU and San Jose Public Library. Finally, its help text would identify its own weaknesses, together with hints on how each of these can be overcome.
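The shape of such a tool can be sketched in a few lines. Everything below is hypothetical: the `Hit` record, the two stand-in sources and their canned results, and the `metasearch` function are illustrations only, not any provider's actual interface. A real version would replace each stand-in with a call to that provider's search service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Hit:
    source: str        # which collection reported the item
    title: str
    url: str
    completeness: str  # the source's own claim, e.g. "full text" or "snippet"


# A searcher takes a query string and returns hits.
Searcher = Callable[[str], List[Hit]]


def demo_source_a(query: str) -> List[Hit]:
    # Stand-in for a real provider; returns canned data.
    return [Hit("demo-a", f"{query} (scan)", "http://example.org/a/1", "full text")]


def demo_source_b(query: str) -> List[Hit]:
    # Second stand-in, reporting only a partial view of the item.
    return [Hit("demo-b", f"{query} (preview)", "http://example.org/b/7", "snippet")]


# Registry of available sources; the user picks a subset by name.
SOURCES: Dict[str, Searcher] = {"demo-a": demo_source_a, "demo-b": demo_source_b}


def metasearch(query: str, selected: Iterable[str]) -> List[Hit]:
    """Run the query against each selected source and pool the hits."""
    hits: List[Hit] = []
    for name in selected:
        searcher = SOURCES.get(name)
        if searcher is None:
            continue  # unknown source names are ignored
        try:
            hits.extend(searcher(query))
        except Exception:
            # "Quick and dirty": a failing source is skipped, not fatal.
            continue
    return hits


results = metasearch("Don Quixote", ["demo-a", "demo-b"])
```

Each hit carries the source's own completeness claim, so the output can offer the guess about completeness described above without the tool judging it independently.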
In short, I’m not looking for any Holy Grail, but instead a tool to save human time and effort. It does not even have to use machine resources particularly efficiently or be very fast. Tom Keays suggested part of a mechanism: using the xOCLCNUM service as input for FRBR search in WorldCat, illustrated by:
For Don Quixote, there is a copyrighted edition at http://www.worldcat.org/oclc/51848364, but you can use xOCLCNUM for a WorldCat FRBR type search to return freely available ebook versions, as follows: http://xisbn.worldcat.org/webservices/xid/oclcnum/51848364?method=getEditions&format=txt&library=ebook&fl=oclcnum,url
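The query quoted above has a regular structure, so it can be assembled for any OCLC number. This is a minimal sketch that only builds the URL string from the parameters visible in the quoted example. The function name is hypothetical, no request is sent, and whether the service still answers is not checked here.

```python
from urllib.parse import urlencode

# Base path taken from the URL quoted in the text.
BASE = "http://xisbn.worldcat.org/webservices/xid/oclcnum/"


def editions_url(oclc_number: str) -> str:
    """Build a getEditions query restricted to ebook versions."""
    params = [
        ("method", "getEditions"),
        ("format", "txt"),
        ("library", "ebook"),
        ("fl", "oclcnum,url"),
    ]
    # safe="," leaves the comma in the fl value unescaped, as in the quoted URL.
    return f"{BASE}{oclc_number}?{urlencode(params, safe=',')}"


don_quixote = editions_url("51848364")
```

For the Don Quixote record, `don_quixote` reproduces the URL quoted above character for character.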
Ex Libris MetaLib enables an institution to provide the metasearch I have in mind, but does so from a Web server rather than as the kind of Web browser tool that I could customize for myself.
I subsequently found promising offerings, and am starting to test Google Custom Search Engine. Its user guide includes an HTML program fragment for embedding its interface into a personal Web page. A later DDQ number will report personal experience.
An article by Jeffrey Mervis suggests that the NSF Digital Library is a practical failure.[17]
The National Science Foundation (NSF) has spent roughly $175 million over the last nine years "to provide organized access to high quality resources and tools that support innovations in teaching and learning at all levels" … [in] its Digital Library (NSDL) program … to make potentially useful Web content easy to find and classroom-customizable. NSF funded "core integration" groups at
A new service, ResearchGate, attempts to expedite scientific communication, asserting that:
Instead of disseminating scientific results in regularly scheduled and printed journal issues, now a continuous release of articles in online format will change and expedite the way new results are spread. Without anonymous review processes, open access journals or wiki-like concepts will assure the quality of science. Hidden conglomerates of various interests will give way to transparent and traceable new concepts of scientific impact measurements. Science is collaboration, so scientific social networks, wikis and other means of collaboration will facilitate and improve the way scientists collaborate. …
skeptics … are right to assert a wide gap between epistemological precept and scientific practice, even if the two are correlated. Epistemology (of whatever kind) advanced in the abstract cannot be easily equated with its practices in the concrete. …
If training a telescope on large, remote causes fails to satisfy, what about the opposite approach, scrutinizing small, local causes under an explanatory microscope? The problem is mismatch between the heft of explanandum and explanans, rather than distance between them: in their rich specificity, local causes can obscure … wide-ranging effect that is our subject here. Local circumstances that may seem to lie behind, for example, a change in surgical procedures in a late Victorian London hospital, are missing in an industrial-scale, post-Second World War physics lab in
Quine comments:[19]
… ontology, or the values available to variables. [W]e can go far with physical objects. They are not, however, known to suffice. … we do not need to add mental objects. But we do need to add abstract objects, if we are to accommodate science as currently constituted. Certain things we want to say in science may compel us to admit … not only physical objects but also classes and relations of them; also numbers, functions, and other objects of pure mathematics. For mathematics (not uninterpreted mathematics, but genuine set theory, logic, number theory, algebra of real and complex numbers, differential and integral calculus, and so on) is best looked upon as an integral part of science, on a par with the physics, economics, etc., in which mathematics is said to receive its applications.
Philosophy that requires mental objects is sometimes called “psychologism”.
I have long been puzzled, not by the paradox itself, but rather that the ancients did not solve it even with their limited mathematics. Recall that, in condensed form, the Achilles and the Tortoise paradox reads:[20]
Before Achilles can catch the tortoise he must reach the point where the tortoise started. But in the time he takes to do this the tortoise crawls a little further forward. So Achilles must next reach this new point. But while Achilles achieves this, the tortoise crawls a tiny bit further. And so on … Achilles has an infinite number of finite catch-ups to do before he can catch the tortoise, and so, Zeno concludes, he never catches the tortoise.
The incorrect reasoning is exposed by the word ‘never’. Zeno and his critics might simply have asked how long the race could last. Suppose that, at the start of the race, the Tortoise had a 100 meter lead, that Achilles’ speed was 6 meters/second (slow by modern standards), and the Tortoise’s speed was 6 millimeters/second. In 17 seconds Achilles will have covered 102 meters, while the Tortoise will have crawled only 10.2 centimeters beyond its starting point, leaving the Tortoise 189.8 centimeters behind.
Thus, Zeno might have discovered[21] that ‘never’ would be shorter than 17 seconds. This absurdity could have alerted the ancients to the fact that the sum of an infinite series can be finite!
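The arithmetic can be checked by adding up Zeno's catch-up stages directly. The sketch below uses the lead and speeds given above; the variable and function names are illustrative. Each stage lasts one thousandth as long as the one before it, so the stage durations form a geometric series whose sum is the lead divided by the difference in speeds.

```python
from fractions import Fraction

lead = Fraction(100)            # Tortoise's head start, in meters
v_achilles = Fraction(6)        # meters per second
v_tortoise = Fraction(6, 1000)  # 6 millimeters per second, in meters per second


def catch_up_stages(n):
    """Durations, in seconds, of Zeno's first n catch-up stages."""
    gap = lead
    stages = []
    for _ in range(n):
        t = gap / v_achilles   # time for Achilles to reach the Tortoise's last position
        stages.append(t)
        gap = v_tortoise * t   # distance the Tortoise crawled in the meantime
    return stages


# Sum of the first ten stages versus the closed-form sum of the whole series.
partial = sum(catch_up_stages(10))
closed_form = lead / (v_achilles - v_tortoise)

# Achilles' lead over the Tortoise after 17 seconds, in meters.
ahead_at_17 = 17 * v_achilles - (lead + 17 * v_tortoise)

print(float(partial), float(closed_form), float(ahead_at_17))
```

Exact fractions are used so that the tiny later stages are not lost to rounding. The infinitely many stages add up to about 16.68 seconds, and after 17 seconds Achilles is about 1.9 meters ahead.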
DDQ focus is planned to shift from preservation to evaluating three interrelated propositions:
(1) That the fundamental principles underlying the "Information Revolution" were mostly worked out between 1850 and 1960;
(2) That laymen (vis-a-vis science) can achieve comfort with modern and evolving information management by absorbing surprisingly few ideas from the work just alluded to; and
(3) That the single notion of a "pattern" is an effective tool for comprehending most of the critical ideas underlying the Information Revolution.
A start in these directions is implicit in Part II of Preserving Digital Information.
A January editorial article asserts the urgent need to fix federal archiving policies. An interagency working group has just recommended that the United States develop a strategic policy for preserving and making scientific information accessible in a world in which data increasingly is born, stored and used in digital formats.
2008 was a down year for the economy. However, according to PC World, it was a banner year for unfounded technology rumors.
At the end of the first week of April, negotiations for IBM’s purchase of Sun Microsystems broke down, perhaps only temporarily. The stock price of Sun promptly dropped 23%, which probably did not please Sun owners.
As computers become faster and malefactors become more vigorous, old cryptographic tools might no longer be sufficiently secure. Some months ago, the National Institute of Standards and Technology mounted a competition for new hash algorithms.
“After Madoff, Donors Grow Wary of Giving,” writes the Wall Street Journal, “but you can spot red flags before you write out a check.”
Mechanized computing is at least 2500 years old. Read about the Antikythera Mechanism.
Learn about medical sciences from the videos in the Charlie Rose Science Series.
Learn about computer software architecture from Grady Booch.
In mid-2008, the Scientific American Book Club offered a promotional collection whose individual books were about the most interesting numbers: zero, π, e, … with one book for each number. An excerpt follows:
Attempting to convince someone whose mind is already made up is difficult. In Mathematical Cranks, Underwood Dudley tells of a cyclometer who wrote that “π’s only position in mathematics is its relation to infinite series [and] π has no relation to the circle. ... Lindemann proclaimed the squaring of the circle impossible; but Lindemann’s proof is misleading for he uses numbers (which are approximate in themselves) in his proof.” How can you argue with that logic? [22]
Like most scientists, I know that five of the most important numbers are related by a single equation,
$e^{i\pi} + 1 = 0$,
but am nevertheless fascinated by this fact.
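Euler's identity, which ties together e, i, π, 1 and 0, can be checked numerically. This is only a floating-point illustration, not a proof: the residual is not exactly zero because π cannot be represented exactly in binary floating point.

```python
import cmath

# Evaluate e^(i*pi) + 1; in exact arithmetic this is 0.
residual = cmath.exp(1j * cmath.pi) + 1

# The magnitude is tiny, reflecting floating-point rounding only.
print(abs(residual))
```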
This little book[23] is an articulate and amusing reminder of differences between common language usage and scientific language, as illustrated by:
A dignified scholar emerges from the [
“That is a long story,” replies the scholar. “Perhaps you will come to the University some day, and you will learn what geometry is.”
The boy makes a face. “Please, Sir,” he pleads, “I want to know now. I know it would take a long time to learn all about it, but can't you tell me just a little bit now?”
By this time the scholar is very much impressed by the eagerness of the boy. “Well,” he says, unrolling his manuscript, “I'll read you the first sentence. … ‘A point is that which has no part.’”
There follows a dead silence.
…
“Please, Sir,” [the boy] asks, “has a point got a smell?”
… …
… [The boy continues, asking whether a point has colour, shape, ability to speak, and so on. He finally sums up angrily.]
“I think you are a dishonest man. You tell people that a point has no part, but you don't tell them that a point has no smell, no colour, can't talk, can't hear. Why do you bother to say that a point has no part, when there are so many other things it hasn't got? Tell me that.”
All subsequent epistemological work is influenced by Immanuel Kant’s writings—particularly by his Critique of Pure Reason. Ernst Cassirer, a famous philosopher in his own right, provided Kant’s Life and Thought early in his career.[24] An excerpt suggests its tone:
Kant's initial years at the university, to judge by the slight information about them that has been preserved, are also significant more for this education of the will than for the knowledge furnished him in the regular course of lectures. In
Engineers enjoy outstanding designs. An example is “the World’s Greatest Keyboard”. Edwards writes, “From the satisfying click of its keys to its no-nonsense layout and solid steel underpinnings, IBM's 24-year-old Model M is the standard by which all other keyboards must be judged.” See a slide show.
I’m a fan, using versions extended with the Trackpoint, and piling up in my garage newer keyboards received as part of PC purchases.
Pegoraro provides a critique of income tax return programs. The two most prominent offerings create different tax estimates.
10 Minute Mail is a free service that creates a temporary e-mail address, so that signing up for some service does not bring unwanted solicitations as a side effect. Ten minutes should be long enough to sign up, receive a confirming e-mail, and send a "yes, it's really me" message. Then the address evaporates. A slow typist can add ten minutes to the life of the e-mail account.
I have not tried it, as I suspect that it requires one to use an e-mail client, which I don’t need with Gmail—my preferred service. However, I achieve similar filtering with a Gmail address that I use only for service sign-ups, ignoring all other incoming traffic.
PC World teaches creating an ad hoc network for information transfer among PCs.
A network hard disk offers easy backup, albeit only supporting recent operating system versions.
It’s hardly a secret that the landline telephone business is threatened by wireless telephony. Companies are scrambling to provide novel wireless services and handset features. Some of these are free services funded by sometimes intrusive advertising. Recent announcements include:
· Free phone calls from your browser using GizmoCall, CallingAmerica, and perhaps other offerings.
· Renting a Blackberry for the road.
· A Google service to redirect an incoming call to several telephone numbers and provide options for handling an accepted call.
· A handset design combining features from today’s smartphones.
Product origins can be determined from Universal Product Codes (aka “bar codes”), albeit not always reliably. The first three digits usually identify the country of manufacture, as follows:
00-13 |
30-37 | 37
40-44 |
49 |
50 |
57 |
64 |
76 |
471 |
480 |
628 |
629 |
690-695 | People’s Republic of
740-745 |
San Jose Mercury News offers a good guide for what to look for in a high-definition television.
The AVS Audio Editor reads audio CDs almost perfectly. It is free for non-commercial use.
2008 brought us many free tools to speed browsers to helpful information. Some that I adopted are:
· Enhancements to Mozilla Firefox, keeping it ahead of Windows Internet Explorer. See a review.
· Xmarks (FoxMarks renamed) for synchronizing browser bookmarks across multiple PCs.
· FreeDownloadADay weekly e-mail service alerting recipients to new services.
· ToRead 2-click service to send the content of any Web page to your e-mail.
· Interclue summary information about Web links. Hover your mouse pointer over the link, and an icon will appear. Rest your mouse on the icon to see a linked-page summary.
The Wall Street Journal discussed enhancing Web search efficiency with Surf Canyon and Google’s SearchWiki.
You have perhaps noticed an upsurge of virtual machine software offerings. I am evaluating some to choose among, both for Windows XP (I have tried and rejected Windows Vista) and for Linux Ubuntu. See a comparative review of Sun xVM VirtualBox and VMWare Server.
Virtual Desktops seem to provide a subset of virtual machine services—specifically the ability to see/manage different applications in separate spaces with rapid switching among spaces—a service for MS Windows similar to what is a native part of Linux Ubuntu with KDE. (A virtual desktop is simply a desktop view that displays only selected windows.) WindowsPager enjoys positive reviews. The Fences screen organizer complements it nicely.
Plustek has created a scanner for the vision-impaired that reads to you. Just plunk a novel on the platen, punch a button, and relax to the dulcet sounds of a computerized voice reading aloud. The buttons and power switch are marked in Braille. Watch a video (in Japanese, but you can turn off the sound and still get the gist) of the Plustek BookReader in operation. The scanner also produces MP3 or WAV files that you can listen to at a later time, saves images, and produces PDF files of scanned text.
Plustek scanners have a specially designed edge and lamp that allow "zero edge" scanning. (The machine can scan right up to its edge where the book spine is placed.) The book pages are completely flat on the glass, thereby avoiding book spine shadow and distorted text.
The obvious use is for vision-impaired users although it will also work well as a normal scanner. It is not inexpensive (approx. $600), but within range for private purchases.
HD TV | Samsung 50” 720P Plasma | $1310 | each
HD TV | Samsung 32” 720P LCD | $555 | each
HD TV | Samsung 52” 1080P LCD | $2280 | each
DVD dual layer | Sony 20x internal dual-layer DVD -/+RW drive | $39 | each
LCD Monitor | Acer 23” 1920x1080, 40000:1 contrast, 5 ms response | $240 | each
LCD Monitor | Hyundai 19” 1280x1024, 1000:1 contrast, 5 ms response | $130 | each
LCD Monitor | Hyundai 21.6” 1680x1050, 2500:1 contrast, 5 ms response | $150 | each
LCD Monitor | Acer P241W 24” 1920x1200, 3000:1 contrast, 5 ms response | $330 | each
[1] Robert Buckley, JPEG 2000—a Practical Digital Preservation Standard, DPC Report, Feb. 2008.
[2] Michael Day, Toward Distributed Infrastructures for Digital Preservation: The Roles of Collaboration and Trust, Intl. J. Digital Curation 1(3), 2008.
[3] Research Libraries Group, Trusted Digital Repositories: Attributes and Responsibilities, May 2002.
[4] The practical interest of most users will not include most repository contents, but only the authenticity of very few records.
[5] Loc. cit. endnote 3. Also RLG-NARA Digital Repository Certification Task Force, Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist, 2007.
[6] R. Moore, Towards a Theory of Digital Preservation, Intl. J. Digital Curation 3(1), 63-75, 2008.
[7] Jeffrey van der Hoeven, Bram Lohman, and Remco Verdegem, Emulation for Digital Preservation in Practice: The Results, Intl. J. Digital Curation 2(2), 2007.
[8] Uwe M. Borghoff, Peter Rödig, Jan Scheffczyk, and Lothar Schmitz, Long-Term Preservation of Digital Documents: Principles and Practices, Springer Verlag, 2006, ISBN 978-3-540-33639-6.
[9] J. R. Van Der Hoeven, R. J. Van Diessen, K. Van Der Meer, Development of a Universal Virtual Computer (UVC) for long-term preservation of digital objects, J. Info. Sci. 31(3), 196-208, June 2005.
[10] H.M. Gladney, Preserving Digital Information, Springer Verlag, 2007.
[11] The slow publication schedules of the target periodicals suggest that this will not be complete until late 2009.
[12] Thomas A. Phelps and P.B. Watry, A No-Compromises Architecture for Digital Document Preservation, Proc. 9th European Conf. on Research and Advanced Technology for Digital Libraries (ECDL 2005), September 2005. T.A. Phelps, Multivalent Documents: Anytime, Anywhere, Any Type, Every Way User-Improvable Digital Documents and Systems, Ph.D. Dissertation, University of California, Berkeley, 1998.
[13] Brad Pasanek and D. Scully, Mining Millions of Metaphors, 2007.
[14] Brewster Kahle provides an up-to-date summary of the Economics of Book Digitization.
[15] Christopher N. Cox, Federated Search: Solution or Setback for Online Library Services, 2008.
[16] In May 2008, Microsoft closed their Live Search Books and Live Search Academic services. The project had scanned 750,000 books and indexed 80 million journal articles.
[17] Jeffrey Mervis, NSF Rethinks its Digital Library, Science 323(5910), 54, January 2009.
[18] Lorraine Daston and Peter Galison, Objectivity, 2007, ISBN 978-1890951788.
[19] W.V.O. Quine, The Ways of Paradox, 1975, ISBN 978-0674948378.
[20] Adapted from the Stanford Encyclopedia of Philosophy.
[21] Notice that the reasoning that follows uses only concepts already known 2400 years ago.
[22] David Blatner, The Joy of π, 1997, ISBN 0-8027-7562-7.
[23] J.L. Synge, Science: Sense and Nonsense, 1951.
[24] E. Cassirer, Kant’s Life and Thought, Yale U.P., 1981, ISBN 0-300-02982-9.