Perspectives on Trustworthy Information
Volume 3, Number 1, 1Q2004
HMG Consulting, Saratoga, CA 95070
© 2004, H.M. Gladney
ISSN: 1547-8610
Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. Attributed to Laurence J. Peter
Until now, DDQ preservation discussion has focused on works of individual authorship—cultural works. It has almost ignored preservation of office records constituting business transaction audit trails that are often legally required. The practices surrounding office records are different from those for cultural works. For instance, what the (U.S.) National Archives and Records Administration (NARA) is emphasizing differs significantly from what the Library of Congress (LoC) is working towards.
What follows focuses on semantic and technical factors, ignoring other differences between office records and cultural works. NARA digital record management plans are influenced by the following factors:
• The content of business archives is evidence of the quality of its source agencies’ work.
• Losing almost any archival collection would have readily identified legal and practical consequences.
• Governmental collections are mostly not encumbered by third party copyright.
• Since preservation is mandated, funding for NARA’s digital archive is relatively secure.
• The cost of creating each office record tends to be much less than that for each cultural work.
Consider current activity at NARA and at LoC. It should not be surprising that what is urgent in managing office records might be different from what is urgent for cultural works. The topics and language of articles by NARA and its San Diego Supercomputer Center (SDSC) research partner[1] are so different from those in other digital preservation literature that some readers might find their papers difficult to understand.[2] How can NARA be ready to acquire a large-scale digital archive system[3] when LoC seems far from understanding what digital preservation service it might need? [4]
This puzzled me even though an explanation had been visible for three years. Perhaps my focus on preserving cultural works blinded me to the different circumstances of office record collections. The information flow of typical cases (Figure 1 and Figure 2) is sufficient to explain quite different technical emphases and software solutions. Consider just two aspects among many: the prior history of a typical accession into a long-term repository and the tension between the content of each accessioned object and what that content is intended to convey.
For commercial and national archives, the accession unit will be a collection of related office records (e.g., correspondence between a foreign office and an embassy), records that individually derive historical context from their siblings and from collection metadata. Each collection member is a ‘record’ in the sense meant by professional archivists, being information about a specific historical event whose context is communicated by metadata and by the member’s position among and relationship to siblings. The metadata include format and content rules that often antecede individual records and that might include business control statements such as retention rules.
In contrast, a typical research library holding is a work of individual authorship that some professional cataloguer accessioned into the collection without much accompanying evidence of its historical significance or its relationships with other holdings. Historical information and relationships are typically added to library contents to only a limited extent by a library employee (a cataloguer) and perhaps more comprehensively by scholars years later.[5]

Figure 1: Information flow for digitally preserved office records[6]
Figure 1 illustrates that each NARA accession is likely to be a collection that has been bounded by articulated rules and procedures refined over several years by a government agency, and that has been subject to administrative control and curation similar to that provided by archives. The purposes and structure of each agency collection are likely to be documented, and the accessioning archivists will almost surely have the opportunity to collaborate with agency administrators to refine metadata, to enhance ontologies, to determine the bounds of each collection, and to augment information about the collection's significance.[7] Individual office records are likely to be similar to other office records in the same collection, and such details are likely to be understood by records administrators.[8] Research libraries are not likely to be provided similar information by the authors or editors of cultural works.
The Figure 2 author of a cultural work is likely to want to convey original conceptual structures, and also complex relationships with prior works. Much of his effort will have been to represent mental constructs in ways that help readers achieve similar mental constructs. In turn, a diligent reader will want to tease the author’s ideas from the written representation, even though this reader cannot converse with the author.
In contrast, people rarely care as much what an authoring bureaucrat thought as they do about the relationships of each record to other office records and to the agency’s objectives. The written representation tends to be more important than authors’ intentions. In some cases, the author’s thoughts about his output are administratively pre-empted by the content. For instance, in contract litigation, the written words have unconditional priority over what the agreeing parties might have intended.
The specific words (symbols) used in office record collections are of interest, particularly if each is used similarly wherever it occurs. This last is likely to be encouraged by agency glossaries. Similar jargon might occur even without administrative encouragement because employees share culture. Furthermore, the typical size of NARA accession units is larger than that of research library holdings. For such simple reasons, an office record collection is likely to have many more occurrences of each pattern than any cultural work, and the number of relationship instances within such a collection is likely to be much greater than that within or between cultural works.

Figure 2: Information flow for preserved intellectual works[6]
Such circumstances tend to make ontological analysis interesting for office collections and suggest why ‘knowledge management’ is a high priority in SDSC investigations, whereas research librarians have long been mired in discussions about standardization of terms of reference and subject categories.[9]
Prior DDQ numbers have described how to ensure perpetual usefulness of individual digital objects and related authenticity evidence. We believe the methods they describe will be useful also for office records, but do not yet understand the issues sufficiently to assert that it will be so. Part of our difficulty is that the NARA/SDSC publications[10] provide neither examples of the knowledge rules they allude to nor collection examples that help the reader guess what these rules might be.
Pessimism about digital preservation seems to infect the research library and archive community. Recent writings describe the circumstances as ‘ironic’, along the lines illustrated by the boldface type in:
“The problem faced by [those] who aim to preserve history by preserving [digital] records is that [they] … may be as ephemeral as messages written in the sand at low tide … It is ironic that the primitive technology of ancient times has produced records lasting hundreds of years, while today’s advanced electronic world is creating records that may become unreadable in a few years’ time.[11]
“The correct interpretation of records has always required knowledge of the language in which they were written, and sometimes of other subjects too …. Fortunately enough of this knowledge has survived that we can make sense of most of the records that have come down to us. … Just as interpretation of the 1086 Domesday Book depends on the dictionaries and grammars for medieval Latin painstakingly compiled by long-dead scholars, interpretation of contemporary electronic records … will only be possible if the necessary methods and tools are … preserved now.” [Darlington]
Nobody has given persuasive reasons why ‘ironic’ might be apt. We are left to guess why people say so.
The putative statistic on which 'ironic' is based involves an unreasonable comparison, viz., the fact that some old paper documents have survived, compared to the fact that some digital documents might not survive. Consider the following historical and economic factors.
(1) A plausible comparative statistic is ‘storage effectiveness’—integrating content amount over time. Some paper has stored about 3000 characters per page for 500 years, i.e., for a 300-page book the retention has been about 5*10**8 character-years. A hard disk drive (of roughly the same size, weight and price as a book) can be counted on to save about 100 gigabytes for at least 5 years, i.e., about 5*10**11 character-years. With this measure, magnetic technology is 1000 times more effective as a storage medium than is paper. (‘1000 times’ is a conservative estimate.[12])
(2) Today's digital preservation quality measures are more rigorous (roughly <1 undetectable character error in 10**10) than those that have been or are still now expected for documents stored on paper.
(3) It being unnecessary, we are unwilling to work as hard on modern information as did the “long-dead scholars” who “painstakingly” compiled ancient dictionaries and grammars.
(4) Technology offerings respond to markets. The marketplace has not asked for long-term retention.[13] Instead, what people have asked of digital technology is fast search, fast access from a distance, and immense capacity—qualities neither expected of nor delivered by paper.
(5) We began to share digital objects only 20-30 years ago. Over roughly 2000 years society has built an immense infrastructure and invested heavily in education for using paper. It's hardly surprising that such infrastructure and education have not yet been matched by digital equivalents—especially not for applications for which no widely expressed demand exists.[14]
(6) Rather than long-term preservation, the commercial market for records management apparently wants controls and automation for discarding records as soon as the law permits and internal needs have been satisfied.[15]
(7) In-depth professional discussions of digital preservation started only about five years ago. Plausible solutions for the technical components have been identified in prior DDQ numbers[16] and elsewhere.
(8) The Internet Archive is saving a significant fraction of Web-accessible data. Its Recall search service has indexed the text of over 10**10 pages.[17] An IBM Research service called WebFountain™ has gathered a 500-terabyte database for analysis.[18] There is little a priori reason to doubt that such collections can be made to survive forever.
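The 'storage effectiveness' comparison in item (1) is simple enough to check by machine. The sketch below uses only the round numbers quoted there and assumes one byte per character; the function and constant names are illustrative, not taken from any cited source.

```python
# Back-of-envelope check of the 'storage effectiveness' figures in item (1).
# All inputs are the round numbers quoted in the text; nothing is measured.

CHARS_PER_PAGE = 3000
PAGES_PER_BOOK = 300
PAPER_YEARS = 500

DISK_BYTES = 100 * 10**9   # 100 gigabytes, assuming one byte per character
DISK_YEARS = 5


def character_years(characters: int, years: int) -> int:
    """Storage effectiveness: amount of content integrated over time."""
    return characters * years


paper = character_years(CHARS_PER_PAGE * PAGES_PER_BOOK, PAPER_YEARS)
disk = character_years(DISK_BYTES, DISK_YEARS)
ratio = disk / paper

print(f"paper book: {paper:.1e} character-years")
print(f"hard disk:  {disk:.1e} character-years")
print(f"ratio:      {ratio:.0f}x")
```

The exact product for the book is 4.5*10**8 character-years, which item (1) rounds to 5*10**8, so the computed ratio comes out slightly above the quoted factor of 1000.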
Arguably, digital preservation lags digital access because society values rapid gratification over enduring value.
Perhaps Darlington calls the situation ‘ironic’ because National Archives personnel want quick gratification of their priority: durable copies. If so, that would be ironic!
Participation in the Open Archives Initiative (OAI) seems to be growing. If you have not been following this, DDQ recommends Using OAI … Differently, The Expanding World of OAI, and also:
• The specification of the OAI Protocol for Metadata Harvesting (OAI-PMH);
• A Guide to Institutional Repository Software describing systems (ARNO, CDSware, DSpace, Eprints, Fedora, i-Tor, and MyCoRe) intended to allow an institution to implement an OAI-compliant repository without resorting to in-house technical development; [19]
• OAISTER from U. Mich., a collection of freely available, difficult-to-access, academically-oriented digital resources that are easily searchable by anyone. As of 6th February 2004, this held 3,016,267 records from 267 institutions;
• The RoMEO Project (Rights MEtadata for Open archiving), a project to investigate the rights issues surrounding the 'self-archiving' of research; and
• A position on Priorities for OAI Community.
The Public Record Office PRONOM service collects information about the file formats of electronic data, about the software products required to create, render and migrate objects in these formats, and about supporting vendors.[20] The PRONOM database is Web-accessible for reports in various formats. It currently documents ~550 file formats, ~250 software products, and ~100 vendors.
For the home computing enthusiast, PC Magazine suggests “ways to ensure that the contents of your discs are readable down the road and to set up a backup plan to keep your archives safe.” [21] These guidelines seem sufficient for preserving digital photographs and personal data for 25 years or longer, i.e., at least until current preservation research matures into practical offerings. Surely research libraries can work out practical equivalents for their scales and environments!
Attempts to prove the existence of God reach back at least to St. Thomas Aquinas. A February 2004 Google search for Web pages containing (“existence of God” + “proof”) yielded 49,100 ‘hits’, including a Web page identifying over 300 'proofs'. Bertrand Russell wrote, [22]
“Intellectually, the effect of mistaken moral considerations upon philosophy has been to impede progress to an extraordinary extent. I do not myself believe that philosophy can either prove or disprove the truth of religious dogmas, but ever since Plato most philosophers have considered it part of their business to produce "proofs" of immortality and the existence of God. They have found fault with the proofs of their predecessors—Saint Thomas rejected Saint Anselm's proofs, and Kant rejected Descartes’—but they have supplied new ones of their own. In order to make their proofs seem valid, they have had to falsify logic, to make mathematics mystical, and to pretend that deep-seated prejudices were heaven-sent intuitions.”
A Web page details Kurt Gödel's 1970 Ontological Argument, a mathematical proof that caused a stir among Gödel's colleagues. Consider an abbreviated version:[23]
Axiom 1 | (Dichotomy) A property is positive if and only if its negation is negative.
Axiom 2 | (Closure) A property is positive if it necessarily contains a positive property.
Theorem 1 | A positive property is logically consistent (i.e., possibly it has some instance).
Definition 1 | Something is God-like if and only if it possesses all positive properties.
Axiom 3 | Being God-like is a positive property.
Axiom 4 | Being a positive property is (logical, hence) necessary.
Definition 2 | A property P is the essence of x if and only if x has P and P is necessarily minimal.
Theorem 2 | If x is God-like, then being God-like is the essence of x.
Definition | NE(x): x necessarily exists if it has an essential property.
Axiom 5 | Being NE is God-like.
Theorem 3 | Necessarily there is some x such that x is God-like.
What logical error has been made? Hint: the mistake is similar to the Russell’s Paradox error discussed in DDQ 2(2). I identify it at the end of this DDQ number.
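For readers who prefer symbols, the abbreviated statements can be set beside the modal-logic formulation commonly credited to Dana Scott's transcription of Gödel's notes. The notation is an assumption added here for comparison, not part of the abbreviated version quoted above: P(φ) reads 'φ is positive', G is 'God-like', 'ess' is 'is the essence of', and □ and ◇ are 'necessarily' and 'possibly'. Where the wording differs (Axiom 5 in particular, which Scott's version states as 'necessary existence is positive'), the English text above is the newsletter's paraphrase.

```latex
\begin{align*}
\text{Axiom 1:} \quad & P(\lnot\varphi) \leftrightarrow \lnot P(\varphi) \\
\text{Axiom 2:} \quad & \bigl(P(\varphi) \land \Box\,\forall x\,(\varphi(x) \rightarrow \psi(x))\bigr) \rightarrow P(\psi) \\
\text{Theorem 1:} \quad & P(\varphi) \rightarrow \Diamond\,\exists x\,\varphi(x) \\
\text{Definition 1:} \quad & G(x) \leftrightarrow \forall\varphi\,\bigl(P(\varphi) \rightarrow \varphi(x)\bigr) \\
\text{Axiom 3:} \quad & P(G) \\
\text{Axiom 4:} \quad & P(\varphi) \rightarrow \Box\,P(\varphi) \\
\text{Definition 2:} \quad & \varphi \operatorname{ess} x \leftrightarrow \varphi(x) \land \forall\psi\,\bigl(\psi(x) \rightarrow \Box\,\forall y\,(\varphi(y) \rightarrow \psi(y))\bigr) \\
\text{Theorem 2:} \quad & G(x) \rightarrow G \operatorname{ess} x \\
\text{Definition (NE):} \quad & NE(x) \leftrightarrow \forall\varphi\,\bigl(\varphi \operatorname{ess} x \rightarrow \Box\,\exists y\,\varphi(y)\bigr) \\
\text{Axiom 5:} \quad & P(NE) \\
\text{Theorem 3:} \quad & \Box\,\exists x\,G(x)
\end{align*}
```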
Software engineers repeat the mantra, “Every computer science problem can be solved by adding a level of indirection.” ‘Adding a level of indirection’ is jargon for replacing an object reference by a reference to that reference, for instance as the argument of a service invocation. ‘Every problem’ is an exaggeration, but the advice is good for many problems.
For instance, adding indirection is the first step towards hiding irrelevancies from HTML browsers, such as details that distinguish a locally stored file from a remote Web page. This comes into play whenever you ‘drag’ an HTML file (reference) and ‘drop’ it into almost any Web browser window. It works because algorithms for displaying HTML data do not depend on how those data are stored.
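A minimal sketch of the same idea in Python may help. It assumes a hypothetical `Loader` type (a callable that returns HTML text) standing in for the extra level of indirection; the names `from_memory`, `from_file`, and `title_of` are invented for illustration and do not describe how any real browser is built.

```python
from typing import Callable

# A Loader is the added level of indirection: code that processes HTML
# receives "something that yields the text", never the storage location.
Loader = Callable[[], str]


def from_memory(text: str) -> Loader:
    """Wrap text that is already in memory."""
    return lambda: text


def from_file(path: str) -> Loader:
    """Wrap a locally stored file; it is read only when the loader is called."""
    def load() -> str:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    return load


def title_of(load: Loader) -> str:
    """Return the text between <title> and </title>, or '' if absent."""
    html = load()
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return ""
    return html[start + len("<title>"):end]


print(title_of(from_memory("<html><title>DDQ</title></html>")))
```

Because `title_of` depends only on the loader, a remote-page loader could be added later without touching it, which is the point of the mantra.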
A compendium of software engineering mantras might include:
Cache everything.
Make the common case fast (and the rare case correct).
Use randomness to avoid the worst case probabilistically.
If neither approach works, try a hybrid approach.
If you can't solve a hard problem, first transform it to an easy problem, then solve the easy problem.
Do the math, know the best case before you start, and observe Amdahl's Law.[24]
That ‘there are N numbers in the set [0..N]’ is not true.[25]
Any programming problem can be solved by adding a level of indirection.
Any performance problem can be solved by removing a level of indirection.
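Two of these mantras reduce to arithmetic that is easy to get wrong. The sketch below states Amdahl's Law in its usual form, speedup = 1 / ((1 - p) + p / s), and counts the members of a closed interval; the function names are illustrative assumptions, not drawn from the sources cited in the footnotes.

```python
def amdahl_speedup(improved_fraction: float, factor: float) -> float:
    """Overall speedup when `improved_fraction` of the run time is made
    `factor` times faster and the remainder is unchanged (Amdahl's Law)."""
    if not 0.0 <= improved_fraction <= 1.0:
        raise ValueError("improved_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / factor)


def best_case_speedup(improved_fraction: float) -> float:
    """Upper bound as `factor` grows without limit: 1 / (1 - p)."""
    if improved_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - improved_fraction)


def count_inclusive(low: int, high: int) -> int:
    """Number of integers in the closed interval [low..high]."""
    return high - low + 1


# Doubling the speed of half the work gives well under a 2x gain overall.
print(amdahl_speedup(0.5, 2.0))
# The set [0..10] has eleven members, not ten.
print(count_inclusive(0, 10))
```

Knowing the bound from `best_case_speedup` before starting is the 'do the math' step: if 90% of the run time can be improved, no amount of effort yields more than about a tenfold gain.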
In response to an open letter from IBM™, Sun Microsystems™ plans to meet with IBM to discuss jointly developing an open-source version of Java. See http://eletters.eweek.com/zd1/cts?d=79-510-1-5-70562-60055-1 and http://www.eweek.com/article2/0,4149,1539358,00.asp
DDQ 2(4) reported that Microsoft technical support for Windows 98™ was ending. In January Microsoft announced that it would extend support for Windows 98 and Windows ME to June 2006. A Website gives reasons why you might want to stick with Windows 98.
Office software suites seem to be springing up. You might want to consider:
• EIOffice 2004™, a seamlessly integrated Office application from China-based Evermore Software™. It delivers one user interface for combined word processing, spreadsheet, graphics, and presentation preparation. It does seem expensive, especially compared to:
• 602Pro PC Suite™ from Prague-based 602 Software™, and
• OpenOffice™, the “for free” version of:
• StarOffice™ from Sun Microsystems™.
A NEC/Toshiba-led group has developed an HD (high definition) DVD, using a 0.6 mm thick disk made with machinery similar to that used for today's DVDs. A Sony/Matsushita-led group has developed Blu-Ray, a 0.1 mm thick disk that can hold more data, but will be more expensive to bring to market.
Sponsors | Code name | Thickness | Video | Capacity | Data rate | R/W
NEC and Toshiba | HD-DVD | 0.6 mm | AVC | 17G / na | 25 Mbps | no
Sony and Matsushita | Blu-Ray | 0.1 mm | HD MPEG-2 | 27G / 50G | 36 Mbps | yes
Microsoft has applied for XML patents: EP1376387, "Word-Processing Document Stored in a Single XML File", in Europe, Canada, and New Zealand; US 20040006744, "System and Method for Validating an XML Document and Reporting Schema Violations", in the United States; and additional related filings. Paraphrased to condense the usual legal format of its claims, the core of the European filing is: [26]
A method for handling a word-processing document, comprising: parsing the document, wherein the document is contained within a single XML file and includes instructions to display the document according to how a word-processor would display the document; and to interpret the document according to an XML schema file; optionally extended to displaying the document according to the instructions contained within the XML file, to modifying the document so as to conform with an XSD file which might contain definitions for features incorporated within the word-processor, to formatting the text according to style and properties contained within the single XML file, and to extracting text from the XML file by searching for a single tag.
A computer-readable medium having computer-executable components, comprising components for reading a word-processor document stored as an XML file, for using a schema for interpreting the document, and for performing an action on the document—with optional further components for validating the document, for displaying the document, for schema representing a word-processor's rich formatting, using the schema for other applications, and for including hints to applications that understand XML. Further claims include using the computer-readable medium for parsing, modifying, reading, and creating the document, fully recreating the document according to a word processor's features, storing a binary-encoded image within the document, optionally together with template information.
A system for accomplishing the tasks alluded to above, and for various similar tasks and file formats.
Why ridiculous? In 1969 IBM Research colleagues[27] designed GML for purposes that included all those mentioned above. GML evolved into SGML, which in turn was restricted in ways that made machine parsing easier and faster; the resulting language was named XML. In the unlikely case that a court might hold using XML for word processing to be sufficiently different from using GML or SGML, and further find irrelevant 40 years prior innovation by other companies before Microsoft apparently plagiarized other people’s work, to uphold Microsoft’s claims the court would further have to find that what Microsoft is claiming is “not obvious to an engineer versed in the state of the art.”
Why is Microsoft belatedly claiming other people’s inventions? Its spokesperson’s statement, “While the XML standard itself is royalty free, nothing precludes a company from seeking patent protection for a specific software implementation that incorporates elements of XML,” is hardly helpful. The only plausible explanation known to DDQ is that Microsoft is trying to protect itself from third-party lawsuits by others who might have made the filings that Microsoft itself is making.[28]
Eclipse™ is a toolkit and a platform for building integrated development environments (IDEs) that can be used to create applications as diverse as web sites, embedded Java™ programs, C++ programs, and Enterprise JavaBeans™. An available white paper provides an introduction: a technical overview of Eclipse architecture and a case study of building a full-featured Java development environment.
As the Eclipse open-source tools initiative leaves its IBM-led consortium to become an independent, nonprofit corporation, it deserves a second look from those who have previously rejected it.
February news announced promising technology invented by Hideya Kawahara, a Sun Microsystems engineer. “A Web page appears to be transparent, allowing you to see the details of other pages behind it. Kawahara grabs the page with the mouse and stacks it sideways so that it becomes just one among several books on a shelf. Then he spins it around so that he can write a Post-it note on the back.”
Born in Ohio in 1842, Ambrose Bierce became a San Francisco journalist, short story writer, and critic—one of America’s most celebrated wits—whose barbs were aimed at folly, self-delusion, politics, and any pompous pursuit. H. L. Mencken said The Devil's Dictionary contained "some of the most gorgeous witticisms in the English language." An abbreviated example is:
“Lexicographer, n. A pestilent fellow who, under the pretense of recording some particular stage in the development of a language, does what he can to arrest its growth, stiffen its flexibility and mechanize its methods. For your lexicographer, having written his dictionary, comes to be considered "as one having authority," whereas his function is only to make a record, not to give a law. The natural servility of the human understanding having invested him with judicial power, surrenders its right of reason and submits itself to a chronicle as if it were statute. Let the dictionary (for example) mark a good word as "obsolete" or "obsolescent" and few men thereafter venture to use it, whatever their need of and however desirable its restoration to favor—whereby the process of impoverishment is accelerated and speech decays.”
Peter Drucker, the dean of business management consulting, writes:
“Adventures of a Bystander is not a book about the Great and Famous, although I have known a good many of them. … The people … were all chosen because each of them, in his or her own highly personal way, reflects and refracts the thirty crucial years from the end of World War I to the first post-World War II decade—the thirty years that largely formed the world in which we now live.”
Typified by its chapter, The Man Who Invented Kissinger, the book is about immensely gifted people who did not achieve public acclaim. My favorite chapter, perhaps because of a slight personal connection, is that about the Polanyi family. [31]
“The Polanyis … were the most gifted family I have ever known or heard of. They were also the most achieving family; every one of them had success and impact. But what made them truly remarkable was that all of them, beginning with the father in Victorian days and ending with Karl and his brother Michael in the 1960s, enlisted in the same cause: to overcome the nineteenth century and to find a new society that would be free and yet not "bourgeois" or "liberal"; prosperous and yet not dominated by economics; communal and yet not a Marxist collectivism. … They reminded me of the Knights of the Round Table setting out in search of Holy Grail, each in a different direction.
“Each one found an ‘answer’—and each then realized it was not ‘the answer.’ I know of no family that was so successful, by the standards of the world, and such a failure when measured by its own expectations.”
This book, of special interest to IBM people, is also of broader interest. Some of its Appendix B advice can be found in other books, but few authors have the authority implicit in Gerstner’s achievements. His projection of the next decade of the high technology marketplace is particularly recommended. As an example of the illuminations the book provides, consider:
“…corporations … need the communities where their customers and employees live to be strong, as much as they need successful research, planning, and advertising. Contributing to their communities is, therefore, good business, too.
“However, … corporations that regularly allocate a certain amount of money in their budgets for philanthropic activities and then dole it out over the year to miscellaneous charitable organizations are certainly doing some good [, but] underperforming in a substantial way.
“ … corporate giving is not a significant part of the cash contributions … to charities in America … A surge in personal giving could make up for all of corporate philanthropy cash without a lot of effort.
“ … corporations do certain things better than all the other parts of our society. Most important, they know how to plan, manage resources, communicate to constituencies, and conduct many other productive activities that are also required by nearly all nonprofit organizations. Skills in these areas are very important to charitable organizations, but such skills are rarely found in sufficient quantities to allow the emergence of successful, self-renewing organizations. Where else can these organizations turn for the help they need in building and maintaining organizational excellence? Certainly not governments, …
“ … In 1995, I spoke to the National Governors Association …, urging them to step up … public school reform in their states. … They said to me: "We agree with you, Lou, and want to do a lot more. However, we cannot do it without the help of the business community. We need the business community to energize and drive change in our state legislatures, in our school boards, in our school … bureaucracies. We need you standing side by side with us explaining the urgency of the problem and demanding that tough decisions and changes be made."
“As a result of that meeting, an organization called Achieve was created … a combination of active governors and chief executive officers who have worked to build a consensus and a momentum behind standards-based education reform in the United States.
“This is clearly not checkbook philanthropy. This is hard work. It's not glamorous. It doesn't make the front page of the newspapers, yet it is a task—as the governors said emphatically—that CEOs and their corporations can carry out uniquely for our children.”
In connection with his editorial role for the Artech computing security textbook series, Oppliger provides a structured bibliography. Depending on your interests, select links from:
TechRepublic's TCP/IP Quick Reference summarizes the TCP/IP protocol suite and how it corresponds to the OSI (Open Systems Interconnection) reference model.
DNS (the Domain Name System) is the distributed directory that maps host names to the numeric addresses used to route Internet messages. When DNS does not work for you, figuring out what's wrong can be confusing. A Visio flowchart might help.
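A first troubleshooting step can also be scripted. The sketch below is not taken from the flowchart; it simply asks the local resolver for a name through Python's standard `socket.getaddrinfo` call and reports what comes back. The helper name `resolve` is a hypothetical choice for illustration.

```python
import socket
from typing import List


def resolve(host: str) -> List[str]:
    """Return the distinct addresses the local resolver gives for `host`,
    or an empty list if the name cannot be resolved."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        # Name resolution failed: unknown name or resolver unreachable.
        return []
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        if address not in addresses:
            addresses.append(address)
    return addresses


if __name__ == "__main__":
    # A numeric address needs no DNS query, so it is a useful control case.
    print(resolve("127.0.0.1"))
```

If a numeric address resolves but a host name returns an empty list, the fault more likely lies with name resolution than with basic connectivity.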
Detailed information on Linux system commands, command syntax, and command switches is provided in a one-page LINUX Command Quick Reference.
Within the chaff pile discussing spam problems and measures, we found two kernels worth looking into.
Flavio Garcia and Jaap-Henk Hoepman, in Spam filter analysis (submitted to SEC 2004), investigate the effectiveness of several spam filtering techniques by simulating e-mail traffic under different conditions. They find that genetic-algorithm-based spam filters perform best in e-mail servers and that Bayesian filters are the most appropriate for filtering in e-mail clients.
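For readers unfamiliar with the term, a Bayesian filter scores a message by how often its words appeared in previously labelled spam and legitimate mail. The toy sketch below is a generic word-count naive Bayes classifier with add-one smoothing; it is an assumption-laden illustration and not the filters that Garcia and Hoepman simulated. The class name and training sentences are invented.

```python
import math
from collections import Counter
from typing import List


class TinyBayesFilter:
    """Word-count naive Bayes with add-one smoothing (illustrative only).
    Train at least one message of each label before classifying."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, label: str, text: str) -> None:
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, label: str, words: List[str]) -> float:
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        # Log prior: share of training messages carrying this label.
        score = math.log(self.message_counts[label] / total_messages)
        for word in words:
            count = self.word_counts[label][word]
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        return score

    def classify(self, text: str) -> str:
        words = text.lower().split()
        spam = self._log_score("spam", words)
        ham = self._log_score("ham", words)
        return "spam" if spam > ham else "ham"


mail_filter = TinyBayesFilter()
mail_filter.train("spam", "cheap pills buy now")
mail_filter.train("spam", "buy cheap watches now")
mail_filter.train("ham", "meeting agenda for monday")
mail_filter.train("ham", "please review the agenda")
print(mail_filter.classify("cheap pills now"))
print(mail_filter.classify("monday agenda"))
```

Because the word counts come from one user's own mail, this style of filter adapts to that user, which is one plausible reason it suits e-mail clients.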
In the February DEMO show, TurnTide™ showed an enhanced Internet router that throttles spam at the interface between an enterprise network and the external Internet. TurnTide claims that “upon receiving spam, TurnTide engages in battle by restricting the bandwidth and resources a particular spammer can use, prohibiting the spam from even leaving the spammer’s servers,” and that the router blocks mass mailings without interfering with legitimate e-mail.
Beware of this rapidly growing class of Internet scam.[33] According to PC Magazine,
“Phishing attacks involve the mass distribution of 'spoofed' e-mail messages with return addresses, links, and branding [of] banks, insurance agencies, retailers or credit card companies. [Phish] are designed to fool the recipients into divulging personal authentication data such as account usernames and passwords, credit card numbers, social security numbers, etc. Because [they seem genuine], … recipients may respond …, resulting in financial losses, identity theft, and other fraudulent activity.”
Microsoft Windows Services for UNIX (SFU) provides integrated cross-platform network services for enterprises needing to integrate Windows and UNIX-based environments. Starting with the SFU 3.5 release, the software will be free to all Windows users. Functional and download information is Web-accessible.
As DDQ 2(4) anticipated, technology prices resumed downward motion promptly after the holiday season, with significant drops for writable DVD drives and media and the appearance of affordable Wireless-G transceivers. Prices observed[34] since DDQ 2(3) appeared include:
HDD | Hitachi 180GB 7200rpm 8.5msec | $75 | $0.42/Gbyte
Memory | PC2100 DDR 266 MHz memory | $43 | $0.08/Mbyte
Memory | PC3200 DDR 400 MHz memory | $63 | $0.12/Mbyte
CD-R optical disks | Package of 50 | $8 | $0.23/Gbyte
DVD-R optical disks | Package of 25 | $19 | $0.18/Gbyte
DVD-RW drive | Fry’s, brand not stated | $84 | each
Wireless-G router | D-Link Cable/DSL router w/4 Ethernet ports | $63 | each
Wireless-G adapter | D-Link PCI for desktop computer | $43 | each
Wireless-G adapter | D-Link PCMCIA for laptop computer | $43 | each
Wireless-B router | D-Link Cable/DSL router w/4 Ethernet ports | $33 | each
Wireless-B adapter | D-Link PCI for desktop computer | $22 | each
Wireless-B adapter | D-Link PCMCIA for laptop computer | $22 | each
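The per-gigabyte column is simply price divided by capacity. The sketch below reproduces two of the entries; the hard disk capacity comes from the table, while the 700 MB per CD-R disc is an assumption made here because the table does not state the disc capacity used.

```python
def price_per_gigabyte(price_dollars: float, capacity_gb: float) -> float:
    """Unit price in dollars per gigabyte, rounded to whole cents."""
    return round(price_dollars / capacity_gb, 2)


# Hitachi hard disk: $75 for 180 gigabytes (both figures from the table).
hdd = price_per_gigabyte(75, 180)

# CD-R: $8 for 50 discs, ASSUMING 0.7 gigabytes per disc.
cdr = price_per_gigabyte(8, 50 * 0.7)

print(f"HDD:  ${hdd:.2f}/Gbyte")
print(f"CD-R: ${cdr:.2f}/Gbyte")
```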
SanDisk, the world's biggest maker of flash-memory cards, expects prices of the memory cards used in devices such as digital cameras to decline 40 percent in the latter half of 2004 because of supply glut.
Software prices that seem to be too good to be true are advertised at http://www.softforlive.biz/.[35] For instance, Microsoft Office XP is offered for $60. On the other hand, PC World recommends as “the best free software” packages: the Mozilla 1.6 browser, Mozilla Firebird e-mail, the TextShield Fusion word processor, the 602 Suite for office, the PC Pitstop virus scanner, the ZoneAlarm firewall, and more.
The fallacy has to do with the word 'God-like'. The “proof” assumes ‘God-like’ means something beyond what its Definition 1 calls for: that 'God-like' has something to do with another sign/word 'God'.[36] That the two signs look alike on paper does not mean that they have a meaningful relationship.
Putting it another way, the author of a formal proof can choose whether or not to define a word. If he does so, we take his definition to be complete. Otherwise we assume he is using the word in its ordinary everyday sense. Combining a formal with a common meaning can lead to nonsense.
This kind of mistake is common. That Gödel made it illustrates that even the most careful thinkers are susceptible. Recall Wittgenstein’s summation, “In this way the most fundamental confusions are easily produced (the whole of philosophy is full of them).” [37]
[1] SDSC and NARA jargon includes ‘collection-based’ and ‘persistent archive’, as well as apparent rhetoric (‘knowledge management’). See two NARA papers by Kenneth Thibodeau: Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years, in CLIR’s The State of Digital Preservation: An International Perspective, April 2002; and Knowledge and action for digital preservation: Progress in the US Government, Proceedings of the DLM-Forum 2002 Workshop, @ccess and preservation of electronic information: best practices and solutions, 510-517, 2002.
From the many SDSC papers listed at http://www.sdsc.edu/NARA/Publications.html, DDQ recommends: Arcot K. Rajasekar and Reagan W. Moore, Data and Metadata Collections for Scientific Applications, European High Performance Computing conference, Amsterdam, Holland, June 26, 2001; and R. Moore, The San Diego Project: Persistent Objects, Proceedings of the Workshop on XML as a Preservation Language, Urbino, Italy, October 2002. More recently, the SDSC team has been making contributions to data grid technology.
[2] In a private communication Reagan Moore reminded me that in 1999 the SDSC was confronted with different sets of semantic terms used in the digital library, preservation, and computer science communities, and that the differences still exist in what people mean by ‘data’, ‘information’, and ‘knowledge’. His careful note is too long to reproduce here, but will influence what I plan to write in DDQ 3(2) on aspects of semantics.
[3] The NARA ERA Request for Proposals is available via http://www.archives.gov/electronic_records_archives/acquisition/rfp.html.
[5] Typically, the historical relationships and technical similarities of research library holdings are found and documented as part of the accessioning process, or much later. For instance, certain 16th-century holdings of the Vatican Library (Biblioteca Apostolica Vaticana) were first put into context by finding aids written in the 18th century.
[6] In the figures, one symbol (image not reproduced here) depicts a digital transmission; another depicts potential human conversation.
[7] Each office record collection is likely to be large (many thousands of records), valuable, and contain only records that conform to a few well-known schemas. Whenever this is so, the human labor needed for accession will be affordable. Similar favorable quantitative circumstances are unlikely for accessioning of cultural works.
[8] See, for instance, P. Berkman and G. Morgan III, Automated Granularity of Authentic Digital Records in a Persistent Archive, Research Report from EvREsearch, August 31, 2003, which treats the case of the U.S. Code.
[9] Efforts to handle the problem are illustrated by “The GSAFD Thesaurus” in Herbert Van de Sompel, Jeffrey A. Young, and Thomas B. Hickey, “Using the OAI-PMH ... Differently”, D-Lib Magazine 9(7), 2003.
[10] See http://www.sdsc.edu/NARA/Publications.html and also Web sites linked to by http://www.archives.gov/electronic_records_archives/papers_presentations.html.
[11] The boldface emphasis is a DDQ addition.
[12] Of course, we would copy the content onto new media whenever a carrier became unreliable. For large collections, currently available technology can manage this almost automatically. The estimate does not take into account expected technology and cost improvements.
[13] Research libraries and archives are not an attractive market for software offerings, partly because there are so few of them, but even more because they (apparently) expect industry to provide technology below cost, e.g., by asking for donations at the same time as they try to acquire technology. Great data, but will it last? (Vanessa Spedding, Research Information, Issue 5, Spring 2003) collects academic opinions that typify the technical pessimism and marketplace optimism that DDQ numbers question.
[14] Digital preservation proponents constitute only a tiny fraction of the literate population.
[15] For an example of software industry response, see Miller, Bruce. “Coming Soon: E-records”, IBM DB2 Magazine, 1Q04. IBM’s recently announced TotalStorage™ offering emphasizes managing records disposition consistent with the data retention and corporate governance requirements of laws and regulations ranging from Sarbanes-Oxley to HIPAA to SEC Regulation 17a-4.
[16] My colleagues and I would appreciate candid criticism. Nobody has yet offered any criticism that suggests that our methodology is flawed, notwithstanding our repeated invitations to do so.
[17] See a presentation about the search technology and a discussion forum. The British Government recently archived many of its web pages, ensuring long-term preservation by way of a contract between the U.K. Public Record Office and the Internet Archive.
[18] See http://www.mercurynews.com/mld/mercurynews/business/7870991.htm. WebFountain scours Web logs, newspaper stories and other sources, and searches this content for patterns that the most dedicated librarian cannot find. Instead of just matching patterns, WebFountain analyzes a subject in 50 different ways (e.g., noting how often two people’s names are associated) to answer questions precisely. An immediate stimulus is the Sarbanes-Oxley legislation.
[19] Each satisfies three criteria: Open Source licensing, compliance with OAI-PMH to ensure interoperation in a global archive network, and current general availability.
[20] Jeffrey Darlington, PRONOM—A Practical Online Compendium of File Formats, RLG DigiNews 7(5), October 2003.
[21] Leon Erlanger, Memories that Last, PC Magazine 62-63, Jan. 20, 2004.
[22] Bertrand Russell, A History of Western Philosophy (1945), in Chapter XXXI, “The Philosophy of Logical Analysis”.
[23] See Clifford A. Pickover, Wonders of Numbers, Oxford UP, 2001; Chapter 37. ISBN 0-19-515799-0
[24] A 25-Feb-04 Web search for “Amdahl’s Law” yielded 7400 hits; take your pick!
[25] Overlooking this is a common source of programming errors.
[26] In order to eliminate loopholes, claims written by attorneys look like simple computer programs, but are verbose.
[27] Charles Goldfarb, Edward Mosher, and Raymond Lorie. Goldfarb makes available his view of early XML history.
[28] The cost of fighting a lawsuit, even one without merit, can be so high that it is cheaper to pay off the claimant.
[29] Bierce, A. Appelbaum, Stanley (ed.). The Devil’s Dictionary, Dover Books, 1993. ISBN 0-486-27542-6 pp.700-1
[30] Peter F. Drucker, Adventures of a Bystander, Harper & Row, New York, 1979/1994. ISBN 0-471-24739-1
[31] Michael Polanyi, the youngest sibling of his generation, did widely praised work as a student of Albert Einstein in Berlin, became famous as Manchester Professor of Physical Chemistry, and equally renowned for his post-WW-II work in philosophy. His son, John, later a Nobel Prize winner in Chemistry, was my first physical chemistry teacher. This was also John Polanyi’s first year as an Assistant Professor at the University of Toronto.
[32] Louis V. Gerstner, Who Says Elephants Can’t Dance? Inside IBM’s Historic Turnaround, Harper, 2003. ISBN 0-06-052379-4
[34] The prices are mostly from San Jose Mercury News advertisements. Better deals might be available from on-line shopping services. To facilitate “level playing field” comparison, sales taxes and shipping costs are included in the estimates.
[35] I do not know whether this advertiser is reputable. It is said, “If it looks too good to be true, it probably is.”
[36] As Wittgenstein emphasizes in Tractatus Logico-Philosophicus, 3.11 ff., a word or sign should be used in only one sense within a logical discussion.
[37] Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 3.324.