Digital Document Quarterly: Perspectives on Trustworthy Information
Volume 6, Number 1, 1Q2007
HMG Consulting |
© 2007, H.M. Gladney ISSN: 1547-8610 |
Digital preservation activities seem to be entering a new phase, shifting from the need to solve basic problems to the need to implement solutions and establish repository procedures. This is signaled by new books attempting to treat their topics comprehensively.
Public symposia and workshops focusing on digital preservation seem ever more frequent, with roughly one somewhere in the world each month. As the topic is maturing, it would help everyone interested if conference organizers, and also authors, would clearly indicate their focal levels, e.g., one of
· Fundamental principles for knowledge and information preservation (epistemology);
· Methodological principles and software architecture;
· Software tools and other technology implementations;
· Archival institution management; or
· Tutorial presentations, particularly for librarians and archivists.
A Reader’s Comment on Preserving “Everything”
John Erickson commented on recent DDQ numbers: [1]
“… the myth of expert archiving and dangers of not preserving "everything." There has been much hubbub over the past few years (partly inspired by the Da Vinci Code) over non-canonical documentation of early Christianity and how it should be considered. Documents such as the Nag Hammadi (
“… a small sampling of "alternative" documents can have a major impact, … [suggesting] preserving as much as possible. Surely there were other contemporary sources which did not survive. And, considering how the "Canon" was formulated (esp. via the Council of Nicene) and its subsequent effect on what was preserved or destroyed, … "expert" preservation [should] be considered dangerous and suspect, because every such instance is merely an opinionated selection … of what will be [wanted] by future scholars.
“… related to this issue of "preserving as much as possible" [as described in] MyLifeBits[2] … exemplifies the problem of not knowing at artifact creation time what data might be useful or required at use time. Most efforts to simplify the data set will invariably reduce the usefulness of the artifact. A scientist knows never to destroy original data, but our IT culture does it every day; in fact, many corporations have policies for deleting old data for both policy reasons and to save space.”
Erickson’s reminders of the selection challenges of building a long-term digital collection are appropriate, but need to be balanced by practicalities.
Saving “everything” needs to be tempered by anticipation of future users’ ability to find what might interest them. An extreme case illustrates the point. During one of the Government-IBM antitrust lawsuits, the judge ordered that every document copy which a recipient had annotated was to be saved, including telephone books. IBM instructed employees to deposit all discards in large collection bins. Allegedly all the content was dumped in warehouses, without any organization or inventory. A hallway joke was that, whenever the court order would be lifted, the price of scrap paper would drop. And it did!
The concern with derivative documents that include editorial change, illustrated with the Nicene Creed, can readily be overcome by responsible conservators. If they accompany each edited document by honest and adequate provenance information that is firmly bound within a resultant archival object, that object will not mislead future scholars. In fact, it will be an authentic archival object.[1]
New Books and Reports on Digital Preservation[3]
The five books reported below are so new that, as far as I know, no independent reviews have yet appeared. It would help the cultural heritage community for some DDQ reader to review some number of these books.
Deegan & Tanner, Digital Preservation[4]
This essay collection sketches recent progress, including how digitization is changing in archives, libraries, and museums. The essays also suggest, in view of digital preservation as a moving target, the value of periodic overviews for deciding next activities. Chapters cover:
· Key issues in and strategies for digital preservation
· The status of preservation metadata in the digital library community
· Web archiving
· The costs of digital preservation
· European approaches to digital preservation
· Digital preservation project case studies.
Digital Preservation is intended to be a guide for information managers, librarians and archivists, as well as for students in library and information studies courses.
Borghoff et al., Long-Term Preservation of Digital Documents[5]
These authors briefly describe markup and document description languages (TIFF, PDF, XML, Dublin Core, …), explain migration and emulation techniques, and present the OAIS (Open Archival Information System) Reference Model. To complement this technical background, they present selected repository projects (at
This work is intended for librarians, computer scientists, and information managers engaged with social and methodological requirements for long-term information access.
Masanès, Web Archiving[6]
Masanès has collected essays about tools, tasks, processes, and standards needed to preserve portions of the WWW. His book can serve as an introduction to keeping online information alive. It covers issues related to building, using, and preserving Web archives for computer scientists and librarians. This book is intended to be a state-of-the-art overview for practitioners.
Batini & Scannapieco, Data Quality[7]
Batini and Scannapieco systematically introduce quality issues for federated data[8], web data, and other time-dependent data, classified according to frequency of change. The book describes methodologies from core data quality research as well as data mining, probability theory, statistical data analysis, and machine learning. It ends with critical comparison of tools and practical methodologies for data quality problems.
This book is broadly targeted: for researchers, students, and engineers who want an introductory course or self-study on its topics.
Gladney, Preserving Digital Information[9]
Preserving an information collection is a different challenge than managing archives. Preserving Digital Information addresses[10] fundamentals and software design for preserving a file collection[11] indefinitely—methodology claimed to be complete and optimal for any information types (representations) whatsoever. In a nutshell:
· Any information pattern can be protected against loss by replicating its carrier object in independent repositories.
· A perpetually unique document identifier (easily constructed) that is embedded in each preservation object will enable durable indexing for global search engines.
· Ensuring that eventual users can exploit any preserved document as its authors intend can be achieved by augmenting its source version with representation in a lingua franca appropriate to its genre.
· Making a document trustworthy can be achieved by firmly binding evidence, using cryptographic signatures of individuals or enterprises that have little to gain and much to lose by endorsing misrepresentations.
· Essential document relationships that define collections and critical dependencies can be reliably preserved by binding inter- and intra-document links to digital document hash codes.
· Everything that is new and essential[12] can be implemented by straightforward extensions of office document software based on a relatively small number of widely used international standards, such as core portions of XML, character coding standards, cryptography, and the Church-Turing thesis.
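The hash-binding idea in the list above can be illustrated in a few lines. What follows is only a minimal sketch, not the book's implementation: the function names, the identifier string, and the choice of SHA-256 as the hash function are assumptions made here for illustration.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return the SHA-256 hash code of a document's bytes as hex text."""
    return hashlib.sha256(content).hexdigest()


def make_link(target_id: str, target_content: bytes) -> dict:
    """Build a link record that binds a target identifier to the target's hash."""
    return {"target": target_id, "sha256": digest(target_content)}


def verify_link(link: dict, candidate_content: bytes) -> bool:
    """Check that a retrieved document is the one the link was bound to."""
    return link["sha256"] == digest(candidate_content)


# Hypothetical document and identifier, used only for this example.
original = b"Report on database preservation, version 1"
link = make_link("doc:example-0001", original)

intact = verify_link(link, original)                  # unchanged target
altered = verify_link(link, original + b" (edited)")  # modified target
```

A link built this way fails verification as soon as the target's bytes change, which is the property that lets a preserved collection detect a substituted or altered dependency.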
A synopsis and table of contents is available on-line. Another Web page provides actionable links to the book’s citations of Web-accessible references.
We call the two methodological components contributed by my colleagues and myself “durable encoding” and “durable evidence”.[13] A durable encoding prototype has been implemented in the National Library of the Netherlands DIAS and is being elaborated in Kopal. Durable evidence methodology is being pursued in ArchiSafe. Both projects enjoy German Government funding.
A workshop announcement suggests that attention to preserving databases is timely:
Most of scientific research is now based on digital data resources, and databases are playing an increasingly important role. Much of the data is either impossible … to reproduce or can only be recovered at enormous costs …. Nearly every reference manual, dictionary and gazetteer benefits from some form of database management support, … The need for preservation is self-evident.
While considerable thought has been given to the preservation of fixed "digital objects" studied in the past, the preservation of databases, which have an internal structure and which may change over time, poses new challenges. … Libraries, the traditional curators of … reference material, have largely abrogated their archival responsibility to databases. Database preservation raises new technical, economic and legal issues.
This announcement continues by posing 13 questions (reproduced below). Addressing even these questions thoroughly seems beyond what a one-day workshop can accomplish. The workshop purpose might be advanced by identifying what is already known about database preservation. On the other hand, an unasked question seems worthwhile: Where should we look for ideas and for software for preserving databases?
To keep what follows brief, we limit its scope and style. As in the workshop announcement, what follows assumes that how to preserve static files is sufficiently understood. It further assumes that, if a file can be reliably and reversibly converted to a file format that we know how to preserve, we can preserve that file by converting it to the understood format and saving that together with the reverse conversion rule.
Metadata are as important for databases as they are for other preserved objects, but are not discussed below because database metadata present the same challenges as other metadata. Finally, this article does not attempt to justify what it sketches.[14]
The term ‘database’ has many meanings, including “a set of files with internal structure conforming to a well-known schema class, such as that of a relational database, and to prescriptions of a database management system (DBMS), such as IBM’s DB2.” Another definition is “any data collection that has a relatively simple and orderly structure.” Various meanings are discussed in a Wikipedia article that also describes structure models. Of the meanings for ‘database’, what follows assumes the kind of relational database managed by IBM’s DB2 except where some other database type is explicitly mentioned.[15]
Relational databases (RDBs) have mostly displaced hierarchical and network databases. Instances of the latter models can be converted to relational form. Other providers’ relational databases can be converted to work with IBM’s DB2.[16] In fact, any structural information can be represented relationally.[17] Accomplishing such conversions does make explicit a challenge for any data whatsoever—distinguishing essential information from accidental information, i.e., separating authors’ intentions from irrelevancies of their written works.[18]
What differences between RDBs and ordinary files are important for preservation methodology? I can think of only three that are critical: (1) databases tend to contain little implicit or explicit contextual information; (2) RDBs are dynamic and (3) often much bigger than the biggest ordinary files—sometimes as much as 1,000 times larger, or even more.
By “dynamic”, what is meant is that a database might have to be accessible for change even when someone wants to undertake preservation actions, with many changes occurring in any time interval similar to that required to make a database copy.
The size challenge is that it can be impractical to use digital networks to copy an entire database from one location (computing environment) to another—too costly and too slow.
The dynamic and size challenges are handled by commercial DBMSs suitably for preservation copying. The “dynamic” challenge is handled by snapshot and logging functionalities, such as those that IBM’s DB2 has included for more than 20 years.[19] A snapshot and a log can be combined to create a representation of the database state at any time between the snapshot execution and the time that logging ended. Reconstituting a RDB from a snapshot and logs will be supported in commercial DBMSs for the foreseeable future. New software is not needed.
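The principle of combining a snapshot with a log can be shown with a toy model. This is a sketch under stated assumptions, not a description of how DB2 or any other DBMS implements recovery: the database is reduced to a key-value dictionary, log records carry a logical timestamp, and the names `LogRecord` and `reconstruct` are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    time: int              # logical timestamp of the change
    key: str               # row identifier (hypothetical)
    value: Optional[str]   # new value, or None to record a deletion


def reconstruct(snapshot: dict, log: list, as_of: int) -> dict:
    """Replay log records with time <= as_of on top of a copy of the snapshot."""
    state = dict(snapshot)  # never mutate the preserved snapshot itself
    for rec in sorted(log, key=lambda r: r.time):
        if rec.time > as_of:
            break
        if rec.value is None:
            state.pop(rec.key, None)
        else:
            state[rec.key] = rec.value
    return state


# Snapshot taken at logical time 0, followed by three logged changes.
snapshot_t0 = {"a": "1", "b": "2"}
log = [
    LogRecord(1, "a", "10"),   # update
    LogRecord(2, "c", "3"),    # insert
    LogRecord(3, "b", None),   # delete
]

state_t2 = reconstruct(snapshot_t0, log, as_of=2)
state_t3 = reconstruct(snapshot_t0, log, as_of=3)
```

Any state between the snapshot time and the last log record can be recovered by choosing `as_of`, which is the sense in which a snapshot plus a log preserves the history of a changing database.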
Compared to scientific/cultural data collections, operational databases tend to be very large, to change rapidly over longer periods, and to have readily justified values to their owning enterprises. Critical commercial databases are replicated remotely approximately continuously.[20] For scientific/cultural databases as large as a few terabytes, a practical method of replication is by Sneakernet—parcel post of external storage devices.[21]
With this background, we can suggest answers to the questions asked by the DB workshop organizers. From a software engineering perspective, answers to the technical aspects are readily available in commercial DBMSs. If the workshop would take into account what is already known and in practice, its focus could shift to working out economical procedures packaged for cultural sector repository staffs.[22]
What are the salient features of a database that should be preserved? Snapshots and logs managed as described in DBMS documentation are sufficient to preserve any portion of a relational database. The necessary snapshot and logging software support is part of any competent DBMS.[23] The choice of database portion to preserve and related external information to provide context are subjective decisions similar to the choices for any document collection.
What are the different stages in the database preservation's life cycle? All times in the life of a DB in use are equivalent, except that the DB content is changing. With snapshots and logs, any desired state in the history of a DB can be preserved for later inspection.
How do we keep archived databases readable and usable in the long term (at acceptable cost)? The formats of individual DB fields usually conform to standards (e.g., for floating point numbers), because that is part of "industrial strength" DBMS support. Given that, the snapshots/logs mentioned above are sufficient.
How do we separate the data from a specific database management environment? A motivation for Ted Codd's 1970 invention of relational database[24] was to make the data independent of DBMS implementation details. A RDB is portable from the environment provided by one DBMS provider into that of any other DBMS provider.
How can we preserve the original data semantics and structure? The structure (viz., the table, column, and field definitions) of an RDB is described in its system catalog tables, which are themselves part of this RDB. The structure of these administrative tables is described in textbooks.[25] As to semantics, it depends what one means by semantics. This might be (1) how the DB responds to SQL queries and update actions, or (2) the relationship of DB content to real-world facts. (1) is fully defined by the DB structure combined with SQL functionality. Handling (2) is much the same as for the content of any book whatsoever, except that books tend to include or imply much more context than a typical relational database does. For this reason, it is likely to be helpful to preserve a document collection to describe the database and its connections with the world.[26]
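That a relational database describes its own structure can be seen by querying a catalog with ordinary SQL. The sketch below uses SQLite through Python's standard `sqlite3` module purely for convenience; this is a substitution made here, since DB2's catalog tables have different names and carry more detail. The `specimen` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, collected TEXT)"
)

# SQLite keeps its catalog in the table sqlite_master, itself queryable by SQL.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(specimen)")
]

conn.close()
```

Because the structural description travels inside the database, a preserved snapshot carries its own table and column definitions; only the real-world meaning of the content needs separate documentation.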
How can we preserve data while it continues to evolve? Combine snapshots with DB logs.
How can we have efficient preservation frameworks, while retaining the ability to query different database versions? Reconstitution of a prior DB state from a snapshot and logs creates a DB representation that can be queried just as the original data were queried. Doing this would be efficient in the sense that it requires no new software and no fresh user education.
How can multi-user online access be provided to hundreds of archived databases containing terabytes of data? Presumably the challenge alluded to is that academic repositories typically do not have the powerful computers needed to serve large databases, or that the value (to some constituency) of preserving big databases is not sufficient to justify the implied cost.
Can we move from a centralized model to a distributed, redundant model of database preservation? This is primarily an issue of the cost and management of powerful computers and networks. See the discussion of Sneakernets above.
What documentation is preserved together with a database, and in what format? Those attributes of an RDB that distinguish it from other RDBs are preserved in its system catalog tables, which themselves are an RDB. Everything that one needs to know about the latter is defined in textbooks. As already mentioned, a preservationist needs to preserve an accompanying document collection to provide context.
What are the legal encumbrances on database preservation? Such encumbrances are qualitatively similar to those for any other kind of intellectual property.
What can be learned from traditional archival appraisal for the selection of databases for preservation? Selection is always a subjective decision of someone, or of some institution, deciding how to spend resources. This is the case for DBs in the same sense as it is for books, with only costs and values different case by case.
To what extent can the preservation strategies and procedural policies developed by archivists be adapted for databases? This question is insufficiently defined for answers to be apparent.
Where should we look for ideas and for software for preserving databases? Private sector database technology and deployment seem to be a decade ahead of what is discussed in academic sources apart from Computer Science departments. For instance, Sun Microsystems is offering an immense Sneakernet implementation: an entire data center on wheels. And seeMore’s Virtual Database Server (sVDBS) seems to include most of what might be needed to implement database preservation.[27] The cited InfoWorld review calls this offering:
A brilliant tool that will enable a large enterprise to gather its far-flung databases, regardless of their origins, under a single, relational roof.
sVDBS seeks to provide access to just about any data source, even flat text files and highly-structured COBOL databases, through standard ODBC, JDBC, or OLEDB interfaces. It succeeds.
Unfortunately those planning the workshop seem to be unaware of the private sector solutions responding to their technical questions. Perhaps this is more evidence of the "two cultures" division observed in earlier DDQ numbers. For repository institutions, obtaining adequate resources for installation, operation, and maintenance of the available commercial database solutions might be the biggest challenge.
Competition among private sector information search services has for several years been vigorous. In contrast, academic librarians have long emphasized cooperation over competition. Two recent articles underline a growing opinion that the health and influence of academic libraries might depend on their learning how to compete for reputation as information sources.
Perhaps none of Wright’s Top 100 Alternative Search Engines is as comprehensive as Google. However, many are superior in the specific areas in which they search. Standouts include GoshMe.com to help find the best search engines for a query, TheFind.Com for shopping, and Hakia.Com for attempting meaning match.
An Inquiry into Academic Cataloging Practice
Markey’s The Online Library Catalog: Paradise Lost and Paradise Regained? discusses academic libraries’ relative decline as the primary information source for students, with suggestions how these institutions might regain their status. Its distinction between the needs of domain experts and those of students is particularly apt:
Domain experts—scholars, scientists, and experienced researchers … —know the unanswered research questions, sticky controversies, and active scholars in their discipline. Rarely, if ever, do they need to conduct the brute-force subject searches that characterize the searches of domain novices … When they are stumped, their standing in the field gives them carte blanche to contact the world's experts to get answers to questions … . Primary sources are truly the intellectual playground of domain experts: they use primary sources to make new discoveries, and the by-products of their research are the creation of new primary sources.
Most people are domain novices about their topics of interest. Undergraduate students especially are just beginning to learn the summary knowledge of a discipline. They have no depth, do not know the discipline's influential authors, important questions, cutting-edge research, or research methodologies. Building a catalog of the future that is biased toward primary sources does not serve the interests of domain novices.
Libraries Playing Catch-Up: Melvyl Recommender Project
This project explored closing the gap between features that library patrons have come to expect from information retrieval systems and what libraries deliver today. It studied relevance ranking, auto-correction, text-based discovery, user interfaces, and recommending. A December paper looked into generating recommendations by item linkages ("patrons who checked this out also checked out...") and by the content of bibliographic records to develop queries for similar items ("more like this..."). The investigators found that users wanted recommendations to support their academic work, faculty, bibliographic and footnote recommendation sources, and recommendations to be effective for query expansion.
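The item-linkage style of recommendation ("patrons who checked this out also checked out...") amounts to counting co-occurrences in circulation records. The following is a minimal sketch of that counting, not the Melvyl project's method: the data, the function names, and the ranking by raw counts are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(checkouts: dict) -> dict:
    """Map each item to a Counter of items borrowed by the same patrons."""
    co = {}
    for items in checkouts.values():
        for a, b in combinations(sorted(set(items)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(co: dict, item: str, n: int = 3) -> list:
    """Return up to n items most often checked out alongside `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(n)]


# Hypothetical circulation records: patron -> items checked out.
checkouts = {
    "patron1": ["bookA", "bookB", "bookC"],
    "patron2": ["bookA", "bookB"],
    "patron3": ["bookB", "bookD"],
}

co = build_cooccurrence(checkouts)
top_for_a = recommend(co, "bookA", n=1)
```

Here `bookB` is the top recommendation for `bookA` because two patrons borrowed both. A production system would also need to handle popularity bias and patron privacy, which this sketch ignores.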
The second project phase emphasized adding full-text indices to a metadata-only index. Findings included that:
· The text-based discovery application, the eXtensible Text Framework (XTF), which was the backbone of the project's system, proved capable of scaling to millions of records and hundreds of concurrent users.
· An index-based, single-word spelling correction algorithm addressed 90 percent of misspelled single words.
· Faceted browsing and FRBR-like document groups can substantially improve patrons' work with large result sets.
· Keyword searching, document scoring, and index-based spelling correction can effectively combine full-text and metadata records into one system.
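Index-based spelling correction, mentioned in the findings above, means proposing corrections from the terms that actually occur in the search index rather than from a general dictionary. The sketch below illustrates the idea with Python's standard `difflib`; it is not XTF's algorithm, and the vocabulary, the function name `suggest`, and the similarity cutoff are assumptions made here.

```python
import difflib


def suggest(word: str, vocabulary: list, cutoff: float = 0.8):
    """Return the indexed term closest to `word`, or None if nothing is close."""
    if word in vocabulary:
        return word  # already an indexed term; nothing to correct
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical terms drawn from a search index.
index_terms = ["preservation", "database", "metadata", "archive"]

fixed = suggest("preservaton", index_terms)   # a misspelled query word
unknown = suggest("zzzz", index_terms)        # nothing similar is indexed
```

Drawing candidates from the index guarantees that any suggested term will return results, which a general-purpose dictionary cannot promise.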
Much of the functionality explored can be found in the XTF prototype and at the CDL website.
Protesting the “Law of the Excluded Middle”
I have long been puzzled by the law of the excluded middle.[28] Any proposition is, by definition, either true or false, with no “in between” values. Given this and the development of modern logic, it is unnecessary to say anything about an excluded middle. Doing so seems to me to violate the Occam’s Razor admonition for economy of explanations. Saying it otherwise, my difficulty is that the law of the excluded middle does not seem to conform to what I understand is the role of a scientific law (or the roughly equivalent mathematics entity, a theorem).
As I understand it, the purpose of a scientific law or a mathematical theorem is concise expression of similarities among facts that might otherwise seem unconnected, partly to enable prediction of facts that have not yet been seen. For instance, these two roles are at the core of the discussion of Maxwell’s laws of electromagnetism in the cited article about magnetic monopoles. Similarly the Pythagorean theorem can be used to predict the hypotenuse length of any right triangle. How does the law of the excluded middle conform to such an understanding of what we mean by “law”?
Most of our discourse is relatively casual, using natural language, rather than carefully adhering to the precision of formal logical languages. However, when someone alludes to the law of the excluded middle, he is likely to be signaling a shift to formal precision. Given this, what follows in the current note has to do with attempts at precise communication.
The simplest non-trivial range for the values of (mathematical) functions is the domain of two values (the binary domain), commonly denoted either by the set {0, 1} or the set {true, false}.[29] Both in natural language and in the most prominent logical language, we often choose to use sentences (expressions of fact) whose values are limited to the binary domain. Such sentences are commonly called “propositions”. After we signal a listener that our statements are propositions, an assertion of the law of the excluded middle adds no information whatsoever.
In natural language, propositions are much used precisely because they evaluate into the simplest non-trivial range. But their information carrying power is so inefficient that other styles of assertion are common. For instance, consider the difference between “the temperature is below 30°F” and “it is cold”. The former evaluates either to “true” or to “false”.[30] The latter evaluates into a continuous numerical range; its value depends on the probability with which the speaker might evaluate the current temperature as belonging to “cold”; the modern description of such sentences belongs to fuzzy logic.
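The contrast between the two sentences can be made concrete. In the sketch below, the first function is a proposition in the strict sense, and the second assigns “it is cold” a degree between 0 and 1. The linear ramp and its two thresholds (20°F and 60°F) are arbitrary assumptions chosen for illustration, not values taken from fuzzy logic literature.

```python
def is_below_30f(temp_f: float) -> bool:
    """A proposition: evaluates to exactly True or False."""
    return temp_f < 30.0


def coldness(temp_f: float,
             surely_cold: float = 20.0,
             surely_not: float = 60.0) -> float:
    """A fuzzy membership value in [0, 1] for 'it is cold' (linear ramp)."""
    if temp_f <= surely_cold:
        return 1.0
    if temp_f >= surely_not:
        return 0.0
    return (surely_not - temp_f) / (surely_not - surely_cold)


# The proposition has only two possible values ...
binary_values = {is_below_30f(t) for t in (10.0, 25.0, 35.0, 70.0)}

# ... while the fuzzy sentence takes intermediate values as well.
degrees = [coldness(t) for t in (10.0, 40.0, 70.0)]
```

Only the first function has values confined to {true, false}; asking about an "excluded middle" for the second would be a category mistake.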
In technology, propositions are the language for individual logic gates and memory cells. Such devices are made useful by arranging them in arrays whose values are binary sequences, such as “1010100011110101” for a 16-bit data bus. Instead of talking about the value conveyed by each line of such a bus, we talk of the value conveyed by the entire bus because doing so is relatively efficient. By talking in terms of 16-bit numbers, a range of 65,536 values, we express ourselves concisely.
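The arithmetic behind the bus example is easy to check. This short sketch reads the sixteen individual line values as one number and counts the values a 16-bit bus can convey; the variable names are chosen here for illustration.

```python
bus_bits = "1010100011110101"   # one character per bus line

# Sixteen separate propositions: "line i carries a 1".
lines = [bit == "1" for bit in bus_bits]

width = len(bus_bits)             # number of lines on the bus
value = int(bus_bits, 2)          # the whole bus read as one binary number
distinct_values = 2 ** width      # how many values such a bus can convey
```

The bus pattern in the text reads as hexadecimal A8F5, one of the 65,536 values that sixteen two-valued lines can express together.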
Each example above illustrates that, when we want to be precise, we choose language carefully. The notion of “excluded middle” has no place for sentences that evaluate outside the {true, false} range. And it expresses nothing not already said for sentences belonging to binary symbolic logic.
The four prior paragraphs depend on symbolic logic and a recent symbolic logic addition, fuzzy logic. Such formal languages were invented in the late 19th century and in the 20th century. In earlier times with only natural language available, it might have been reasonable to talk of the law of the excluded middle because the notion of a proposition was ill-understood. After the work of Frege, Russell, later scientific philosophers, and computer scientists, formal logic has become widely used. Today, when someone says that the law of the excluded middle applies, he is merely saying that what he chooses to utter is a proposition.[31] To appeal to “the law of the excluded middle” is inappropriate in modern discourse!
A recent Physics Today article[32] provides the most readable explanation I have encountered for the confluence of high energy physics, theory, and cosmology challenges. It makes arcane matters unusually comprehensible.
Already in 1931, Paul Dirac showed that the existence of a single monopole would suffice to explain the universal quantization of electric charge. The article’s depiction of Maxwell’s equations, that explain most of electricity and magnetism, shows that addition of magnetic monopoles would add elegant symmetry.
Some readers will find the article’s first page provides as much information as they want.
Kaufmann’s Discovering the Mind: Goethe, Kant, and Hegel
Kaufmann[33] provides an eye-opener, Goethe as a humanistic philosopher, and a rare critical view of Kant, illustrated by the following excerpt.[34]
Kant's misguided transcendental method always begins with what he wants to accept as absolutely certain—Euclidean geometry, Newtonian science, the categorical imperative, the notion that by virtue of their reason all human beings have a unique dignity—and then he asks what must be the case for these things to be absolutely certain. Nietzsche had a point when he said in The Gay Science (Section 193): "Kant wanted to prove, in a way that would dumbfound the common man, that the common man was right." To put the point still more concisely: Kant was a virtuoso of rationalization.
What makes him so unusual is that by the time he published his magnum opus in 1781 he had a comprehensive vision in which everything fell into place. As a corrective of a charge often raised against Nietzsche and of the prejudice that density and opaqueness bespeak rigor, I appreciate Robert Paul Wolff's claim …: "Of all the great philosophers, there is none so rich in insights and so plagued by inconsistency as Kant."
Steven Levy’s Newsweek article, Invasion of the Web Amateurs, is a timely caution against undiscriminating belief in what one reads in Wikipedia, and more generally on the Web.[35]
[1] H.M. Gladney and J.L. Bennett, What Do We Mean by Authentic? D-Lib Magazine 9(7), July 2003.
[1] I have abbreviated Erickson’s e-mail.
[2] Jim Gemmell, Gordon Bell and Roger Lueder, MyLifeBits: a personal database for everything, Comm. ACM, 49(1), 2006, 88-95
[3] The following book descriptions are condensed from announcements. I have not yet seen most of these books since nearby libraries have not yet acquired copies.
[4] Marilyn Deegan and Simon Tanner, eds., Digital Preservation, Facet Publishing, 2006. ISBN 978-1-85604-485-1.
[5] Uwe Borghoff et al., Long-Term Preservation of Digital Documents: Principles and Practices, Springer Verlag, 2006. ISBN: 978-3-540-33639-6
[6] Julien Masanès, ed., Web Archiving, Springer Verlag, 2006. ISBN 978-3-540-23338-1
[7] Carlo Batini and Monica Scannapieco, Data Quality, Concepts, Methodologies and Techniques, Springer Verlag, 2006. ISBN 978-3-540-33172-8
[8] “Federated data” is a digital librarian’s term for data combined from independent sources.
[9] H.M. Gladney, Preserving Digital Information, Springer Verlag, 2007. ISBN 978-3-540-37886-0
[10] Many of the book’s ideas have appeared in draft form over several years of the Digital Document Quarterly.
[11] Preserving a file without a cloud of contextual connections is unlikely to be useful. Insight into why this is so can be acquired from Ludwig Wittgenstein’s Philosophical Investigations.
[12] H.M. Gladney, Principles for Digital Preservation, Comm. ACM 49(2), 111-116, 2006, summarized the key novel elements.
[13] Durable encoding is described in the book’s Chapter 13, and earlier in H.M. Gladney and R.A. Lorie, Trustworthy 100-Year Digital Objects: Durable Encoding for When It's Too Late to Ask, ACM Trans. Office Information Systems 23(3), 299-324, July 2005.
Durable evidence is described in the book’s Chapter 12, and earlier in H.M. Gladney, Trustworthy 100-Year Digital Objects: Evidence After Every Witness is Dead, ACM Trans. Office Information Systems 22(3), 406-436, July 2004.
[14] I believe that what follows will be mostly self-evident. However, if skeptical readers choose to raise questions or objections that can be answered within the typical length of DDQ commentary, I will be happy to address these promptly and publicly.
[15] The discussion would not be much changed by any other choice of a commercially supported database type. It simply happens that I understand DB2 and SQL better than I understand any other widely used offering.
[16] This is not a suggestion that such conversion be used in a practical preservation implementation, but instead is a preservation feasibility argument.
The answers presume only the functionality of the earliest versions of IBM's DB2. Later versions have more functionality, of which little is pertinent to thinking about preservation.
[17] Insight into the fundamental importance of relations for representing structure is discussed in Rudolf Carnap, The Logical Structure of the World, U. Chicago Press, 1967. This was originally published in 1928 as Der Logische Aufbau der Welt.
[18] See §4.1 of Preserving Digital Information, described above.
[19] A database snapshot is a read-only copy of a database that reflects all database data up to the point in time for which the snapshot is taken. For instance, see Bo Kähler and Oddvar Risnes, Extended Logging for Database Snapshot Refresh, Proceedings of the 13th VLDB Conference, 389-398, 1987. Snapshot services were added to Microsoft’s SQL Server offering in 2005.
[20] To maintain a remote copy is approximately as expensive as maintaining the primary database. This is affordable for commercial databases, e.g., if losing the database would risk losing the entire company, but might be impractical for a scientific or cultural database.
[21] According to BBC News, Google is experimenting with 120 terabyte Sneakernet to help academics around the world exchange huge amounts of data.
[22] An early version of my answers to these questions was sent to the workshop organizers at the end of January.
Discussing practical procedures would require a longer article than DDQ permits.
[23] Here a "competent DBMS" is a computer application that has at least the snapshot and logging functionality of early versions of IBM's DB2.
[24] E.F. Codd, A Relational Model of Data for Large Shared Data Banks, Comm. ACM 13(6), 377-387, 1970.
[25] For instance, Don Chamberlin, Using the New DB2, 1996. ISBN 1-55860-373-5
[26] The underlying epistemological issue is that language is not grounded. See Ludwig Wittgenstein, Philosophical Investigations: The English Text of the Third Edition, Blackwell, 1958.
[27] This opinion is based on a 12th March 2007 review. Although the $20,000 list price for sVDBS might seem high to cultural institutions, it is low compared to the cost of any R&D project that might provide part of its pertinent functionality. However, this DDQ comment should not be construed as a recommendation of sVDBS over other software, but rather as a recommendation that commercial software should be investigated before any large investment into R&D for database preservation.
[28] Many articles in the Stanford Encyclopedia of Philosophy mention the law of the excluded middle.
[29] The empty domain, {}, corresponds to undefined functions. Any single-valued domain, {constant}, is the output of a constant function. While both of these are important, they are in some sense “uninteresting”.
[30] In what follows, I ignore the possibility that the temperature is exactly 30°F, because the probability for this is vanishingly small. The matter is not so simple for “exactly 32°F” because this number is set as the freezing point of pure water, which can be observed with a thermal bath.
[31] A reader whose opinions I value highly has disagreed with this conclusion and the preceding arguments, without yet identifying how my analysis might be mistaken. In view of the great antiquity of the Law of the Excluded Middle, without prior challenge that I am aware of, I am not completely confident about my thinking. However, absent a specific and persuasive challenge, my expressed opinion will persist. Perhaps some DDQ reader can provide a counterargument.
[32] Bertram Schwarzschild, Search for magnetic monopoles at the Tevatron sets new upper limit on their production, Physics Today 59(7), 16-18, 2006.
[33] Walter Arnold Kaufmann, Discovering the Mind: Goethe, Kant, and Hegel, McGraw Hill, 1980. ISBN 0-07-033311-4
[34] Ibid, page 116.
[35] DDQ frequently links to Wikipedia pages. However, it does so only after I have decided that each citation is substantially correct and well written. For epistemology, I prefer to cite the Stanford Encyclopedia of Philosophy.