Digital Document Quarterly

Perspectives on Trustworthy Information

Prospects

HMG Consulting

20044 Glen Brae Drive

Saratoga, CA 95070

(408) 867-5454

© 2002, H.M. Gladney

In the industrial nations, nearly every business, government, and academic document starts in digital form, even if it is eventually published and saved on paper.  The Digital Document Quarterly (DDQ) will address the quality of digital documents, the carriers of many kinds of information: books, newspapers, scholarly papers, scientific tables, legal briefs, medical charts, engineering designs, and government and business records.  Our topic touches the welfare of every citizen.

DDQ will focus, at least during 2002, on the problems implied by “trust”, “trustworthy”, and “trusted”, and on trustworthy digital mechanisms and their limitations.  Can you trust e-mail or what you read on the World Wide Web?  How can you persuade people to trust what you send?

For what kind of reader is this newsletter intended?  How will it be different? 

Our target reader is the university graduate who reads and writes extensively, and who wants to understand and perhaps influence the “digital revolution”.  I hope DDQ will appeal to liberal arts and social science graduates who are not engineers or scientists, but who want to understand the policies that affect their professional and personal lives without studying technological intricacies and jargon.

Not everyone will find DDQ easy.  This should be no surprise, because the issues are profound.  If they were easy, many brilliant people would not be taking years to agree on the best methods.

“Getting it right” requires precision in both language and action.  Whenever it is difficult to express an idea both simply and correctly, DDQ will favor precision over simplicity, taking the view that the wish to be simple does not justify being simplistic.  DDQ is for intelligent readers who can think about technical policy, but who have neither the time nor the inclination to study digital engineering.

What Does DDQ Mean by “Quality”?

Quality is meaningful only with respect to what clients and customers want.

In DDQ, “quality” does not refer to the values created by authors and artists.  Instead, it stands for such properties as “integrity”, “authenticity”, “confidentiality”, and related attributes that can be enhanced by technical methods.[1]  Information about document quality is data that enables appropriate security, privacy, ownership rights and privileges, survival despite hazards that include technical obsolescence, and deletion if and when business and legal circumstances call for it.  Audit trails and other mechanisms are needed to ensure correct binding to metadata,[2] to make documents robust against transformations (such as picture cropping, which can destroy subliminal watermarks), and to attach evidence of ownership firmly.

Charts from a recent presentation summarize the topics planned for DDQ. 

Digital quality attributes are similar to, and mostly derived from, their counterparts for physical artifacts, including paper, but how they are best achieved in a digital environment is often different.  In a world in which information is a valuable good, the demand for quality might grow.

Nothing in this is either unexpected or difficult.  Practical measures are simply applications of thorough engineering.  Arguably, however, such engineering has not received sufficient attention.[3]  Good engineering demands that customers’ quality objectives be explicit, written, and available when a product is first designed.

How Will DDQ Differ from Other Writings?

What might DDQ offer to people already confronted with information overload? 

DDQ will try to help technical laypeople who want to understand the issues without wading through the technical literature.  Articles about digital documents are abundant.  DDQ will filter them to identify important articles, provide synopses comprehensible to non-technical readers, and offer opinions on their content.  In short, it will make difficult literature accessible and save the reader time.  The questions are: What is known?  What challenges can be solved only by invention?  What gaps can be filled by engineering initiatives already underway or readily mounted?

Little written about our topic is both trustworthy and comprehensible outside the involved professions.  The existing writings include:

Articles written by scientists and engineers for their peers

These report novel experiments, mechanisms, and ideas, and are usually subject to demanding prepublication criticism.  They mostly assume that the reader understands prior work, and repeat very little of it.  They use jargon.  A reader might need many years of practice just to decide quickly whether a specific article is worth his time and effort.

Articles by librarians and archivists grappling with change

These are many and distressingly repetitive, making it tedious to ferret out what might be new and insightful.  Regarding critical technical aspects, this literature is shallow and often misleading. 

Press articles and product critiques

These are worth following if one has sufficient grounding to judge quality.  The best are insightful, but this literature cannot be expected to teach how to judge information sources for use in important decisions.

Promotional articles from technology providers

These are worth following if one can separate hyperbole from reliable facts, and can also detect what’s missing.  For instance, they do little to distinguish between “leading edge” and “bleeding edge” products.

Software tool descriptions

Even professionals find it difficult to judge software tools from their published specifications.  To decide on the virtues and weaknesses of any offering, it’s often necessary to install and use advertised software.

DDQ will avoid explanations that most readers won’t want.  This comes at a price: DDQ will make assertions that it does not justify.  Sometimes a reader’s justifiable skepticism might stimulate a challenge, “Prove to me that … is reasonable!”  DDQ will attempt to head off such challenges by citing the best literature.  The reader who chooses to follow a citation might encounter technical complexity.  If several readers express difficulty with such articles, I’ll try to help.

DDQ will be complemented by a comprehensive glossary and later by a cumulative bibliography.  Most DDQ numbers will also include tips to help readers manage documents on their personal computers.

How well will all this work?  The only way to find out is to try it.  How well DDQ accomplishes its objectives will depend partly on the feedback and questions readers send.  If I receive little, that will be a sign that DDQ is not very helpful.  If I get none, I’ll probably conclude that DDQ is of little value and stop producing it.  In other words, the responsibility is partly yours.

Topics

The first DDQ number begins to address long-range preservation of digital information and emphasizes background without which later topics might be difficult to understand or might appear ill-founded.  The next number will continue with long-term preservation, not because it is more important than other topics, but because digital preservation seems timely in 2002.  To prepare for that, the first number tries to convey: Why is digital preservation important and suddenly urgent?  What kinds of challenge need to be addressed?  Among these challenges, what are the technical components?  What is wrong with the proposed direction of certain prominent institutions?

Beyond that, DDQ plans are flexible, although even the second number is unlikely to cover digital preservation as thoroughly as the topic deserves.  Since cryptography is a foundation of many information security tools, its essentials will be sketched as soon as space permits, after more urgent topics, such as the problems associated with personal privacy and identity management, have been discussed.

Trust, Language, and Philosophy

Every discussion of digital document quality depends on the meaning of language symbols.  Digital information is a symbolic representation of something other than itself.  Misunderstanding contributes significantly to failures of trust and security, perhaps more than deliberate falsification does.  Furthermore, what can be mechanized is limited to the facts that language can convey, and this limit marks the boundary between what machines can do and what must forever remain a human value decision or judgment.  The pertinent limitations of language were worked out in the first half of the 20th century by philosophers and mathematicians: Bertrand Russell, Ludwig Wittgenstein, Rudolf Carnap, Kurt Gödel, Alan Turing, and their students.

Wittgenstein’s thinking is particularly important and will figure throughout DDQ.  His teachings on the limits of language will help us distinguish between what’s trustworthy and what’s not.  They will also be used to explore whether currently evolving work, variously labeled knowledge management, semantic networks, ontologies, and RDF (Resource Description Framework), is not only sound but also theoretically complete.  We need to know this to choose digital archiving schemas that can convey meaning.

Computers manipulate symbols that are surrogates for what they mean.  A computer model is good if its pattern follows the pattern of what it stands for.[4]  Wittgenstein’s ideas[5] illustrate, with example after example, the relationship of language to meaning, teaching that language consists of symbols that take their meaning from how they are used.

One way of seeing Wittgenstein’s 1939 Lectures on the Foundations of Mathematics is that they teach the limits of what language can express, including the limits of information management.  Wittgenstein’s seminal work, the Tractatus Logico-Philosophicus,[6] can be interpreted as teaching the boundary between natural philosophy and ethics, which is much the same as the boundary between facts and values.[7]  Only facts can be spoken; values must be shown.[8]

Although these paragraphs reduce difficult philosophy to a mere fourteen sentences, you might still ask impatiently, “What has this got to do with digital documents?”  In DDQ, you will see this philosophy help with many topics, starting with trust.

Why Philosophy is Essential in DDQ

Perceptive readers will see, as DDQ unfolds, that early 20th-century philosophy helps not only with questions of trust, such as the critical distinction between “trustworthy” and “trusted”, but with many other DDQ topics.  For instance, good system design for digital preservation will automate everything that can be automated, and will not attempt to automate beyond that.  The essential distinction is between what machines can in principle do and what must be left to human judgment.  Wittgenstein teaches that boundary better than any prior thinker.

This reason for recalling philosophical writings may interest system engineers but matter little to service clients.  There is a better reason for invoking philosophical thinking, one based partly on users’ need to choose services that they can trust.  Ultimately, DDQ readers, as service clients and customers, are responsible, along the lines of, “If you choose an inferior tool, don’t complain to me that your data was damaged!”  How can you (or someone you trust) decide objectively how trustworthy a tool is?

DDQ will treat controversial topics and will often argue that the most popular approach is too expensive, not trustworthy, or both.  Should you accept DDQ’s position just because it is persuasively argued?  Of course not, at least not for that reason alone!  How far your necessary judgments can be objectively based, and how best to make such judgments, are questions informed by careful thinking in more than 150 years of philosophical writing.  DDQ avoids the unconvincing “Trust me!” by relating specific technical proposals to the soundest foundations available.  Frequently these are found in Wittgenstein.

Acknowledgements

What DDQ expresses, and how it is presented, depends heavily on careful reading of drafts by John Bennett, Tom Gladney, Peter Lucas, and John Swinden.  Each provided not only oral discussion, but also written critiques that included constructive suggestions used in rewriting.  I’m grateful for these and the many hours of effort they represent.  Of course, the responsibility for what finally appears is entirely mine.

[1]        I use “quality” for lack of an English word that conveys just what I want and no more.  Perhaps some reader will suggest a better word; I certainly hope so.

[2]        Metadata is information about a document that is not considered part of the document itself, but is often essential for its correct management.  Metadata is mostly added by people other than the document’s authors.

[3]        A current example is Microsoft’s 2002 announcements that it will focus on security in its products.  These pronouncements are ridiculed in the trade and business press because security is widely judged to be not a retrofit feature but a property of the core software design.  In other words, you cannot add security later; you must build it in from the beginning.

[4]        Jargon for this includes sentences like, “Information is represented by an XML document.”  A model that faithfully represents what’s intended is said to be isomorphic to the facts.

[5]        Wittgenstein’s own writings are few and short, but this is compensated by the works of his disciples and interpreters.

[6]        The Tractatus [LW 21] is a difficult read.  If you want to read the real McCoy, start with the 1939 lectures [LW 39].

[7]        The phrase “natural philosophy”, common in the 19th century and earlier, is appropriate because it includes all branches of science and engineering, avoiding the distracting specialization that is an artifact of modern university organization.

[8]        Wittgenstein would approve of the Biblical injunction, “Let your light so shine before men, so that they see your good works, and …”