Digital Document Quarterly: Perspectives on Trustworthy Information |
Volume 2, Number 4, 4Q2003 |
HMG Consulting, 20044 Glen Brae Drive, Saratoga, CA 95070 |
© 2003, H.M. Gladney. ISSN: 1547-8610 |
Riddles can be instructive. If you agree, you might try the following before you look at the answers given later in this DDQ number.
In each case below, the assignment is to provide further members of the set or sequence given.
1) O, T, T, F, F, S, S, E, ...
2) 79, 72, 66, 59, 50, 42, 34, 28, 23, 14, ...
3) cherry, apple, rhubarb, plum, beet, Japanese maple, ...
4) 3, 7, 11, 15, 19, ...
Digital preservation is critical to most of the history of the future.[1] This expectation justifies every practical effort to ensure that the technical and administrative methodology used is sound and widely understood.
To examine the criteria that should be used, we have returned to early 20th-century epistemology. The thinking of Ludwig Wittgenstein and his successors teaches the importance of sharing definitions that are precise enough to minimize community confusion. It further teaches that we should pay diligent attention to the boundary between what can be specified precisely (what is objective, and can therefore be automated) and what must forever remain issues of human values, opinions, and imperfectly communicated intentions (what is subjective, and therefore cannot be automated).
Such philosophic distinctions are critical to issues of trust and of preserved information authenticity. The latest member of our Trustworthy 100-Year Digital Objects series identifies essential criteria for any approach to long term digital preservation. Subtitled Syntax and Semantics—Tension between Facts and Values, its abstract reads:
Prior Trustworthy 100-Year Digital Object articles describe a method for preserving digitally represented information. Trustworthy Digital Object (TDO) representation and packaging makes any digital content reliably meaningful to consumers, no matter how distant these are in time, in space, and in social affiliation from their information sources. The current article focuses on digital document authenticity and on evidence a consumer can use to decide whether to trust the content.
Such considerations are necessarily epistemological. Arguing the issues must start by conveying as unambiguously as possible what we mean by words like ‘authenticity’ and ‘evidence’ and by distinguishing between such words as ‘objective’ and ‘subjective’. These arguments apply Wittgenstein’s teaching to pictorial models of digital and conventional communication.
The analysis leads us to identify an ethical imperative for digital preservation, and to suggest that the TDO method defines a quality standard against which any method of digital preservation should be judged.
A preprint copy is available on e-mail request. I would welcome critical commentary.
Trouble-free transfer of textual information across otherwise incompatible digital platforms depends on proper handling of character representation. This topic has become surprisingly complicated, partly because incomplete protocols were widely used before a standard became available. (Declaring a fresh start that discarded old data and obsolete tools would be impractical.)
Conceptually, the topic is relatively simple. However, since for many years I did not understand it clearly, I presume that some readers might welcome a brief explanation.[2]
Unicode/UCS | is a function from natural numbers in [0, 2^31 - 1] (31-bit integers) to character names. |
UTF-8 | is the most popular of several ways of representing Unicode text so that it takes less storage space than would be required if characters were represented by 32-bit words (4 bytes). |
A glyph | is a picture for displaying and/or printing a visual representation of a character. |
A font | is a set of glyphs for some Unicode subset, with stylistic commonalities in order to achieve a pleasing appearance when many glyphs combine to represent a text. [3] |
A code point | is the number or index that uniquely identifies a Unicode character. |
The meaning of a character and the picture of the character are distinct. Generally there are many pictures that mean the same character, one for every font variation. For instance, "the first letter in the Latin alphabet" (a meaning) can be depicted by any of several differently styled glyphs.
In a digital machine, the encoding is a string of zeros and ones (or on and off indications, or true and false indications—different ways of saying the same thing). Such a string can be viewed to be the binary encoding of a number; for instance, ‘10100’binary is the same number as ‘20’decimal.
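The equivalence between a bit string and the number it encodes can be checked in a couple of lines. This is only an illustrative sketch; the variable names are chosen here and are not part of any standard.

```python
# Interpret a string of zeros and ones as a binary numeral.
bits = "10100"
value = int(bits, 2)             # parse the string as a base-2 number
print(value)                     # 20

# Convert the number back to its binary string form.
round_trip = format(value, "b")
print(round_trip)                # 10100
```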
Unicode characters are intended to represent the written forms of all the world’s major languages.[4] ‘Unicode’ is an informal name for the ISO 10646 international standard that defines the ‘Universal Character Set (UCS)’. Relative to all other character standards, UCS is a superset[5] that guarantees round-trip compatibility. (No information will be lost if you convert any text string to UCS and then back to the original encoding.) Key Unicode and character representation concepts are illustrated by:
Code point | Unicode name | Storage representation (UTF-8) | Human presentation |
002D | HYPHEN-MINUS | 00101101 | - |
2010 | HYPHEN | 11100010 10000000 10010000 | ‐ |
2013 | EN DASH | 11100010 10000000 10010011 | – |
2212 | MINUS SIGN | 11100010 10001000 10010010 | − |
00E9 | LATIN SMALL LETTER E WITH ACUTE | 11000011 10101001 | é é é é |
01A9 | LATIN CAPITAL LETTER ESH | 11000110 10101001 | Ʃ |
03A3 | GREEK CAPITAL LETTER SIGMA | 11001110 10100011 | Σ Σ Σ Σ Σ |
2211 | N-ARY SUMMATION | 11100010 10001000 10010001 | ∑ ∑ ∑ ∑ |
0633 | ARABIC LETTER SEEN | 11011000 10110011 | س |
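The first three columns of a table like the one above can be regenerated mechanically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical, and the storage column is assumed to be UTF-8.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point in hex, Unicode name, UTF-8 bytes as bit strings)."""
    code_point = f"{ord(ch):04X}"
    name = unicodedata.name(ch)
    # Encode the single character as UTF-8 and show each byte as 8 bits.
    utf8_bits = " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))
    return code_point, name, utf8_bits


# The characters of the table, written as escapes so the source stays ASCII.
for ch in "\u002d\u2010\u2013\u2212\u00e9\u01a9\u03a3\u2211\u0633":
    print(" | ".join(describe(ch)))
```

The glyph column cannot be produced this way, because how a character looks depends on the font used to display it.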
Epistemological analysis of Unicode and of fonts: Unicode defines a function from code points (integers) to names (ASCII character strings), saying nothing about how the integers should be represented by binary code or about how the characters should be depicted by glyphs or sound when spoken.[7] Unicode character names are surrogates for conceptual objects. They are also mnemonics by virtue of being well-known English words. How can the Unicode definition, in itself, be useful?
A character takes its meaning from how it is used, not from the appearance of any associated picture (glyph). For instance, a ‘LEFT PARENTHESIS’ signals the start of a delimited string. Provided that a glyph used in formatted text is understood to mean ‘LEFT PARENTHESIS’, it is almost irrelevant whether it looks like ‘(’, ‘(’, or ‘(’.
The identity of characters is defined by the first two columns illustrated in the table above, and different characters might, in some fonts, have identical glyphs. For instance, ASCII contains characters with multiple uses; its ‘hyphen’ is used also for ‘minus’ and as a ‘dash’. In contrast, Unicode defines ‘hyphen’ and ‘minus’ (as well as different dash characters). For compatibility, the old ASCII character is preserved in Unicode also (in the old code position, with the name ‘hyphen-minus’).
Why might a distinction between ‘hyphen’ and ‘minus’ be important if their glyphs are identical in many fonts? Although the distinction might be unimportant for print and display appearance, it is almost surely critical in programs such as those that include sorting and searching. For instance, when I search for minus signs, I prefer not to be distracted by hyphens.
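A short sketch shows why the distinction matters for searching. The sample sentence and the function name `find_minus_signs` are made up for illustration; the code points are those of the Unicode characters discussed above.

```python
import unicodedata

MINUS_SIGN = "\u2212"  # U+2212, distinct from U+002D HYPHEN-MINUS

# Sample text containing a hyphen, a minus sign, an en dash, and a hyphen-minus.
text = "well\u2010known: 7 \u2212 3 = 4, pages 10\u201312, x-y"


def find_minus_signs(s: str) -> list[int]:
    """Return the positions of true minus signs, ignoring hyphens and dashes."""
    return [i for i, ch in enumerate(s) if ch == MINUS_SIGN]


print(find_minus_signs(text))

# List every dash-like character with its name; category "Pd" is
# Unicode's "dash punctuation" class, to which MINUS SIGN does not belong.
for ch in text:
    if unicodedata.category(ch) == "Pd" or ch == MINUS_SIGN:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A search that compared glyph shapes rather than code points could not separate these four characters in many fonts.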
When a text file is sent to an application, how does the application know what character coding[8] is being used? The answer is that applications that support Unicode typically require that their input files have header records that identify the encoding. For instance, a proper XML header record is:
<?xml version="1.0" encoding="utf-8"?>
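How an application might read such a header record can be sketched as follows. This is a simplified illustration, not a conforming XML parser: it assumes that the file starts with the declaration, that the encoding is ASCII-compatible (so the declaration itself can be read as bytes), and that there is no byte-order mark. The function name `declared_encoding` and the sample document are hypothetical.

```python
import re

# Matches an XML declaration at the start of the data and captures the
# value of its encoding attribute (simplified; assumes ASCII-compatible bytes).
XML_DECL = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML header, or a default if absent."""
    match = XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else default


doc = '<?xml version="1.0" encoding="utf-8"?>\n<note>caf\u00e9</note>'.encode("utf-8")
encoding = declared_encoding(doc)
print(encoding)
print(doc.decode(encoding))  # decode the bytes using the declared encoding
```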
Invest to Save, Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation is now available in PDF format from the Delos website. It recommends[9] research into:
Emerging Research |
1A | Repository development for existing models, for repositories for software and file format specifications, and management of peripheral devices. |
1B | Cheap, long-lasting, efficient and verifiable storage media |
1C | Generic devices capable of reading diverse classes of media |
1D | Identify how their emergence will change digital entity encoding formats |
1E | Descriptive language for the performance and behavior of preserved digital entities |
1F | Inquiry into context sensitivity, risk awareness and proper preservation behavior. |
1G | Accelerated ageing of media, systems and software, for predicting risks to digital objects |
1H | Semantics to represent temporal, procedural and spatial relationships of digital entities |
Re-engineering |
2A | Modeling digital preservation processes |
2B | Automation of digital preservation processes |
2C | Detecting trustworthiness and information quality |
2D | Scalability of long-range archives |
2E | Characterization of collection completeness |
2F | Distributed and grid storage |
Systems |
3A | Formats of digital entities |
3B | Managing complex and dynamic digital entities |
3C | Automated metadata creation |
3D | Long-term metadata viability |
3E | Multilingual entities and technology |
3F | Impact of preservation strategies on information loss |
3G | Repurposing e-content |
Although I contributed as a member of this workgroup, I could not agree with all its recommendations, and the format of the final report did not include dissenting opinions. Partly for this reason I talked again with an IBM Research expert on the storage device industry.
The report's list of 21 recommended topics is too long, with the consequence that less promising topics might divert attention and resources from those that promise rapid, effective, and durable progress. If we give credence to expressions of urgency for digital preservation action that must also conform to reliably sound technological practice,[10] we should avoid the distraction and loss of focus that attention to unpromising topics will surely create.
For instance, the 1B, 1C, and 1D recommendations deal with topics whose reduction to practice would have to be handled by industrial enterprises. Close collaboration across disciplines and across enterprise types would be essential. Such collaboration is not evidenced by the recommendations, which read: [11]
“1B: Archival Media : To bring new classes of technology to bear on the recovery, reconstruction and interpretation of the meaning represented by bitstreams, they need to be encoded in preservation formats and on ‘archival media’. Research into generating cheap, long-lasting, efficient and verifiable media for storing the bitstreams is needed.
“1C: Salvage and Rescue: Preservation strategies depend upon our ability to access storage media over time. While we know that some storage media can have a shelf life of thirty years or more, the devices for reading particular classes of media tend to have much shorter life-spans, often only a couple of years. While a peripheral device repository might help here (see above), generic devices capable of reading diverse classes of media are needed to address peripheral device obsolescence.
“1D: Storage abstractions: Preservation systems map between the operations that can be done on digital entity encoding formats and the operations that are supported by storage repositories. As newer classes of storage devices are developed research will be necessary to identify how their emergence will change digital entity encoding formats to take advantage of content-based addressing and parallel processing of data. … ”
To allocate scarce research grant funds to these topics would be unnecessary and ineffective. As written, they fail to reflect well-known engineering and business facts, such as:
1) Achieving low unit price for a technology depends on finding or creating a large market—an unlikely prospect for a digital storage subsystem specialized for long term retention, unless it also happened to provide competitive storage density and read/write speed. Industrial participants have conducted, and continue to conduct, a sophisticated program seeking optimal combinations of durability, density, speed, and price. Only products of such processes are likely to offer prices that digital preservation programs can afford.
2) Looking for durable storage media as an isolated technical objective makes little sense. High performance solutions invariably require matched media, read-write heads, mechanics, packaging, and microcode. Today’s early-phase cost (for prototypes good enough to attract product managers) of a new storage technology is between $10M and $100M—well beyond what the NSF has typically awarded.[12]
3) Storage devices are typically packaged and sealed against dirt and damage.[13] After they leave their factories, the only non-destructive means of accessing their content is through their electrical connections. These support data stream protocols that are independent of the storage media and almost independent of device characteristics. Modern operating system software hides raw device characteristics from all higher level software.[14]
4) It is easy to copy even large amounts of data from aging devices to their replacements inexpensively with low error rates so that media risks are dwarfed by unrelated preservation risks.[15]
It might be possible to reformulate 1B, 1C, and 1D to avoid such problems. Doing so will require information and skills mostly to be found in industrial R&D laboratories.
The preservation literature often compares the lifetime of practical digital storage to that of paper. Two digital storage media are known to be as durable as paper. The first, single crystal nickel, has been pursued in the Long Now Foundation’s Digital Rosetta Project, which enjoys some NSF funding. The second digital medium as durable as paper is, in fact, paper![16]
Two-dimensional bar code printing technology is available commercially.[17] Such digital paper technology has not been diligently pursued for digital preservation. We wonder, “Why not?”
The sequence or set extensions I had in mind are shown as the tails of:
1) O, T, T, F, F, S, S, E, N, T, E, T, T, …
2) 79, 72, 66, 59, 50, 42, 34, 28, 23, 14, Christopher, Houston, Canal, …
3) cherry, apple, rhubarb, plum, beet, Japanese maple, stop sign.
4) 3, 7, 11, 15, 19, 0, 4, 8, …
By sending these puzzles to 30 friends, I gained confidence in a conjecture that few people will guess such responses, and that some will object that the test is unfair. How might my answers make sense?
The answer (1) is given more often by children than by adults, as the sequence I had in mind was the first letters of the English words for the natural numbers: ‘one’, ‘two’, ‘three’, ‘four’, … This illustrates that the answers to such riddles depend on shared experience and shared context. Perhaps children are more likely than adults to answer this one correctly because they have relatively few possibilities to explore.
Residents of New York City might provide the answer (2) from their common experience, because what I had in mind were subway stations of the 7th Avenue and Broadway line. This sequence is finite, unlike that in (1).
What I had in mind for (3) is also finite. However, only one member is required (or allowed) to complete the full set intended. What every object has in common is that it is partly colored red.
Mathematicians are likely to recognize that what I had in mind in puzzle (4) was ‘(3 + 4n) mod 23, with n ranging over the ordered natural numbers starting from zero’. Readers unfamiliar with modulo arithmetic might recognize that they use it in real life whenever they consider rotation or time-of-day; rotating an object 270° (270 degrees) clockwise leaves it facing the same way as rotating it 90° counterclockwise, and the time-of-day 24 hours from now will be the same as the time-of-day right now. For a digital computer that represents any integer in a fixed-length memory cell, modulo arithmetic is essential; adding 256 to an integer represented in one byte (an 8-bit binary string) does not change the value represented.
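The modulo examples can be checked with a short sketch. One rule that reproduces the printed answer to puzzle (4), namely 3, 7, 11, 15, 19, 0, 4, 8, is to step by 4 and reduce modulo 23; that modulus is inferred here from the answer itself and should be read as an assumption. The function name `puzzle_four` and the sample values are likewise hypothetical.

```python
def puzzle_four(count: int) -> list[int]:
    """First `count` terms of (3 + 4n) mod 23 for n = 0, 1, 2, ...

    The modulus 23 is an assumption inferred from the printed answer.
    """
    return [(3 + 4 * n) % 23 for n in range(count)]


print(puzzle_four(8))  # [3, 7, 11, 15, 19, 0, 4, 8]

# Rotation: 270 degrees clockwise equals 90 degrees counterclockwise (mod 360).
same_rotation = (270 % 360) == (-90 % 360)

# Time of day: 24 hours from now the clock shows the same hour (mod 24).
hour_now = 15
same_hour = (hour_now + 24) % 24 == hour_now

# One byte holds values modulo 256, so adding 256 changes nothing.
byte_value = 200
same_byte = (byte_value + 256) % 256 == byte_value

print(same_rotation, same_hour, same_byte)  # True True True
```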
In each of the puzzles, the set members I had in mind shared some attribute: respectively being first letters of certain words, being related places, having a color in common, and conforming to an arithmetical rule. However, a priori shared attributes are not needed to choose some particular set or sequence. If you offered to pay me handsomely to load a truck, but provided no further specification, I would return with a truckload (or set) of objects (members) that you could not have predicted.
What’s going on? Notice that each puzzle answer includes “what I had in mind”. The presentation of each riddle was what Wittgenstein calls ‘ein Bild’—a ‘picture’ or ‘model’.[18] Each symbolizes something I had in mind—some thought or concept. However, to communicate a thought precisely and accurately is difficult. We must be sensitive to this difficulty if we wish to achieve the most economical reliable long-term digital preservation.
The core problem is a mathematical fact: suppose we are told that a set B is a subset of another set A, and are also given a tabulation of all the members of B. What does this tell us about members of A that are not also members of B? Nothing! No incomplete set or sequence contains information about its own continuation.
However, the examples do illustrate that, in a shared social context, one has a good chance of guessing what someone else has in mind. Language works only by virtue of shared social contexts. How careful one must be with social contexts is illustrated in our recent analysis of the word ‘authentic’.[19]
Over the last decade, the digital preservation literature has contained numerous allusions to progress depending on interdisciplinary collaboration.[20] How difficult it might be to achieve collaboration is suggested by behavior observed in a public discussion panel about “Documents in the Digital Culture” some years ago.
The discussion was managed in the style of a debate. At a U-shaped table, four engineers/scientists faced four humanities and social sciences professors; the table head was manned by two moderators and a secretary. Each participant was allowed a 4 minute speech, with speeches alternating from the facing table sides; after that, it was a moderated free-for-all in which audience members (more than 40 present) had as much access as formal participants. For the point of this story, the participants’ silent methodological assumptions seemed more instructive than anything specific they said.
After commenting on the prior speaker’s remarks, each scientist or engineer would say something like, “These matters are too broad and diverse for me to handle comprehensively. Instead, I propose to discuss [a narrow essential aspect that I’ve thought about carefully] as thoroughly as the time allocated me permits.”
What the scientist did not say, probably because it is widely understood as being key to the last century’s success of the technical disciplines, might have been something like, “I’m confident that other workers will treat the many important topics that I do not treat, and that this will eventually include synthesis of the essential pieces.”
During such a speech, observation of the body language of the humanists/social scientists suggested that, the longer the scientist spoke, the more impatient and uncomfortable these participants became.
When it became time for a humanist to speak, he would say something like, “What the prior speaker said is undoubtedly important, but I am unable to comment at the level of detail he reached. However, it strikes me that his topic is closely related to [some other topic] that strongly affects [his topic], and that we cannot make progress without considering them together. Moreover, to discuss these topics comprehensively, we must also deal with [several additional topics] and all the topical interactions.”[21]
During such a second kind of speech, observation of the body language of the engineers and scientists suggested that, the longer the humanist spoke, the more impatient and uncomfortable these participants became.
The charge to the panel had not included a call for conclusions, or for consensus on any topic. That was wise, as conclusions and consensus would have been impossible even if the time available, and the patience of the participants, had been much greater than they in fact were.
Are we going to have similar clashes of perspective when information scientists and computer scientists need to work collaboratively to solve digital preservation problems?
In early 2003, The SCO Group sued IBM for $1B over alleged IBM violations of intellectual property intrinsic to UNIX™ and Linux™. Some opinions hold that this lawsuit challenges the General Public License (GPL), which is critical to a legal protection implicit in the success of the open source movement. To understand the intertwined history and legal issues, you might read some of the filings accessible from http://www.caldera.com/ibmlawsuit/ and the OSI position paper.
To me, the SCO lawsuit seems without merit, and perhaps amounts to barratry, which is a crime in some jurisdictions. We must wonder, “What is the purpose of this lawsuit?”
Who will win and who will lose? The likely losers are speculators who purchased SCO stock at prices inflated by hopes for windfall profits.
Constant vigilance seems necessary to protect freedom of speech. A November 3rd USA Today article illustrates a currently common problem on university campuses: administrative attempts to limit offensive statements by suppressing potentially inflammatory expressions of opinion.
The 32nd session of UNESCO's General Conference adopted five standard-setting instruments in October, including the International Convention on the Preservation of the Intangible Cultural Heritage.[22]
Enabling United Kingdom legislation[23] in October has extended obligatory publication deposit[24] from paper to other media, including digital representations.[25] The enthusiasm of librarians and history buffs is reported by the British press.[26]
The World Wide Web Consortium has announced XForms 1.0 as an improvement over HTML forms for gathering information via the Web. XForms was created partly because HTML is limited to simple tasks, such as Web page creation.
If you did not notice that Amazon is providing full text search for current books, having started in October with 120,000 titles, you might be interested in an October 24 SJMN article or Amazon's description. This service ingeniously complements research library and other academic efforts that limit themselves to out-of-copyright material. Wired Magazine presents a good review, The Great Library of Amazonia.
The exciting value, to me, is help for finding quotations and verifying citations.
The Advanced Foundations for American Innovation Supplement to the President's FY2004 Budget from the U.S. National Coordination Office for Information Technology Research and Development (NITRD) gives a fascinating overview of research priorities of U.S. agencies (NASA, NSF, etc.).
In view of the continuing pertinence and depth of insight expressed by C.P. Snow, scientist, civil servant, and novelist, in The Two Cultures[27], I am amazed afresh whenever I discover how few people know it. The slim volume captures the Rede lecture delivered at Cambridge in 1959. It synopsizes itself with:
“as one moves through intellectual society from the physicists to the literary intellectuals, there are all kinds of tones of feeling on the way. But I believe the pole of total incomprehension of science radiates its influence on all the rest. That total incomprehension gives, much more pervasively than we realise, living in it, an unscientific flavour to the whole `traditional' culture, and that unscientific flavour is often, much more than we admit, on the point of turning anti-scientific. The feelings of one pole become the anti-feelings of the other. If the scientists have the future in their bones, then the traditional culture responds by wishing the future did not exist. It is the traditional culture, to an extent remarkably little diminished by the emergence of the scientific one, which manages the western world.”
Deciding whether two digital objects represent the same document is surprisingly difficult. A recent article by Allen Renear and David Dubin, Towards Identity Conditions for Digital Documents, presented at the 2003 Dublin Core Conference, merits a DDQ reading recommendation.[28]
Although many studies, reports, and articles have explored how digital collections and communities have transformed the university environment, little research addresses the potential and implementation challenges of digital libraries for K-12 users. Developing Digital Libraries for K-12 Education combines work of authors united by the common mission of transforming K-12 education.
Microsoft Windows 98™ has been ineligible for help from Microsoft since 1st November. Windows 2000™ is scheduled for end of availability on 31st March 2004. Prudent users will apply all important updates and prepare reliable system backup copies for recovery from disruptive events.
Improvements made about a year ago make bringing Microsoft software up-to-date easy.
Several offers of free or inexpensive office suites (word processor + spreadsheet + vector graphics drawing + presentation graphics preparation) have appeared. I’ve been attracted to OpenOffice™ because its XML file storage representation promises to be helpful for digital preservation, but did not pursue that vigorously after I found that current XML Spy™ did not accept all OpenOffice documents.
I have not tried SOT Office™, but the price is right—it’s free—and the functionality is attractive: a word processor that accepts files from MS Word and OpenOffice, and spreadsheet, presentation, and graphics drawing programs, all supported both under MS Windows and Linux.
The description of e-press EasyOffice™ makes it seem worth trying, which I have not yet done because other work seemed more attractive and urgent.
For people comfortable with earlier Microsoft Office™ versions, there is an inexpensive alternative to Microsoft Office 2003™. From what supplier? Why, it’s from Microsoft! Take a look at Microsoft Works 2004™, which offers a full version of MS Word™, spreadsheet, database, and other tools. The list price is $100; however a shopping demon showed it can be obtained for about $50.
Partly because it’s the holiday season, widely celebrated as a commercial occasion, we are deluged with magazine articles about “the latest and the best” information technology. An example is Business Week’s October 16th report of the ITU Telecom World conference at which “technology outfits from all over the world displayed their newest and most innovative gizmos”.
If you do not have an urgent need or an irrepressible urge for expensive toys, I recommend waiting for one to two years for better versions of much of what is now being ‘hyped’. Why? Many recent offerings are being rushed to market before a stable partitioning of functionality has emerged and before the vendors have carefully refined usability. Offending devices might be adequate for people who use them heavily, but require too much learning time for casual users. Usability will surely improve when the “early adopter” market is approaching saturation.
For instance, as much as I would enjoy playing with a GPS device, I’ve waited for two years and will wait another year for price reductions, usability improvements, and compelling applications. I read:
“A small group of companies banded together to demonstrate the potential use of global positioning system (GPS) data when combined with a handheld PC, a mobile connection, location information, and mapping software. It's not a product now—and may never be—but it points the way for potential applications.” Business Week, October 16
It is my impression that computing technology prices have temporarily stabilized. Perhaps this is because of rosy projections of December retail sales. If so, we can expect to see significant price drops after 25th December. If this happens, I’ll resume reporting prices in DDQ 3(1).
In the meantime, you might be interested in the ExtremeTech recommendation of a build-your-own high performance PC for $800.
Acknowledgements
Once again, it is a pleasure to acknowledge that discussions with John Bennett, Tom Gladney, and John Swinden have helped create this DDQ number.
[1] See C.T. Cullen, Authentication of Digital Objects: Lessons from a Historian's Research, in Authenticity in a Digital Environment, CLIR Report pub92, 2000. “... something more than provenance or traditional testing methods established for analog objects is needed. I believe it is easier to describe the characteristics of an authentic digital object than to support the authentication beyond a reasonable doubt. ... At the least, we must agree on some means of testing the authentication of digital objects. The consequences of not doing so are dire.”
[2] This brief tutorial draws on Jukka Korpela’s A Tutorial on Character Code Issues and Reuven Lerner’s At the Forge: Unicode, Linux Journal 107, 18-20, March 2003. These Web pages provide a comprehensive set of citations and links to Unicode resources and explanations that include descriptions of idiosyncrasies and pitfalls of character representation.
[3] A good summary of fonts is An Operational Model for Characters and Glyphs, ISO/IEC TR 15285 (1998). It includes:
“People interpret the meaning of a written sentence by the shapes of the characters contained in it. For the characters themselves, people consider the information content of a character inseparable from its printed image. Information technology, in contrast, makes a distinction between the concepts of a character's meaning (the information content) and its shape (the presentation image). Information technology uses the term character (or coded character) for the information content, and the term glyph for the presentation image. A conflict exists because people consider characters and glyphs equivalent. Moreover, this conflict has led to misunderstanding and confusion. This Technical Report provides a framework for relating characters and glyphs to resolve the conflict because successful processing and printing of character information on computers requires an understanding of the appropriate use of characters and glyphs. … [It]
· differentiates between coded characters and registered glyphs
· identifies the domain of use of coded characters and glyph identifiers
· provides a conceptual framework for the formatting and presentation of coded character data using glyph identifiers and glyph representations.”
[4] Unicode currently defines almost 100,000 characters. An online reference to the character contents of the Unicode Standard is available at http://www.unicode.org/charts/ and http://www.unicode.org/charts/charindex.html.
[5] Files containing only 7-bit ASCII characters are unchanged when viewed with Unicode UTF-8 encoding, so plain ASCII files are already valid Unicode files.
[6] See Markus Kuhn’s UTF-8 and Unicode FAQ for Unix/Linux.
[7] Digital representations are specified by encoding rules, such as UTF-8, and glyphs are defined in font tables.
[8] Character coding is the method by which characters are represented by binary strings.
[9] The numbering in the table and the accompanying text is that of the Invest to Save report.
[10] Such expressions of urgency are echoed in §3.1 and §3.10 of the Invest to Save report.
[11] The highlighting seen below has been added to the quoted original text.
[12] The largest NSF awards for digital library work were about $3M. I do not know enough about EU practice to comment.
[13] 1C states that “generic devices capable of reading diverse classes of media are needed to address peripheral device obsolescence.” In order to prevent errors caused by dust, high recording density devices are typically sealed, thereby frustrating direct access to their media.
[14] For instance, all operating systems will continue to provide a file abstraction—a memory or communication link mapping of data as a bit sequence. Such file abstractions will continue to be independent of device encoding formats.
[15] IBM Research work shows that the total cost of copying to fresh media would be less than 1% of the running cost of digital medical imagery storage and management in a hospital. See W.F. Cody et al., Can Hospitals Afford Digital Imagery? in R.G. Jost, editor, Medical Imaging VIII: PACS Design and Evaluation, Proc. SPIE 2165, 613-628, (Feb. 1994).
[16] See Dan Huttenlocher and Angel Moll, On DigiPaper and the Dissemination of Electronic Documents, D-Lib Magazine 6(1), January 2000.
[17] See, for instance, 2D (Two Dimensional) Bar Code Symbologies and Two Dimensional Bar Code Guide.
[18] This is explained in the article announced above, Trustworthy 100-Year Digital Objects: Syntax and Semantics—Tension between Facts and Values, especially in its §2 and Figure 5.
[20] For instance, see §3.8 of the Invest to Save report cited above.
[21] To me, observing with the mind-set and training of a scientist, every specific introduced by such a speaker seemed correct and to the point. I.e., the problem under discussion is not an effect of obviously erroneous or unacceptable reasoning.
[22] The charter is reproduced in English at http://unesdoc.unesco.org/images/0013/001311/131178e.pdf.
[23] ‘Enabling legislation’ defines obligations without providing either funding that might be needed to accomplish what is edicted or specifying administrative procedures that might be essential.
[24] What it means to be a ‘deposit library’ is explained at http://www.nla.gov.au/services/ldeposit.html.
[25] See http://www.parliament.the-stationery-office.co.uk/pa/cm200203/cmbills/026/en/03026x--.htm for explanatory comments.
[27] C.P. Snow, The Two Cultures, Cambridge UP, 1959 and 1964. ISBN 0-521-45730-0
[28] A DDQ reading recommendation for a research article is deliberately rare.