Digital Document Quarterly Glossary

A work in progress


H.M. Gladney

HMG Consulting

20044 Glen Brae Drive

Saratoga, CA 95070

©  2002-8, H.M. Gladney

Updated 1st July 2008

Begun 30th March 2002

 

To communicate precisely is surprisingly difficult.  Writing is more difficult than conversation because no listener can signal confusion that a speaker might promptly correct.  Even though Digital Document Quarterly numbers are edited carefully, aided by critical input from a small advisory group, I am not as confident as I would like to be that readers will infer what I intend.  The difficulty is even greater for documents in long-term storage (Figure 1).

Figure 1: Simplified version of a model used in Preserving Digital Information[1]

One can reduce the communication difficulty by providing careful definitions and contextual information.  However, this remedy creates its own hazards: lengthy explanations that try readers’ patience, blizzards of detail that obscure central points, and seeming pedantry.

Such difficulties hamper community attempts to design information sharing tools as emphasized in digital library literature.  Different authors use even well-known terms, such as “archiving”, differently.

The definitions that follow are what are sometimes called “technical definitions”, that is, definitions intended to help readers understand a specific article or body of writings.  Typically, a technical definition is used for a word or phrase that has many different meanings in popular usage and/or in the writings of other authors.  Often a technical definition does not conform precisely to any previous usage.  Readers who are unfamiliar with the idea of a technical definition, or do not notice that this device is in effect, are likely to protest that the author misunderstands the meaning of the term (as they construe it).

Documents and digital objects are used to communicate information.  Information is a subset of knowledge.  Felix Mauthner said, “Philosophy is theory of knowledge”.[2]  Ludwig Wittgenstein commented, “All philosophy is a 'critique of language' (though not in Mauthner's sense). … the apparent logical form of a proposition need not be its real one.”[3]  How words and phrases are used in DDQ is strongly influenced by readings of 20th-century epistemology.

What disturbed Mauthner, above all, was the tendency ordinary people have to attribute reality to abstract and general terms.  This natural tendency to reify abstractions he regarded as the origin not just of speculative confusion, but also of practical injustice and evil in the world. Reification, to use a Machian phrase, begets all sorts of "conceptual monsters."  In science, these include such misleading notions as force, laws of nature, matter, atoms and energy; in philosophy, substance, objects and the absolute; among religious ideas, God, the devil and natural law; in political and social affairs, obsession with notions like the Race, the Culture, and the Language, and with their purity or profanation.  In all such cases, reification involves assuming the existence of entities which are "metaphysical."  So Mauthner considered metaphysics and dogmatism to be two faces of a single coin, which was also the fountainhead of intolerance and injustice.

Janik, p.123[4]

It seems to me that verbs are less subject to this problem than are nouns.   Cf. “knowledge” and “to know”; “information” and “to inform”.

 

abstract

(noun) summary of a statement, document, or speech; (verb) reduce by eliminating all properties not essential to the concept or the class in question; (adj.) expressing a characteristic apart from any specific object or instance.  For example, abstract data types are defined without any commitment to particular encodings for instances.

access control

(noun) security component which defines who may do what and administers these rules, as defined by an ISO standard;[5] the process of determining which uses of resources within an open system environment are permitted and, where appropriate, preventing unauthorized access, which is frequently subdivided into classes known as unauthorized use, disclosure, modification, destruction, and denial of service. The other parts of a complete security system ensure that the registered rules are complied with, and that an audit trail is maintained.
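The idea of "rules that define who may do what" can be illustrated with a minimal Python sketch.  The rule table, the subject names, and the functions is_permitted and perform are hypothetical, invented for illustration; they are not taken from the ISO standard or from any particular security product, and the sketch omits the audit trail that a complete system would keep.

```python
# Hypothetical rule table: each triple (subject, operation, object) is permitted.
RULES = {
    ("alice", "read", "annual-report"),
    ("alice", "modify", "annual-report"),
    ("bob", "read", "annual-report"),
}


def is_permitted(subject: str, operation: str, obj: str) -> bool:
    """Decide whether a requested use is allowed by a registered rule."""
    return (subject, operation, obj) in RULES


def perform(subject: str, operation: str, obj: str) -> str:
    """Carry out an operation only after the access control check succeeds."""
    if not is_permitted(subject, operation, obj):
        raise PermissionError(f"{subject} may not {operation} {obj}")
    return f"{subject} did {operation} on {obj}"
```

Anything not explicitly permitted is refused, which is one common design choice; other systems also register explicit prohibitions.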

access path

(noun) means of referring to an entity by identifying positions in (a nest of) containing entities, e.g. John Doe in the San Jose office of the Acme Corp.  A name is a special kind of access path; the containing object is a context.  An index into an array is another kind of access path.
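A small Python sketch may make the notion concrete.  It assumes nested dictionaries and lists as the containing entities; the function name resolve and the sample directory are hypothetical.  Each step of the path is either a name (a dictionary key) or an array index, the two kinds of access path mentioned above.

```python
from typing import Any, Sequence, Union

Step = Union[str, int]  # a name within a context, or an index into an array


def resolve(root: Any, path: Sequence[Step]) -> Any:
    """Follow an access path through nested containers, one position at a time."""
    entity = root
    for step in path:
        entity = entity[step]  # dictionary lookup by name, or list lookup by index
    return entity


# Hypothetical data mirroring the example in the definition.
directory = {
    "Acme Corp": {
        "San Jose": {"staff": ["John Doe", "Jane Roe"]},
    },
}

first_employee = resolve(directory, ["Acme Corp", "San Jose", "staff", 0])
```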

architecture

(noun) abstraction of design, hiding features not of interest to a user of the thing described; rules for interfaces provided for some collection of entities and services; the choice and structuring of what can be viewed and what manipulations can be performed through these interfaces.

archiving

(noun) digital content management needed to ensure ready access to reliable records immediately, in the near future, and in the distant future. 

asymmetric key cryptography

(noun) see public key cryptography.

archive

(noun) (1) persistent storage used for long term information retention, typically very inexpensive per unit stored and with a long response time, and often in a different geographic location to protect against equipment failures and natural disasters; (2) collection of historical documents or records.12

Notice that these two definitions identify quite different object classes.

Archivists emphasize that the rules and conventional procedures for their collections differ from those for libraries.  Briefly, what is important to an archivist is the authenticity of each archived object, and evidence for that authenticity.  These can be established without understanding the content itself.  I.e., the associated metadata are, in a certain sense, more important than the content.

atomic

(adj.) not decomposable into parts, at least for the discussion of the moment.  For example, the integer 2 is atomic and the list 2 4 6 is not.

attribute (esp. of a digital object)

(noun) synonym for property; mathematical value that is a mathematical function of the object.  The words most important to us, such as “value” and “function”, often have multiple meanings and no synonyms with which we can eliminate ambiguity.  Where minimal ambiguity and misunderstanding are critical in DDQ, we include modifiers or other concise methods to reduce the risks.  Wittgenstein’s lectures[6] illustrate how fragile natural language is.

auditor

(noun) human role in which the individual is responsible for checking that resources are not being misused or misappropriated and/or that mechanisms to prevent misuse and misappropriation are in place and being used as prescribed.

audit trail

(noun) sequence of records of events deemed important to determine whether or not a set of resources has been used in accordance with guidelines or limitations defined by appropriate authorities; the results of monitoring each operation of subjects on objects; for example, an audit trail might be a record of all actions taken on a particularly sensitive file or a record of all users who viewed that file.
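The second example in the definition, a record of all users who viewed a file, can be sketched as follows.  This is only an illustration under simple assumptions: records are kept in memory in arrival order, and the class names AuditRecord and AuditTrail and the operation label "view" are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditRecord:
    """One monitored operation of a subject on an object."""
    when: datetime
    subject: str
    operation: str
    obj: str


class AuditTrail:
    """Append-only sequence of audit records."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def record(self, subject: str, operation: str, obj: str) -> None:
        stamp = datetime.now(timezone.utc)
        self._records.append(AuditRecord(stamp, subject, operation, obj))

    def viewers_of(self, obj: str) -> List[str]:
        """Return each subject that viewed obj, in order of first viewing."""
        seen: List[str] = []
        for rec in self._records:
            if rec.obj == obj and rec.operation == "view" and rec.subject not in seen:
                seen.append(rec.subject)
        return seen
```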

authenticate

(verb) verify the identity of a person (or other agent external to the protection system) making a request; for a standard definition, see the endnote.[7]

authentication

(noun) mechanism for establishing with known confidence that a token passing between processes belongs to a set of allowed tokens; typically each such token identifies a subject and also contains some secret that could only come from the single user authorized to use this subject; if the token is acceptable, the subject is bound to the issuing process (a step called login if the user is a human being), i.e., the process can act on behalf of this subject.

authenticity

(noun) property of being associated correctly with sufficient provenance information to convince any recipient that the signer deliberately signed the document.  The provenance information should not be easily reusable, in the sense that it should be difficult to detach the signature from one document and reattach it to a different document so that a recipient is convinced that the signer actually signed the latter document.[8]

authority

(noun) (1) privilege and responsibility to utilize and/or control some resource; (2) quality of special value of information stored or conveyed, because of either knowledge or official right to comment, as in “spoken with authority”; (3) especially valuable commentator by virtue of superior knowledge, diligence, or scholarship on the topic at hand, as in “Holmes was an authority on the forensics of tobacco ashes”.  The pertinence is that the user of library information is at least implicitly interested in how trustworthy each extracted datum is.

bit

(acronym) contraction of the term "binary digit."

bit-stream

(noun) potentially unbounded sequence of binary characters, typically an information representation transmitted over a serial channel.

bit-string

(noun) finite sequence of binary characters; a synonym for file or dataset used to emphasize that it denotes an information representation readily transmitted via a serial channel or stored on a disk or tape.

blob

(noun) acronym for binary large object, used to denote a unit of data whose representation, meaning and interpretation are not pertinent to the discussion at hand, such as the objects stored and catalogued in a library.  The acronym can be construed also as binary little object.

bottom-up

See the endnote diagram and table.[9]

breadth-first

See the endnote diagram and table.

cache

(noun) specialized store used to hold objects temporarily, often with the objective of more rapid access than would otherwise be possible.  In computer systems, caches often are intended only for replicas of information held more securely elsewhere; however, cache should be construed in the former, broader sense.
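The narrower computer-systems usage, a cache holding replicas of information kept elsewhere, can be sketched in a few lines of Python.  The class name CachedStore and its miss counter are hypothetical, and the sketch assumes the backing store never changes, so no replica ever becomes stale.

```python
class CachedStore:
    """Read-through cache holding replicas of objects kept in a backing store."""

    def __init__(self, backing: dict) -> None:
        self._backing = backing  # the authoritative (and presumably slower) store
        self._cache: dict = {}   # temporary replicas for rapid access
        self.misses = 0          # how often the backing store had to be consulted

    def get(self, key):
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._backing[key]
        return self._cache[key]
```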

catalog

(noun) in computer science and related fields, table relating names to names, objects or locations of objects, and possibly also object descriptions; synonym for directory (q.v.); among librarians, a specific kind of finding aid with one or several entries for each collection element and conforming to a carefully documented standard.

certificate

(noun) in the context of information security, an unforgeable object that attests to the accuracy, correctness, completeness, and provenance of some information.

consistent

(adj.) of a data collection, conforming to all externally specified rules pertaining to this collection and required to define correctness.

constraint

(noun) rule relating (values of) two entities or limiting the membership of a set.

consumer

(noun) person who obtains and makes use of a document, including merely reading it, whether or not this use is as originally intended by the document’s producer.

context

(noun) a set of pairs mapping from names to entities or to other names; definition of the meanings of names; set of bindings between names and entities.  For example, the meaning of “bald” depends on the context.  If the context is English, “bald” means “without hair”; if it is German, “bald” means “in kurzer Zeit” (“in a short time”).
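The "bald" example can be written out directly, treating each context as a Python dictionary that binds names to meanings.  The variable names and the helper meaning are hypothetical, and the meanings are represented simply as strings.

```python
# Each context is a set of bindings between names and entities.
english = {"bald": "without hair"}
german = {"bald": "in a short time"}


def meaning(name: str, context: dict) -> str:
    """Resolve a name relative to a given context."""
    return context[name]
```

The same name resolves differently in each context: meaning("bald", english) and meaning("bald", german) return different values.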

countermeasure

(noun) mechanism that reduces vulnerability to a threat.

credentials

(noun) unforgeable data that guarantee claimed identity.

cryptanalysis

(noun) study and practice of various methods to penetrate ciphertext and deduce the contents of the original cleartext message.

cryptographic algorithm

(noun) mathematical procedure, used in conjunction with a closely guarded secret key, that transforms original input into a form that is unintelligible without special knowledge of the secret information and algorithm.  Such algorithms are also the basis for digital signatures and key exchange.

cryptography

(noun) originally, the science and technology of keeping information secret from unauthorized parties by using a cipher.  Cryptography is used for many applications that do not involve confidentiality.

data

(noun) information that is not intended to convey as much meaning as many similar communications might convey.  (The boundary between data and information is fuzzy.)

decryption

(noun) cryptographic procedure of transforming ciphertext into the original message cleartext.

delegate

(verb) grant a subject permission to grant further subjects privileges.

depth-first

(adj.) See the endnote diagram and table.

digest

(noun) much condensed version of a message produced by processing the message by a hash algorithm.  Commonly, the digest length is independent of the length of the original message.
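That the digest length does not depend on the message length is easy to demonstrate.  The sketch below uses SHA-256 from Python's standard hashlib module; the choice of SHA-256 is an assumption made for illustration, not an algorithm prescribed by this glossary.

```python
import hashlib


def digest(message: bytes) -> bytes:
    """Return the SHA-256 digest of a message (always 32 bytes)."""
    return hashlib.sha256(message).digest()


short_digest = digest(b"abc")            # 3-byte message
long_digest = digest(b"abc" * 100_000)   # 300,000-byte message
```

Both digests are 32 bytes long, although the messages differ in length by a factor of 100,000.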

digital signature

(noun) data appended to a message to assure the recipient of the origin and integrity of the message; a digitized analog of a written signature, produced by a cryptographic procedure acting (commonly) on a digest of the message to be signed.

digital signature standard (DSS)

(noun) U.S. government standard (FIPS 186) describing a cryptographic algorithm for producing a digital signature.

DRI

(acronym) Digital Resource Identifier, a specific kind of uniform universal identifier, as described in Preserving Digital Information, §7.3.4.

directed acyclic graph (DAG)

(noun) directed graph (q.v.) in which no path starts and ends at the same vertex.

directed graph

(noun) graph whose edges are ordered pairs of vertices.  Each edge can be followed from one vertex to the next.
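The two definitions above can be combined in a short test for whether a directed graph is acyclic.  The sketch represents edges as ordered pairs (tail, head), as in the definition, and uses Kahn's algorithm, one well-known technique among several; the function name is_acyclic is hypothetical.

```python
from collections import deque
from typing import Hashable, Iterable, Tuple


def is_acyclic(vertices: Iterable[Hashable],
               edges: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """Return True if no path starts and ends at the same vertex (Kahn's algorithm)."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in indegree}
    for tail, head in edges:
        successors[tail].append(head)
        indegree[head] += 1

    # Repeatedly remove vertices that have no remaining incoming edges.
    ready = deque(v for v in indegree if indegree[v] == 0)
    removed = 0
    while ready:
        v = ready.popleft()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)

    # Any vertex that could not be removed lies on, or downstream of, a cycle.
    return removed == len(indegree)
```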

document

(noun) representation of any kind of information, such as a command, text, a photograph, video or audio information, a scientific table, a spreadsheet, a computer program, or any other kind of information, or any ordered or unordered combination of such specialized kinds of information, whether conveyed on a material substrate or represented and conveyed digitally.

domain

(noun) see function for mathematical context.  In relation to security, a set of subjects and information objects whose use is governed by a set of rules.

durability

(noun) in transaction processing, the property that state changes of successfully completed transactions survive failures.

encapsulate

(verb) hide selected information from an external environment, as in certain programming language definitions of data types.  The unit of text involved is called a capsule.

encrypt

(verb) scramble data according to a secret transformation key, so as to make it safe for transmission or storage in otherwise inadequately protected environments.

environment

(noun) relative to an activation, the set of objects (and their values) reachable for a function evaluation.

essence

(noun) in communication, the information that a speaker or writer intended to convey, in contrast to inevitable accidental information.  For instance, in a lecture, the speaker’s voice pitch is usually accidental, not essential, as are the page break locations in a printed document.

evidence

(noun) set of facts demonstrating the truth of some assertion, with each fact being either obvious or objectively testable; the evidence for a document’s authenticity can be either external (in the form of attached metadata) or internal (in the meaning or representation of the document itself).

fact

(noun) a thing done; an action performed or an incident happening; an event or circumstance; an actual occurrence; an actual happening in a time or space or an event mental or physical; that which has taken place.  A fact is either a state of things (an existence) or a motion (an event).

faithful

(adj.) of a data copy, conforming accurately to some other data instance, usually identically bit by bit.

finding aid

(noun) librarian's term for a tool that is not a catalog but serves a similar purpose, to the extent that something simpler (and much less expensive) can; compare catalog.

firmware

(noun) program information used to control the low-level operations of hardware.  Firmware is commonly stored in read only memory (ROM), which is initially installed in the factory and may be replaced in the field to fix mistakes or to improve system capabilities.

fix

(verb) (1) of content such as text or an image, make relatively immutable on a physical medium, e.g., by printing on paper or developing a photographic film; (2) repair; (3) (colloq.) damage.

folder

(noun) digital object that contains other objects by reference.  In the digital analog of a paper folder system, every document except one occurs in exactly one folder; the folder relationship is acyclic.

formal

(adj.) pertaining to, or emphasizing, the organization or composition of the constituent elements.  For example, in a formal mathematical system the elements of discourse are not associated with meanings; interest is limited to relationships between elements, which are deduced from simpler relationships (axioms) on the basis of (a small number of) combining forms.

function

(noun) in DDQ, always a mathematical function.  A function is defined on two sets, the domain and the range, and consists of a set of pairs in which the first component is from the domain, the second component is from the range, and no two pairs have the same first component.  A function is total if every member of the domain is in some pair, and partial otherwise.

In common usage, “function” has many different meanings.  Several of these are used when discussing computing, sometimes within a single discourse.  This is a source of confusion.
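The mathematical sense used in DDQ can be checked mechanically.  The following Python sketch represents a candidate function as a set of pairs; the helper names is_function and is_total and the sample data are hypothetical.

```python
from typing import Hashable, Set, Tuple

Pairs = Set[Tuple[Hashable, Hashable]]


def is_function(pairs: Pairs, domain: set, range_: set) -> bool:
    """True if every pair is drawn from domain x range and no first component repeats."""
    firsts = [first for first, _ in pairs]
    well_typed = all(a in domain and b in range_ for a, b in pairs)
    return well_typed and len(firsts) == len(set(firsts))


def is_total(pairs: Pairs, domain: set) -> bool:
    """True if every member of the domain appears as a first component."""
    return {first for first, _ in pairs} == set(domain)


squares = {(1, 1), (2, 4), (3, 9)}   # a total function on {1, 2, 3}
partial = {(1, 1), (2, 4)}           # a partial function on {1, 2, 3}
not_a_function = {(1, 1), (1, 4)}    # two pairs share the first component 1
```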

generic

(adj.) referring to all members of a class, group, or kind.  For example, a generic operator denotes a set of operations of (presumably) similar function; each member of the set has different operand types.  See also operation and operator.

global

(adj.) describing an entity which is accessible without being explicitly mentioned in an operand of the program defining an operation, or being derivable from explicit operands, or being created by the operation.

glyph

(noun) picture for a character of printed or written language.

granularity

(noun) measure of the level of detail with which some data object set is accessible or is controlled by some process or program being discussed.  For instance, for objects managed by the library system, the granularity for access control might be items whereas the granularity for copying to/from library stores might be item parts.

graph

(noun) picture, or its abstract counterpart, describing the connections among a set of entities.  Used in tracts on “object orientation” to represent the relationships between entities for the purpose of making distinctions clear.  The entities are denoted by points, called nodes, and the connections by lines, called arcs.  If the direction of