Digital Document Quarterly

Perspectives on Trustworthy Information

Volume 6, Number 3, 3Q2007

HMG Consulting

Saratoga, CA 95070

© 2007, H.M. Gladney

ISSN: 1547-8610

A 2005 cartoon by Pat Oliphant depicts God working at a drawing board, with a bearded angel looking over His shoulder, and ascribes to God the words,[1]

“I’ve been trying to perfect some kind of intelligent design, but all I keep coming up with is a bunch of simple-minded, right-wing, fundamentalist, religious fanatics.  I think I’ll just let the whole thing evolve.” 

Digital Preservation

Science fiction fans will welcome the digitization of the late Robert Heinlein’s works.  The UC Santa Cruz Library is making available a collection of 106,000 pages, consisting of Heinlein's complete manuscripts—including all his published works, notes, research, and early draft manuscripts.

NDIIPP Funding Withdrawn

According to the Washington Post on May 17, “in February, Congress passed and the president signed legislation rescinding $47 million of the program's approved funding.[2]  This jeopardizes an additional $37 million in matching, non-federal funds that partners would contribute as in-kind donations.”  The effects start in fiscal 2008.

On August 3, the Library of Congress announced eight grants in a new initiative, Awards to Preserve American Creative Works.

The latest NDIIPP achievements report is part of the LoC Strategic Initiatives Annual Review for fiscal 2005.

Another Task Force, and Also New NSF Funding

Though significant progress has been made to overcome the technical challenges of achieving persistent access to digital resources, the economic challenges remain daunting.  With this motivating assertion, the National Science Foundation (NSF) and the Mellon Foundation have announced yet another digital preservation study.  Since the topic has been studied by several committees in the last decade, I cannot help but wonder what new ground this task force hopes to expose.[3]

The announcement calls for “economic sustainability of digital information for the science and engineering, cultural heritage, academic, public, and private sectors.”  As I understand it, the economic problem for preservation of scholarly digital content is that research collections live by governmental and charitable funding and want to extend their reach to digital content without diverting funds from paper collections.

The U.S. Government has funded a substantial digital effort at NARA, but there are next to no similar efforts for state or local government holdings.

The private sector has expressed little interest in digital preservation.[4]  Its priorities seem to be elsewhere, such as in satisfying Sarbanes-Oxley imperatives and continuing to make its digital infrastructure safe and effective.

My puzzlement is increased by an almost simultaneous NSF call for proposals for Sustainable Digital Data Preservation and Access Network Partners (DataNet).  This seeks development of new types of organizations that "integrate library and archival sciences, cyberinfrastructure, computer and information sciences, and domain science expertise," and offers projected funding of $200,000,000 over ten years.  In keeping with the NSF mission, this seems to be directed at mostly academic digital content.

While digital preservation for the U.S. academic and national government sectors receives enough attention, other important resources continue to be ignored.

Little attention is paid to records essential for utilities, transportation, and social services, such as records needed to care for the 40% of U.S. bridges that will need to be significantly bolstered in our lifetimes.  (Why did that Minneapolis bridge collapse?)

What might be done to achieve the long-standing medical dream called “the longitudinal patient record”?[5] 

Presuming that citizens become interested in lifetime records,[6] what preservation tools might ensure these records’ utility decades after they are created?

All this background suggests that the yet-to-be-identified task force members will have to ask themselves, “Precisely who are we trying to influence?  What persuasion can we invent beyond what has been said before?”

Vintage Gadgets at the Computer History Museum

For an account of a different kind of digital preservation, see a San Jose Mercury News video.  Behind the scenes at the Computer History Museum includes forgotten tales from the frenetic history of the electronics industry.

Epistemology

Notes on Natural Language

What does an American call the drink that is made with milk and ice cream: a milkshake, frappé, cabinet, or thick shake?  The answer helps suggest the U.S. region of the speaker's childhood.  A Dialect Survey begun at Harvard demonstrates that, while most Americans speak English, they employ quite varied dialects.  Available maps show terms and pronunciations used in different national regions.

The more epistemology I read, the more sensitive I become to how difficult it can be to communicate precisely.  For instance, I am immediately alert when I hear the word “just” used as a synonym for “only,” because it is likely to signal an excuse for unacceptable behavior, as in, “I just hit my little sister twice!”

Consider the phrase, “now let’s be sensible”.  All too often, what the speaker means is, “What you have proposed is nonsense.  What I’m about to propose is most prudent and likely to be effective!” 

Within a debate, it can help to remember that the universal quantifier, “all”, is likely to introduce an assertion that can be refuted with a single counter-example.  Perhaps such difficulties influenced the conversation of Oxford dons returning by train from London.  Looking at a passing farm, the first commented, “All the cows in that field are brown.”  His companion responded, “On this side, at least!”

Heavily used words are likely to be overloaded,[7] or to have meanings that depend on when they were uttered, or by whom.  Some words that are key for careful thinking are burdened in such ways.  A reader needs to be sensitive to the possibilities inherent in words such as “value”, “logic”, and “scientific” or “science”.

Value: among these examples, “value” is relatively straightforward.  It can denote an abstraction (in contrast to an object, as in “mathematical values”) or a philosophical attitude (an ethical judgment).  It can also be used as an economic attribute, as in “today’s value of gold is $602 per oz.”  Often the usage context provides clear evidence of which of these, or of several variants, is intended.

Logic: the issue is not so clear for “logic”.  One needs to remember history.  Whenever Kant used “logic”, he must have intended Aristotelian syllogisms,[8] because until the 1847 work of George Boole,[9] “logic” meant what we today call “syllogistic logic.”  In modern professional usage, whenever ambiguity might lead to misunderstanding, a modifier is included to clarify, as in “first order logic” and in “modal logic”.  However, in everyday conversation “logical” is usually a synonym for “reasonable”, in the sense of “carefully thoughtful.”

Science: “scientific” is more troublesome.  In much of its modern usage, it is intended to claim academic prestige for some topic.  In reaction, a student enrolled in university science courses—chemistry, zoology, psychology, …—is likely to snort derisively if the term is used outside this scope.  The same student is unlikely to know that the word “science” originated with the Latin “scio,” and thus arguably would be correct for any form of knowledge, but not for religious beliefs, myths, aesthetic judgments, or ethical precepts. 

Isaac Newton did not call what he did “scientific research,” but instead “Natural Philosophy,” a term which his contemporaries thought appropriate.  When Immanuel Kant used (the German equivalent of) “science” in his Critique of Pure Reason, this was explicit allusion to Newtonian methodology.  Kant’s immense influence led to “scientific” being used by philosophers to mean a combination of logical reasoning with empirical observation.

Nevertheless, the first reaction of a physics student to the title of Ernst Cassirer’s The Logic of the Cultural Sciences[10] is likely to be puzzlement about both “Logic” and “Cultural Sciences.”  The title is eminently descriptive of the book’s theme: an exploration of the applicability of physics and mathematics methodology to topics that today we usually call “the Humanities.”  Of course, Cassirer did intend to draw attention by usage that was unusual in 1942, so his choice was very successful.

A hallway joke, popular among 1960s undergraduates, asserted: “If it has ‘Science’ in its name, it ain’t a science.”  For instance, I think of Computer Science more as an engineering discipline (devising how to accomplish practical objectives) than as a science (studying the constituents of the world and their interactions).

Knowledge and Information

This DDQ number begins a discussion of the nature and prospects of Information Science.  Some authors have promoted their information management projects as “knowledge management,” perhaps because the broader term helps them obtain enterprise support and funding; a related example is the suggestion that university Information Science Departments should be renamed “Knowledge Science Departments.”[11]  To make what follows as clear as possible, it might help to say what DDQ means by ‘knowledge’ and by ‘information’.

What is the distinction between knowledge and information?  What is the distinction between “knowledge” as an objective topic and “knowing” as a psychological topic?

Authors usually seem to take for granted that their readers understand what they mean by ‘knowledge’.  The books in which one might hope to find a definition provide none, perhaps because the concept is too primitive to be defined in simpler terms.[12]  Instead, one finds ‘knowledge’ as a modifier, as in “knowledge acquisition,” “knowledge base,” “knowledge engineering,” “knowledge level,” and “knowledge source,” and also many scenarios describing how knowledge figures in human and animal behavior.[13]  There seems to be agreement that knowledge has to do with whatever it is that we mean by “the mind.”

Consider Knowledge Management (KM)[14] as an exemplary use of ‘knowledge’ as a modifier.  Under various names, it has been considered by the artificial intelligence community for about thirty years and by archivists grappling with records and information discovery for about ten years.  Perhaps this is why The Semantic Web is prominent in recent Information Science literature.[15] 

Should KM be construed to include job-related training of the kind that IBM has for decades provided its employees?  Should we think of schools and colleges as Knowledge Management institutions?  When a father teaches a child how to ride a bicycle, is this elementary knowledge management?  Apparently not, because KM literature[16] seems to focus on the use of information systems tracking employee skills so that, for instance, the best people can be assigned as consultants.  It is not misleading to use KM as a descriptor for making the expertise of a for-profit enterprise available internally and for hire.  However, this is less than the untutored might suppose, so that ‘Knowledge Management’ might best be considered a technical term for what the cited papers describe.  If so, KM would be part of Information Science.

What is the distinction between “knowledge” as an objective topic (treating what can be known and communicated) and “knowing” as a psychological topic (discussing how some person or animal knows)?[17]  Wittgenstein makes immense effort to treat objective knowledge, separating this as much as possible from subjective aspects.  William James focuses on psychology.[18]

Depictions such as Fig.1 of DDQ 3(3) remind us of relationships between knowledge and information.  Its 0→1 transition (saying or writing) is used to convert knowledge to information as the first step of informing, and its 9→10 transition (hearing or reading) can convert information to knowledge.  Thus, a distinction between knowledge and information is that ‘knowledge’ is associated with the verb ‘to know,’ a descriptor of a single human being, and ‘information’ with the verb ‘to inform,’ an interaction between two human beings.

One might suppose that the set of all possible information is a subset of all possible knowledge.  However, this is not obvious, as a speaker can lie or be mistaken.  We do not usually consider false or mistaken assertions to be part of knowledge.

Precision about knowledge management might be aided by using ‘knowledge’ exclusively for concepts, facts, behavior patterns, … in the mind and ‘information’ for concepts, facts, behaviors, … conveyed on physical media or signals.[19]  This has been, and will continue to be, part of the technical distinction used by DDQ.[20]

I believe that these distinctions conform to predominant common usage, and are consistent with the cited works.[12]

Information Science [InfoSci]

So-called “information science” seems to me a passing enthusiasm.  Its claim to academic status commensurate with that of older disciplines, such as chemistry, psychology, and philosophy, is likely to evaporate within twenty years.  This will be evidenced by the disappearance of university Information Science departments as organizational units parallel to Computer Science departments.[21]

I began to believe this when I noticed how much of my reading was of authors who identify themselves as information scientists.[22]  For instance, long-term digital preservation seems to be an almost exclusive concern of information scientists.[23]

It is striking how little evidence exists for information scientists’ attention to related fields that have examined the fundamentals of human communication or to engineering of electronic tools.  Nor do computer scientists or software engineers seem to pay much attention to the literature of information science.[24]

What is Information Science?

That the future of InfoSci as a prominent academic topic merits DDQ attention is suggested by Zins’ articles[25] that perhaps parallel other authors’ ideas.  The following table excerpts fifty definitions provided by InfoSci practitioners responding to a survey.[26]

From Zins’ Conceptions of Information Science

Knee-jerk reactions

Elsa Barber: "InfoSci [studies] the functions, the structure and the transmission of information and the management of information systems.  It is the study of data, information, knowledge, and message … in the collective domain, explor[ing] only the mediating aspects, focus[ing on] hi-tech and included user studies."

This seems to define a sort of meta-science that limits itself to observing information structure and human reactions after the fact.

Shifra Baruchson-Arbib: "InfoSci explores the methods for allocation, organization, analysis, and dissemination of information, and the human and the technological tools appropriate for these purposes.  It is the study of the technological and the social process that occurs while changing data to message."

As in the prior definition, the creation of information tools does not seem to be included.  Is this intentional?

Clare Beghtol: "InfoSci [studies] data, information, knowledge and messages (however defined and [interrelated]) in relation to human behaviour and use."

How is this different from a definition of epistemology as represented in Polanyi’s Personal Knowledge?

Michael Buckland: "There [are] different views of InfoSci.  One is … re-discovery of the primary historical basis … : Documents and Documentation from 1880s onwards.  Another is a more general [InfoSci] that attempts to include all of D-I-K-M. A third is an IT-constrained view anchored in digital technology."

This seems to confirm the passivity inherent in the Barber and Baruchson-Arbib definitions.

Charles H. Davis: "InfoSci is an interdisciplinary field encompassing all aspects of data from data generation via measurement and observation, through data capture, analysis, representation, organization, evaluation, storage, transformation, presentation, protection, and retention."

I.e., InfoSci is an interdisciplinary field, not a discipline in the same sense as are chemistry and computer science.

Anthony Debons: "InfoSci is the area which attempts to determine the principles pertaining to the analysis, design and evaluation of Data, Information and Knowledge Systems. It is based on the rationale that all organisms are data, information, and knowledge systems, varying in the degree with which they can process cognitive/affective functions. Each of these functions is aided and augmented by technology that each species generate, invent, and apply."

I.e., InfoSci is a study of technical tools that can assist the study of epistemology.  It is observational, like zoology, rather than proactive, like engineering.

Nicolae Dragulanescu: "InfoSci … studies the information (as a process, as a product or as a state of awareness) as well as its five basic sub-processes—generation, processing, communication, storage, and use—in order to optimize them …. Its goal is to facilitate the knowledge transmission [between people] and [between generations] in order to accelerate the progress of mankind."

The focus on optimization suggests large overlap with Computer Science and with software engineering.

Luciana Duranti: "InfoSci is a mathematical discipline that studies technological ways of conveying information."

If so, why does the InfoSci literature include so little explicit mathematics?

Raya Fidel: “InfoSci is the study of the interaction between humans and information and all the mechanisms and elements of context that play a role in this interaction.”

I.e., if information is taken to be a subset of knowledge, then InfoSci might be seen as part of epistemology.

Donald Hawkins: "InfoSci is an interdisciplinary field concerned with the theoretical and practical concepts, as well as the technologies, laws, and industry dealing with knowledge transfer and the sources, generation, organization, representation, processing, distribution, communication, and uses of information, …"

“The sources, …, organization, … and uses of information” suggests significant overlap with sociology.

Ken Herold: "InfoSci is the study of the transformations and interactivities among data, information, knowledge and message objects, structures and processes, in order to construct systems to communicate culture as regenerated knowledge.  InfoSci is the mutable and transitory discipline at the confluence of librarianship, documentation, media & communications, computation, and applied philosophy."

“In order to construct systems to communicate” sounds like computer science.

Birger Hjorland: "InfoSci is a field that aims at providing better library, documentation, and information service …  Historically, InfoSci developed out of special librarianship and documentation.  People in the field were originally subject specialists who worked to improve … scholarly communication … [with] attempts … to construe a theoretical framework for practical-oriented information activities."

The early part of this definition suggests that InfoSci is an adaptation of librarianship.

Michal Lorenz: "InfoSci is the study of the nature of information, its attributes and forces governing information flow for optimal accessibility and utilization.  InfoSci concerns both potential information[27] (recorded data) and psychophysical information (stored in a brain and processed in a consciousness)."

This definition suggests that InfoSci has significant overlap with cognitive psychology.

A further source of confusion is that, in German universities, what is called “Informatik” is what English speakers call “Computer Science.”

After seeing these definitions and many more, Zins provided the following:

Information Science is the study of the mediating perspectives of universal human knowledge (i.e., human knowledge in the universal domain).  The mediating perspectives include cognitive, social, and technological aspects and conditions, which facilitate the dissemination of human knowledge from the originator to the user.

A working definition adopted by the Information Science Abstracts periodical seems more workable than most of the prior examples:

An interdisciplinary field concerned with the theoretical and practical concepts, as well as the [technology], laws, and industry dealing with knowledge transfer and the sources, generation, organization, representation, processing, distribution, communication, and uses of information, as well as communications among users and their behavior as they seek to satisfy their information needs.

DDQ will assume this last definition whenever it refers to Information Science in the critique that follows in this and future DDQ numbers.[28]

Rename Information Science as Knowledge Science?

Zins suggests renaming Information Science to Knowledge Science.[11]  While the name chosen for a topic can be anything that does not create confusion with other topics, examining his rationale for renaming is interesting for what it reveals about contemporary thinking.

Notwithstanding this last viewpoint, a rhetorical aspect of the proposal seems interesting.  Often people seek grander sounding names as part of a political agenda, e.g., in campus politics to enhance prestige.  This seems to me to be the case for what English-speaking universities now call "Information Science."  In 1960, at the Univ. of Toronto, it was the "Library School".  Then it became "Library Science," and about a decade ago it became "Information Science".  And now we are urged toward "Knowledge Science"?

Within his work, an author is free to choose any words and definitions he believes will help convey his message.  However, Zins goes further.  By basing his renaming recommendation on idiosyncratic definitions of ‘information’ and ‘knowledge’, he is asking some community to use these definitions in its everyday discourse, whether or not they conform to shared prior meanings.  Judging the renaming recommendation therefore requires examination of Zins’ definitions.

In several pages that include epistemological background,[29] Zins distinguishes subjective data, information, and knowledge from objective counterparts.  Since his discussion of subjective knowledge more or less conforms to what is written above, it can be passed over here, in favor of abbreviated inspection of the proffered definitions for the objective sphere.

The second approach ascribes an independent objective existence to knowledge.  Knowledge is the meaning, which is represented by expressed propositions.  It is true and exists independently of, not depending on, subjective knowledge of the individual knower. …

… in the objective domain, “data” are sets of symbols, which represent empirical stimuli or perceptions.  “Information” is a set of symbols, which represent empirical knowledge.  “Knowledge” is a set of symbols, which represent the meaning (or the content) of thoughts that the individual justifiably believes that they are true.

For me at least, several difficulties are evident, but not explained away by Zins.  A relatively minor difficulty is that these definitions of information and knowledge seem to make no reference to the cognate actions, informing and knowing.  Of more importance is apparent inconsistency: in the first paragraph, objective knowledge is said to exist independently of anybody’s subjective knowledge; in the second, determination whether or not something is objective knowledge depends on some individual’s subjective knowledge.

However, what most diverges from my understanding is that “Knowledge is a set of symbols which represent meaning.”  Had I attempted to define ‘knowledge’ (above I avoided doing so), I might have written something like “Knowledge is the meaning that might be represented by a set of symbols, and also the relationship between particular symbols and the concepts they represent.”

The renaming proposal depends directly on this unusual definition of objective knowledge, particularly on its independence from subjective knowledge, which Zins indicates is imperfect.  The arcane reasoning involved might be appreciated by disciplinary professionals, but is unlikely to be noticed by any much wider community, much less studied sufficiently to become understood.

What might laymen think is meant by “Knowledge Science?”  This label is likely to confuse the general public about the distinctions from epistemology and normal psychology, a consequence not among Zins’ objectives.

Symptomatic Synopsis

Why is it reasonable to think that Information Science will wither?  Here are some indicators that future DDQ numbers will consider.

· Most of the courses of a typical university Information Science Department have natural homes in other Departments.[30]

· The novelty of today’s and tomorrow’s methods for information dissemination is merely a technology change.  Why have we never had a university Printing Science Department?

· Topics such as ontologies and Web search methods, treated as novelties in the InfoSci literature, are taken for granted by the children of the authors.

· A prominent current InfoSci topic is ontologies and ontological tools for other fields; however, historians (for example) neither want nor need help at organizing their topic.[31]

· Fundamentals underlying InfoSci include epistemology; however, the InfoSci literature mostly ignores this background.

· InfoSci practitioners seem to ignore the literature of Computer Science and of software engineering; software engineers pay little attention to the InfoSci literature.

· Compared to Chemistry, Computer Science, and other disciplines represented by university departments, InfoSci is shallow; that is, in a certain sense, a competent practitioner can know all of InfoSci, a feat not possible for Chemistry.

· Referees for the top scientific periodicals require that articles identify what is novel explicitly and also cite prior work that can help readers evaluate what is written.  Similar rigor is missing from InfoSci articles, and much of this literature is, in fact, devoid of new ideas and new observations.

The doubts suggested by these speculations are hinted at in InfoSci primary literature wondering about the role and future of the field.[32]  My concerns are not novel or unique, but seem to be shared by prominent InfoSci participants.  Such doubts have never been primary literature topics for the established sciences.[33]

Reading Recommendations

While vacationing in August, I read when my energy level was too low for much else.  Selections combined what a relative calls "bummer books" (mysteries, political reminiscences such as the memoirs of Louis Freeh, a retired FBI Director, and similar light stuff) and some weightier books.  Although I will not recommend the former (even though they are competent story-telling), I don't hesitate to recommend the latter books.  Each is self-contained in the sense that you can enjoy and learn from it without prior reading.

Umberto Eco, Kant and the Platypus: Essays on Language and Cognition

See a Wikipedia summary.  The reviews on an Amazon Webpage are excellent.

Eric Kandel, In Search of Memory: … a New Science of Mind

Again I find the reviews quoted by Amazon fit my impression.  One contains "... for anyone interested in the relationship between the mind and the brain, this is an important account of a creative and highly fruitful career."  The book is valuable for three reasons:

· It is a remarkably readable account of scientific research whose significance cannot be overemphasized.

· It presents some of the strongest evidence against dualism, the philosophical and religious doctrine that mind is a separate entity from body.

· It includes a first-hand account of anti-Semitism in 1930s Austria and recent Austrian thinking about that.  (I had previously thought that the Nazi horrors were a special German phenomenon.  Not so.  Austrians were particularly virulent and destructive.  Illusions of Austrian Gemütlichkeit and Schlampigkeit are just that: illusions that might have been fostered by propaganda.)

While I was looking into the last point, BBC World News[34] broadcast a squib about Mauthausen, a Nazi concentration camp in Austria, because extensive Holocaust records stored nearby had recently been made public and are being scanned for broad accessibility.  For something as bizarre as it is horrid, see the 1945 deathbed confession of Franz Ziereis, Mauthausen Concentration Camp Commandant, which starts with “I, myself, am not a wicked man ...”

Alan Hirschfeld: The Electric Life of Michael Faraday

This is another eminently readable biography accessible to scientific laymen.  It recounts nigh-incredible academic snobbery.  Faraday’s modest background put a university education beyond his reach in the 19th century.  He educated himself in scientific experimentation, and regretted that he was unable to extend this to mathematics.  As a consequence, he was treated with condescension by Oxbridge scholars, even after he achieved world-wide fame and Fellowship in the Royal Society.

Faraday nurtured the laboratory of his mind—a test bed of the imagination, churning with ideas on force, light, and matter. …  His drawerful of medals didn't matter to his critics; nor did the raft of less ornamental honors, or even his international fame.  He would always be the outsider, simultaneously praised and disparaged by university-trained practitioners—son of a blacksmith, and now himself a kind of "scientist-smith" in their eyes, indisputably good at forging experiments, but ill-equipped to tackle theoretical problems.  (p. 161)

Hirschfeld’s Chapter 13, describing James Clerk Maxwell’s collaboration with Faraday, shows scientific methodology at its best.  It expresses scientists’ philosophy about what is real with:

An electromagnetic field … is some fundamental alteration of space wrought by embedded electric and magnetic sources.  Maxwell's equations do not reveal what an electromagnetic field is, just how to compute its mathematical properties and how these properties give rise to observable phenomena.  Inspired by Faraday's geometrical musings, Maxwell created an electromagnetic universe that cannot be effectively reduced to mental images. … Maxwell likened the situation to that of a bell ringer who tugs ropes that dangle through holes in the ceiling of the belfry; the bells themselves and their actuating mechanism remain a mystery.  Maxwell's contemporary, Heinrich Hertz, put it more bluntly: "Maxwell's theory is Maxwell's equations."  Or in the words of Nobel [laureate] Richard Feynman, "Today, we understand better that what counts are the equations themselves and not the model used to get them.  We may only question whether the equations are true or false.  This is answered by experiments, and untold numbers of experiments have confirmed Maxwell's equations [as complete and accurate.]  If we take away the scaffolding he used to build it, we find that Maxwell's beautiful edifice stands on its own."  (p. 190)

Andrew Spielman & Michael D'Antonio, Mosquito

Like the previous two books, Mosquito: a Natural History of Our Most Persistent and Deadly Foe is a self-contained, scientifically accurate account presented without condescension for its intended lay audience.  It describes the epidemiological history of yellow fever, dengue, malaria, and West Nile virus, and their effects on world history.  These effects include the brief success of DDT.  The book explains why malaria will probably never be eliminated.

Practical Matters

One can hardly open a news or trade magazine without encountering prominent excitement about one or another person-to-person Internet service.  Every reader has probably received several invitations to join, often forwarded by his business and personal acquaintances.

DDQ recommends strongly against enrolling in such services because they represent untapped opportunities for misbehavior: identity theft, fraud, and other invasions of privacy.  Articles about such violations are likely to become, in a year or two, as prominent as articles about spam and credit card theft are today.[35]  If a person-to-person service tempts you, consider waiting for a couple of years to see how problems and protections develop.

Migrating from Microsoft Products to Open Source

Like many PC users, I have ideological and economic reasons for discontent about lock-in to Microsoft products.  As must be common, I became locked in because Microsoft was first to market with key software in areas that require a great deal of user training and because its near-monopoly makes it unattractive for application vendors to support Linux machines.

I am working to extract myself, cautiously to avoid impacting day-to-day productivity.  In this and future numbers, DDQ will report progress and suggest tools for readers who want freedom.[36]  Consider the following technologies and user guides:

Ubuntu Linux has emerged as the leader for desktop Linux because it is so convenient to install and tailor.  I like it much better than Linux distributions I tried in 2006.

"Linux - You Can Do It!" is a good PC Magazine introduction for anyone considering migration.

"Ubuntu Linux vs. Windows Vista" compares Ubuntu 7.04 (codenamed "Feisty Fawn") with MS Windows Vista, looking at the functionality of the operating system and several key applications.

WinSCP is a free, open-source SFTP client for Windows.  Its main function is safe copying of files between computers.  I have found it reliable and convenient for synchronizing Windows and Linux directory contents.

"Taking the good stuff when trading Windows for Linux" is a Computerworld article discussing alternatives for moving from Windows to Linux.  Especially interesting is an Ubuntu Linux installation option that automatically migrates files from a Windows system installed on the same machine.

innotek VirtualBox[37] is a family of x86 PC virtualization products whose open-source version is free for home use.  It runs on Windows, Linux, and Macintosh hosts and supports guest operating systems that include Windows XP and Linux.

The applications that most impede my abandoning Windows/XP are the workhorse office applications and my optical character recognition (OCR) tool.  The problem with the office suite is not that MS Office is superior to free, open-source OpenOffice—it isn’t.  Instead the inhibitor is that I have a large sunk investment in learning how to use MS Office efficiently and effectively for what amounts to over half my working hours.  The main challenge is in text processing.  Since I use only the most basic spreadsheet, presentation, and graphics functionality, switching between the MS and open-source versions is easy and convenient.  Converting files between the two platforms is trouble free; in particular, OpenOffice components readily import and export their MS-equivalent files.  However, the presentation and handling of a text file by OpenOffice Writer is different from that by MS Word in a large number of small details—so many that my productivity is severely hampered.

Microsoft competitors and open-source contributors are working hard to address their competitive disadvantage.  For instance, Sun Microsystems recently announced free MS Office plugins for handling the ISO-standard Open Document Format.  IBM just announced that it will invest in improving OpenOffice technology.  However, critics suggest that what is being done is not enough.

For OCR, I am wedded to OmniPage because it seems to have a lower error rate than any Linux offering.

Persistent Storage on Your LAN

Best prices for magnetic disk storage have changed little since they were reported in DDQ 6(2).  Among emerging storage servers that can be shared among PCs, the most interesting is DROBO, a USB-attachable unit with Serial ATA bays for up to four drives as large as 1 terabyte each.

DROBO’s ease of use makes it especially suitable for customers without expertise in RAID storage.  The unit employs its own virtualization algorithms to provide automatic data redundancy in a multi-drive array.  With its own operating system and processor, DROBO allows you to swap a drive out without disrupting service.

Competitive offerings include Seagate’s Maxtor One Touch Plus and various Iomega StorCenter devices with RAID 5 data protection.
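
For readers curious about what "automatic data redundancy" involves, the XOR parity idea behind RAID 5 can be sketched in a few lines of Python.  This is only an illustration of the general technique, not DROBO's own virtualization algorithm, which is not described here; the function names and sample blocks are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover a single lost block from the surviving blocks plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


# Three hypothetical data blocks, each imagined to live on a different drive.
data = [b"DDQ-", b"2007", b"3Q07"]
parity = xor_parity(data)  # stored on a fourth drive

# Pretend the second drive failed; its block is rebuilt from the others.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

Because XOR is its own inverse, any one block can be recomputed from the rest, which is why such an array tolerates the loss of a single drive but not of two at once.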

An emerging option for laptop computers is the hybrid storage unit, which includes a large buffer so that the HDD platters rest most of the time instead of constantly spinning.

DDQ recommends that readers unwilling to pay early-adopter prices wait and watch for about a year as competition develops.[38]  For instance, my home network already includes external drives purchased about a year ago and having sufficient capacity.  The setup requires me to kick off backup from time to time; however, this takes just a few mouse clicks, which is not so burdensome that I’ve bothered to configure a timer-driven background process.  I’ll wait until the prices for today’s more sophisticated devices drop.
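
For readers who would rather automate that step, the following is a minimal Python sketch of an incremental copy to an external drive.  It is not the tool I use; the function name and the example paths are hypothetical, and it assumes the backup drive is mounted as an ordinary directory.

```python
import shutil
from pathlib import Path


def back_up(source, target):
    """Copy files that are missing from, or newer than, the backup; return the copies made."""
    source, target = Path(source), Path(target)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # the backup copy is already current
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(dst)
    return copied


# Hypothetical usage:
#   back_up("/home/me/documents", "/media/external/documents")
```

A script like this could be started by a scheduler such as cron or the Windows Task Scheduler to obtain the timer-driven behavior.  Note that it never deletes anything from the backup, so files removed from the source remain on the external drive.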

Your Future PC

Your PC in 2008 and Beyond suggests that blindingly fast chips and other innovations will change everything about home, office, and mobile computing.  The PC World article also tabulates more distant technologies, hot products, and marketplace failures.  DDQ is more reserved, as follows:

Technology described by PC World | DDQ comment
Plugless power by pads on which you place your PC | Uninteresting, because the plugged power cord is replaced by a bigger and more expensive gadget.
Printers built into digital cameras | Will have marginal success similar to that of Polaroid cameras 40 years ago.
Graphic processing integrated onto CPU chips | Will much accelerate graphics-intensive computing, but at early-adopter prices until at least 2010.
Flexible liquid crystal displays | Flexible color displays will not be offered at attractive prices before 2011.
High-speed wireless telephony using packet switching | The WiMax technology was standardized some time ago, but seems to be perpetually “available next year”.
Eight CPUs on a single chip | To exploit parallelism beyond dual processors requires complex operating system and application software changes, beyond what PC markets will sustain before 2012.
Cable-free television anywhere | Wireless high-definition television might be inexpensive by 2010.
Five terabyte hard disk drives | HDD capacity and price have improved steadily along an exponential curve for a decade; the trend can be confidently extrapolated over the next five years.
Expansion of Internet addressing to allow every device to be accessed | 128-bit addressing in the IPv6 standard has been available for several years, but has not been widely adopted because ingenious tricks have extended 32-bit IPv4 addressing at low cost; we will probably continue with 32 bits painlessly for at least three more years.
Touch screen PCs in every surface | This technology is of little interest for home users, but might see significant business customer uptake in four years.
Relief of current PC system bus bottlenecking | The system bus has become a significant source of PC performance limits; devices supporting the PCIe standard will provide relief in 2011.
Tiny video projectors | Cell phone video is painful; look for laser-based projectors in pocket devices in 2010.
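
For the Internet addressing entry in the table above, the sizes involved are easy to check with Python's standard ipaddress module: IPv4 addresses are 32 bits long and IPv6 addresses are 128 bits long.  The private block shown is only an example of the kind of range that address-sharing schemes reuse behind a single public address; that these schemes are the "ingenious tricks" in question is my assumption.

```python
import ipaddress

# Total number of addresses in each protocol's entire address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)                     # 4294967296
print(ipv4_total.bit_length() - 1)    # 32
print(ipv6_total.bit_length() - 1)    # 128

# One of the private IPv4 ranges that many home and office LANs reuse.
private_block = ipaddress.ip_network("192.168.0.0/16")
print(private_block.is_private, private_block.num_addresses)
```

Roughly four billion IPv4 addresses cannot give every device its own public address, which is the motivation for the larger IPv6 space.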

 



[1]     DDQ does not reproduce the cartoon drawing because doing so might be a copyright infringement.

[2]     The pertinent text of HJ Res 20, signed into law on 2/15/2007, is: “(3) Of the unobligated balances available under the heading `Library of Congress, Salaries and Expenses', the following amounts are rescinded: (A) Of the unobligated balances available for the [NDIIPP], $47,000,000.”  The full text of HJ Res 20 is available via www.loc.gov/thomas.

[3]     See H.M. Gladney, Digital Preservation in a National Context: Questions and Views of an NDIIPP Outsider, D-Lib Magazine 13(1/2), January 2007.  This article already suggests interesting digital preservation challenges not receiving the attention they merit.  See also DDQ 1(3), Selection Criteria: What’s Worth Saving?

[4]     At least, interest is not evidenced by substantial practical activity or publication.  For instance, see Seamus Ross and Andrew McHugh, The Role of Evidence in Establishing Trust in Repositories, D-Lib Magazine 12(7/8), July 2006.  The 43 citations in this article include no references to preservation work other than from the cultural heritage community in universities and public institutions.

[5]     A longitudinal patient record is a lifetime collection of the medical records of a human being, held to be readily available.

[6]     Gordon Bell and Jim Gemmell, A Digital Life, Scientific American 296(3), 58-65, March 2007.

[7]     “Overloaded” used to modify a denotation means “having more than one meaning.”  I.e., “overloaded” can be a technical term synonymous with “ambiguous”.  This unusual usage is chosen for emphasis.

[8]     I do not know whether “logic” was used in everyday lay conversation.

[9]     George Boole, The Mathematical Analysis of Logic, 1847.

[10]    The 1942 German title is Zur Logik der Kulturwissenschaften: Fünf Studien.

[11]    Chaim Zins, Redefining Information Science: From Information Science to Knowledge Science. J. Documentation 62(4), 2006.  I have discussed this paper with Zins in an e-mail exchange.

[12]    I checked this assertion by inspecting the indices of  Cassirer’s The Problem of Knowledge vol.4,  Polanyi’s Personal Knowledge, Ryle’s The Concept of Mind, Coffa’s The Semantic Tradition, Kandel’s In Search of Memory, and Sowa’s Knowledge Representation.

[13]    For instance, see Gilbert Ryle, The Concept of Mind, Chicago U.P. 1949, pp. 226-9.

[14]    McElroy 2004, How to Operationalize KM, suggests a distinction between first generation KM and second generation KM.  It starts with the view that there is a difference between producing and integrating information in business—‘Knowledge Processing’ (KP) and systematic attempts to enhance processes by improving people’s knowledge.  Unlike KP, second generation KM addresses knowledge production, which it must do if it is to address first-generation KM’s failure to distinguish between information [and] knowledge. (McElroy’s italics)

      See also Hermann Maurer and Klaus Tochtermann, On a New Powerful Model for Knowledge Management and its Applications, J. Universal Computer Science 8(1), 85-96, 2002.

      For a selection of KM books, see a University of Florida Business Library listing.

[15]    The CFP for a 2006 Knowledge Organization Systems and Services conference includes, “Knowledge Organization Systems (KOS), such as classifications, gazetteers, lexical databases, ontologies, taxonomies and thesauri, attempt to model the underlying semantic structure of a domain for the purposes of retrieval.  Traditionally the focus has been on construction of print-based resources.  Possibilities for networked KOS-based services are emerging but pose new challenges in today's complex, interdisciplinary knowledge domains.”

[16]    Gregoris Mentzas, Kostas Kafentzis, and Panos Georgolios, Knowledge Services on the Semantic Web, Comm. ACM 50(10), 53-8, 2007.

      Torgeir Dingsøyr, Hans Karim Djarraya, and Emil Røyrvik, Practical Knowledge Management Tool Use in a Software Consulting Company, Comm. ACM 48(12), 96-100, 2005.

[17]    For naming objects and concepts, I want a very strong justification before I will accept as helpful a change from what I believe to have been common meaning and usage.  This is similar to the conventional demand for very strong reasons for replacing a scientific paradigm with a disruptive successor, as discussed by Thomas Kuhn and also by Ernst Cassirer.

[18]    When William James wrote his seminal The Principles of Psychology in 1890, the fields of (normal) psychology and epistemology were not distinct.  See Gerald E. Myers, William James: His Life and Thought, Yale U.P., 1986.

[19]    For instance, I can inform you that, “Newton knew that bodies attract each other.”  If you believe me, you will know this fact and that my sentence informed you about Newton’s (apparent) state of mind.

[20]    Any author is free to choose whatever definitions he wants.  As ever, DDQ attempts conformance to Martin Gardner’s precept.

[21]    Each reason for concern about the health of InfoSci has to do with some attribute of a science—an attribute that should be evident, but is either missing or obscure.  A difficulty with what DDQ will discuss is that I might have failed to discover significant activities and literature that call parts of my critique into question.  (My related activity is limited to digital library technology, long-term digital preservation, and such aspects of search technology that I encounter as a user.)  I welcome critical arguments, and will be happy to publish or cite in DDQ any criticisms whose authors wish such airing.

      The critique will use as models sciences with which I am intimately familiar.  My formal education was in Chemical Physics.

[22]    This is evident in the topics chosen for DDQ, and in its citations.

[23]    A few computer scientists considered the topic long enough to show how digital information could be reliably preserved forever.  Their seminar attendees lost interest as soon as they understood that every apparent technical challenge had a practical solution.

      The software engineering community has not yet taken up the topic to produce viable and convenient tools.  Perhaps this is because, apart from academic librarians and archivists, almost nobody has expressed interest.  And the librarian/archivist community shows little interest in potential solutions that come from outside its own ranks.  (See DDQ on The Two Cultures).

      A way of preserving any digital object to be reliably useful and trustworthy forever is described in H.M. Gladney, Trustworthy 100-Year Digital Objects: Evidence After Every Witness is Dead, ACM Trans. Office Information Systems 22(3), 406-436, July 2004, and H.M. Gladney and R.A. Lorie, Trustworthy 100-Year Digital Objects: Durable Encoding for When It's Too Late to Ask, ACM Trans. Office Information Systems 23(3), 299-324, July 2005.

[24]    See DDQ 5(4), Inattention across Disciplinary Boundaries.  Even within a discipline, antipathies interfere; see David Kaiser, When Fields Collide, Scientific American 296(6), 62-69, June 2007.

[25]    Chaim Zins, Knowledge Map of Information Science, J. Am. Soc. for Information Science (JASIST), 58(4), 526-535, 2007.

      Chaim Zins, Conceptual approaches for defining "Data", "Information", and "Knowledge", JASIST 58(4), 479-493, 2007.

      Chaim Zins, Classification schemes of Information Science: Twenty-eight scholars map the field, JASIST 58(5), 645-672, 2007. 

      Chaim Zins, Knowledge Mapping: an Epistemological Perspective, Knowledge Organization 31(1), 49-54, 2004.

[26]    Readers can see the fifty unedited definitions in Chaim Zins, Conceptions of Information Science, JASIST 58(3), 335-350, 2007.

[27]    Potential information is knowledge recorded on material media which could be communicated on dispensable devices.  Data are specific kinds of potential information which circulate in machines (derived from J. Cejpek, 1998).

[28]    As is common in scholarly literature, this is only a technical definition provided to mitigate potential misunderstanding, and not an assertion that this definition is either true to common usage or better than alternatives in any specific sense.

[29]    Zins’ citations are telling, being mostly recent writings from the Information Science community.  Among widely recognized philosophers who created epistemology, the only citations later than Kant’s Critique of Pure Reason are to K.R. Popper’s works, which I judge to be second-rate relative to those of scholars cited above (Cassirer, James, Kandel, Polanyi, Ryle, Wittgenstein) and in Coffa’s The Semantic Tradition (Carnap, Quine, and members of the Vienna Circle.)

[30]    Consider the courses at your favorite Information Science Department.  For instance, you can see University of California at Berkeley course descriptions online.

[31]    Historians do, of course, use digital tools for organizing their work, but these tools are mostly provided by software engineers, not by information scientists.

[32]    For instance, the online Knowledge Map of Information Science: Conceptions quotes Herold, “Although [InfoSci] emerged in the twentieth century with great force and seeming novelty, its growth as an intellectual discipline has been tentative and the enterprise shows much immaturity.”

[33]    Publications about scientific society affairs, such as Physics Today, do have articles speculating about the future of their fields, but these articles are about choices of emphasis and matters of training, not about the existence and value of their fields.

[34]    On Thursday, October 04, 2007.

[35]    A few such articles have appeared in the last three months.  See, for instance, Tom N. Jagatic, Nathaniel A. Johnson, Markus Jakobsson, and Filippo Menczer, Social Phishing, Comm. ACM 50(10), 94-100, 2007.

[36]    Doc Searls, On Valuing Freedom More Than Cushy Jail Cells, Linux Journal, August 2007.

[37]    I have not yet exploited VirtualBox because I run Linux and Windows on side-by-side LAN-connected machines.

[38]    For differential pricing theory see Hal R. Varian, Differential Pricing and Efficiency, First Monday 1(2), August 1996.