Digital Document Quarterly

Perspectives on Trustworthy Information

Volume 5, Number 1, 1Q2006

HMG Consulting

Saratoga, CA 95070

© 2006, H.M. Gladney

ISSN: 1547-8610

Digital Preservation

Computer-Aided Manufacturing

A National Institute of Standards and Technology (NIST) research group has started to consider making CAD (Computer-Aided Design) and CAM (Computer-Aided Manufacturing) records endure usefully for at least as long as the artifacts for which they were created.[1]  As part of that, it organized and hosted a workshop of about 40 potential participants that represented almost as many different organizations.  The overview of and presentation materials from that workshop are available online.

What differentiates CAD/CAM from most other topical areas under consideration for preservation is that its content files and application programs are among the most complex cases deserving attention.  In fact, the challenges start with a problem that logically precedes digital preservation: it is difficult to convert files from an application that uses one geometrical representation (e.g., by a set of 3-D objects) to another representation (e.g., by parametric surface functions) as might be needed to progress from engineering design to automated machine tool command files.  This is illustrated in Doug Cheney’s presentation.

Focused Initiatives in Europe

We are generating material faster than we are taking care of it, without thought for the long term value.  …  Researchers are now becoming concerned about data management and are beginning to realise the value and need for personal archiving, reinvention and replication.  There is a lack of tools and education--for both professionals and researchers—coupled with a lack of review mechanisms for scientific and other digital archives.  Academic literacy is changing and there is a growing democratization of the publication process.  More requirements will be made of data from scholarly publishing.
2005 Warwick Workshop on Digital Curation and Preservation, page 10

As part of its “How to get there” tabulation, this Warwick workshop recommends, “Work in partnership with commercial system providers and with key interested parties such as CERN and others, on error levels and developing affordable scalability.”  However, it offers few suggestions about how this is to be achieved.  Commercial system vendors have made rapid progress over several decades without much information science contribution, and seem likely to continue to do so for the foreseeable future.  Successful partnership requires benefit to every involved party.  Nobody has suggested specific help or expertise that the information science community and public sector repositories can offer to attract commercial partners.[2]  Nor is it apparent what these might be.

Access to the Records of Science

The European Task Force for Permanent Access to the Records of Science has issued its report, Strategic Action Programme 2006-2010.  Its Research Agenda proposes six main themes:

1.    Developing and deploying technical tools to support different preservation strategies;

2.    Developing registries of representation information;

3.    Investigating the management of complex dynamic datasets and databases;

4.    Developing distributed archives and network solutions from a preservation perspective;

5.    Devising new approaches to IT solutions from the perspective of the durability of information; and

6.    Developing insight into life-cycle costs to support sustainable long-term preservation and access.

In addition to the carefully considered R&D challenges, the Task Force has identified other challenges needing attention, including economic challenges, access and rights management, infrastructure and strategy.

European Digital Library

The EC announced plans for a European Digital Library to promote access to Europe's heritage, with about 6 million on-line cultural works based on a Europe-wide digitization network.  This followed publication of an overview of the results of a survey on digital libraries, with 225 replies from libraries, archives and museums (46%), publishers and right holders (19%) and universities/academics (14%).  Two million cultural works are to be accessible by 2008, growing to at least six million by 2010.

The EC also identified the membership of an EC High Level Expert Group on Digital Libraries.

Preservation of Material Records

For a balanced view of preservation challenges, awareness of the situation for traditional resources is essential.  The [U.K.] National Preservation Office has issued a survey report, Knowing the need: a report on the emerging picture of preservation need in libraries and archives in the UK.  The findings include that:

• Significant amounts of unique or nationally important material are at risk because of poor preservation practice;

• There is a lack of environmental monitoring and control especially in libraries;

• 13% of material surveyed is actively deteriorating or will be damaged if used;

• 50% of material is stored in inadequate accommodation.

Preservation by Emulation in the Netherlands

The Koninklijke Bibliotheek and the Nationaal Archief of the Netherlands have started a 2-year development project for an open source modular emulator[3] based on Lorie’s UVC (Universal Virtual Computer), which is described in DDQ 1(4) and in more formal publications.[4]  A key objective is to demonstrate that emulation, as the core of a practical tool, can be cost-effective for long-term digital preservation.

Recently a variant encoding procedure has appeared that targets saving compressed files so that they remain decodable for as long as CPUs implementing the Intel™ x86 instruction set are available, which is likely to be several decades.[5]  Both to achieve high performance and to enable reuse of existing compression and decompression software, this procedure replaces Lorie’s UVC[6] with a virtual machine that implements a subset of the x86 architecture.  This subset omits all input/output, all operating system calls, and all triggers, so that it is insensitive to the specifics of its environment; in this respect it is similar to the UVC definition.  Its execution environment is close to what the Figure suggests.[7]
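The essential property, a machine that can reach nothing outside its input and output buffers, can be illustrated with a toy interpreter.  The Python sketch below is neither Lorie’s UVC nor the x86-subset machine: its opcodes (READ, WRITE, JZ, and so on) and the run-length decoder program are hypothetical, chosen only to show that an archived decoder written for such a machine yields the same bytes on any faithful reimplementation of the interpreter.

```python
from typing import List, Tuple

# Toy, hypothetical instruction set: NOT Lorie's UVC and NOT an x86 subset.
Instr = Tuple[str, int]

def run(program: List[Instr], data: bytes, max_steps: int = 1_000_000) -> bytes:
    """Execute program over data and return the bytes it emits.

    The machine sees only an input buffer and an output buffer; there are no
    file, network, clock, or operating-system instructions, so its result
    depends on nothing but the program and the input bytes.
    """
    pc, acc, pos = 0, 0, 0   # program counter, accumulator, input cursor
    regs = [0] * 8           # small register file
    out = bytearray()
    for _ in range(max_steps):
        if pc >= len(program):          # falling off the end also halts
            return bytes(out)
        op, arg = program[pc]
        pc += 1
        if op == "READ":                # acc <- next input byte, or -1 at end
            if pos < len(data):
                acc = data[pos]
                pos += 1
            else:
                acc = -1
        elif op == "STORE":             # regs[arg] <- acc
            regs[arg] = acc
        elif op == "LOAD":              # acc <- regs[arg]
            acc = regs[arg]
        elif op == "DEC":               # regs[arg] -= 1, and acc <- regs[arg]
            regs[arg] -= 1
            acc = regs[arg]
        elif op == "WRITE":             # emit low byte of regs[arg]
            out.append(regs[arg] & 0xFF)
        elif op == "JMP":
            pc = arg
        elif op == "JZ":
            if acc == 0:
                pc = arg
        elif op == "JNEG":
            if acc < 0:
                pc = arg
        elif op == "HALT":
            return bytes(out)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit exceeded")

# Archived decoder: input is (count, byte) pairs; output repeats each byte count times.
RLE_DECODER: List[Instr] = [
    ("READ", 0),   # 0: acc <- run length, or -1 at end of input
    ("JNEG", 10),  # 1: no more input -> halt
    ("STORE", 0),  # 2: r0 <- run length
    ("READ", 0),   # 3: acc <- byte value
    ("STORE", 1),  # 4: r1 <- byte value
    ("LOAD", 0),   # 5: acc <- remaining count
    ("JZ", 0),     # 6: run finished -> read next pair
    ("WRITE", 1),  # 7: emit the byte value
    ("DEC", 0),    # 8: r0 -= 1 (acc <- r0)
    ("JMP", 6),    # 9: test the count again
    ("HALT", 0),   # 10
]

print(run(RLE_DECODER, bytes([3, 65, 2, 66])))  # b'AAABB'
```

The step limit means that a damaged archived program cannot run forever; a real specification would also fix integer widths and memory limits exactly, since those are what a future reimplementer must reproduce.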

“Progress” in the NDIIPP Projects

The Library of Congress reports on a January 2006 NDIIPP status meeting, “What if NDIIPP knew what NDIIPP knows?”, in which William LeFurgy reminded the audience that “we are halfway through NDIIPP.”  DDQ suggests that readers inspect the report looking for insights that were not available before NDIIPP began.  They are likely to be disappointed.  I would be grateful to anyone who draws my attention to any truly new idea identified in this report.

The eight project consortia represented were funded with $14M of Americans’ tax dollars.  They might have used this for collection development; we don’t know, as it is not mentioned in the report.  Instead, we find emphasis on such things as a $3M expenditure on “Portico’s Experience in Building Partnerships.”  Clay Shirky, a paid NDIIPP advisor, re-emphasizes this hackneyed theme with “What is important for NDIIPP and its many partners is the quality of the ‘social networking’ that the program catalyzes.”  This echoes many research library reports we have seen over the last decade: reports on librarians building their own sense of community.  Surely the participants have long since accomplished this and could move on to achieving something objectively assessable and transferable to outsiders?  DDQ asks, does NDIIPP management not understand the difference between process and progress?

In connection with an incomplete project that DDQ will mention later in 2006, I selected for citation over 400 digital library and digital preservation articles and books from roughly 1000 candidates.  About half of these were written by librarians, archivists, or their academic faculty cousins, information scientists.  I found it very difficult to locate new ideas in these readings.  This is surely partly a matter of style.  In my experience, referees for the top physical science, engineering, and computer science journals would reject a proposed article that does not explicitly identify its new teaching, because they do not want to impose a guessing game on readers.  Furthermore, such referees demand that reported progress be objectively assessable.  Such rigor is apparently not required of articles in the information science literature.

Instead of what I am accustomed to finding in the scientific and engineering literature, a surprising number of information science articles report surveys that merely confirm widely held opinions.  The people surveyed probably know less about the topic at hand, and have thought less about it, than the authors of the surveys.  How can you learn much from that?

Such articles frequently move to normative conclusions: “If only people acted in thus and such a manner, things would be much better!”  Part of the problem is repeated failure to distinguish between novel research and didactic moralizing.

Shortfalls of the digital preservation literature are further illustrated by Eric Morgan’s report of the March 2006 symposium, Scholarship and Libraries in Transition: … Impacts of Mass Digitization Projects.  It summarizes what were probably among the best talks as follows:

“… mass digitization allows libraries to rethink the role of physical space, but more importantly, it allows libraries to rethink what libraries do regarding collections.  Both of them alluded to the possibilities of enhanced, value-added services against collections of electronic texts, but when pressed for elaborations none were forthcoming.”

This is hardly helpful.  Would any DDQ reader like to identify specific service opportunities?

DDQ invites readers who believe its criticisms inappropriate to voice their objections publicly.[8]

Why So Slow Towards Practical Preservation?

“Where stone tablets could be expected to survive for tens of thousands of years, a floppy disk or magnetic tape may only last 10 years.  The hardware and software required to perceive or experience the information will be lucky to survive even that long.” [9]

"Historians will look back on this era and see a period of very little information.  A 'digital gap' will span from the beginning of the wide-spread use of the computer until the time we eventually solve this problem.  What we're all trying to do is to shorten that gap."  (Danny Hillis, Disney Chief of Research and Development)[10]

“It is estimated that we have created and stored since 1945 one hundred times as much information as we did in all of human history up until that time!” [11]

Since the challenges were articulated in 1996,[12] many conferences have been held and many papers have been written on the topic.  They include reminders of urgency, because irreplaceable and valuable digital content is allegedly disappearing.[13]  The causes of the preservation problem for digital information are seen as “complex and largely uncontrollable.  Preserving books and other cultural objects looks straightforward in comparison.”[14]  From librarians’ perspectives the challenges include: inability to determine where to start, lack of sufficient expertise, absence of easily obtainable and trusted tools, and unrealistic expectations about costs.[15]  How much these and similar problems can be mitigated by librarian training is unclear.[16]

Is the problem in fact urgent?  Is progress in fact slow?  An eminent librarian once reminded me that “urgent” and “slow” have different meanings within the Washington beltway than they do to denizens of Silicon Valley.

Is it that the responsible managers believe that prompt action would risk massive wasted effort because unsolved technical problems exist for some kinds of data?  If so, they should tell the software engineering community specifically what these risks are and which data classes are affected.  Alternatively, if non-technical risks are the effective impediments, they should be specifically articulated for consideration by the best minds available.

The information science literature about digital preservation pays less attention to economic factors and technical trends than to examining how current repository methods can be adapted to a digital world.  It pays less attention to the content to be preserved than to institutions that might perpetuate current infrastructure.[17]

This emphasis reminds me of residential milk delivery in small towns of the 1940s.  Our deliveryman ran from his horse-drawn enclosed wagon to kitchen doors.  The horse moved from appointed spot to appointed spot without any commands.  Of course, to minimize distractions, the horse wore blinders.

Now, blinders are quite appropriate for routine work that should follow a pattern and route that are agreed upon and have been practiced for some time.  However, the current situation for digital repositories in cultural heritage institutions is described as ill-understood and requiring investigation of new routes.[18]  Opportunities to speed progress towards efficient and routine institutional procedures will be found if the blinders are taken off!

DDQ recommends attention to three topics:

• Structure and format of each sharable content package, and its critical dependencies on other packages;

• Methods of estimating costs (which cannot be done without building a good process model); and

• Design of institutional and client work-flow to manage very large numbers of documents and records.
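For the first topic, a package descriptor need not be elaborate to make dependencies explicit.  The Python sketch below assumes a hypothetical registry mapping package identifiers to descriptors, each listing the packages needed to interpret it (for example, a font set or a schema).  Before sharing or accepting a package, a repository could compute everything the package transitively depends on, and refuse it if anything is missing or circular.  The field names and example packages are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ContentPackage:
    """Hypothetical descriptor for one sharable content package."""
    package_id: str
    payload_format: str  # e.g. a MIME type naming the payload's format
    depends_on: List[str] = field(default_factory=list)  # packages needed to interpret this one

def dependency_order(root_id: str, registry: Dict[str, ContentPackage]) -> List[str]:
    """List root_id and everything it transitively depends on, dependencies first.

    Raises KeyError if a dependency is absent from the registry and
    ValueError if the dependencies form a cycle.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # package id -> "visiting" or "done"

    def visit(pid: str) -> None:
        if state.get(pid) == "done":
            return
        if state.get(pid) == "visiting":
            raise ValueError(f"dependency cycle through {pid}")
        if pid not in registry:
            raise KeyError(f"missing dependency: {pid}")
        state[pid] = "visiting"
        for dep in registry[pid].depends_on:
            visit(dep)
        state[pid] = "done"
        order.append(pid)  # appended only after all of its dependencies

    visit(root_id)
    return order

registry = {
    "report": ContentPackage("report", "application/pdf", ["font-set", "schema"]),
    "schema": ContentPackage("schema", "application/xml", ["font-set"]),
    "font-set": ContentPackage("font-set", "font/otf"),
}
print(dependency_order("report", registry))  # ['font-set', 'schema', 'report']
```

Listing dependencies first gives the order in which packages would have to be transferred or validated so that each one arrives after everything it needs.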

Speculation About Faster Progress

As part of volunteering at the Computer History Museum, I have been mulling over how an institution might best think through the choice of a comprehensive digital content management package.  This activity currently has two threads: constructing a repository requirements analysis document helpful to a museum, and inspecting public information about open source content management packages, including critical review literature.

The repository requirements analysis addresses many of the same topics as the nascent RLG Audit Checklist for the Certification of Trusted Digital Repositories.  It differs in that it focuses on museums and in that it attempts, as far as possible, to:

• Replace subjective statements with objectively testable criteria, and identify subjective statements as such;

• Distinguish between managers’ statements of objectives and specifications for testing/checking reliability; and

• Include sections that are like an aeroplane pilot’s checklist for preparing for flight.

For each of the following sources of potential unreliability,[19] such a document should identify specific pertinent examples and their remedies, including tests for imminent failure.

Generic risk

Examples

Media and Hardware Failures

Failure causes include random bit errors and recording track blemishes, breakdown of embedded electronic components, burn-out, and misplaced off-line HDDs, DVDs, and tapes.

Software Failure

All practical software has design and implementation bugs that might distort communicated data.

Communication Channel Errors

Failures include detected errors (IP packet error probability of ~10⁻⁷), undetected errors (at a bit error rate of ~10⁻¹⁰), and network deliveries that do not complete within a specified time interval.

Network Service Failures

Accessibility to information might be lost from failures in name resolution, misplaced directories, and administrative lapses.

Component Obsolescence

Before media and hardware components fail they might become incompatible with other system components, possibly within a decade of being introduced.  Software might fail because of format obsolescence that prevents information decoding and rendering within a decade.

Operator Errors

Operator actions in handling any system component might introduce irrecoverable errors, particularly at times of stress during execution of system recovery tasks.

Natural Disasters

Floods, fires, and earthquakes.

External Attacks

Deliberate information destruction or corruption by network attacks, terrorism, or war.

Internal Attacks

Misfeasance by employees and other insiders for fraud, revenge, or malicious amusement.

Economic and Organization Failures

A repository institution might become unable to afford the running costs of its repositories, or might vanish entirely, perhaps through bankruptcy.  It might also change its mission so that preserved information suddenly has no value, or so that destroying preserved information mitigates legal risks.
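Several of these risks (media failures, undetected channel errors, and some software and operator errors) share one test for imminent failure: a periodic fixity audit.  The arithmetic suggests why it matters: at the table's undetected error rate of ~10⁻¹⁰ per bit, copying one gigabyte (8×10⁹ bits) would be expected to introduce about 0.8 undetected bit errors.  The Python sketch below records a SHA-256 digest for every file when content is ingested and later recomputes the digests; the function names and the choice of SHA-256 are assumptions, not drawn from any particular repository product.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Digest a file in 1 MiB chunks so that large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def make_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its digest, at ingest time."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def audit(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Recompute digests and report files that are missing or altered.

    Files added after the manifest was made are not reported.
    """
    problems: List[str] = []
    for name, expected in manifest.items():
        p = root / name
        if not p.is_file():
            problems.append(f"missing: {name}")
        elif sha256_file(p) != expected:
            problems.append(f"altered: {name}")
    return problems
```

An audit only detects damage; repair requires an independent replica, ideally held where the same flood, operator, or insider cannot reach both copies.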

Reasons for the slow progress toward practical digital preservation systems include insufficient partitioning into requirements and system components that can be addressed with only small interdependencies, and failure to distinguish between aspects that have for approximately a decade been handled well by extant software offerings (e.g., IBM Content Manager—obviously a biased example, given my personal history) and additions needed to such well-understood technology.

This is significant partly because much of the current work parading as contributions to digital preservation wastefully reproduces what can be acquired at less expense than the likely development costs.  (And our tax dollars are supporting such waste!)

News and Innovation

The British Library announced that starting in March 2006 it was partnering with Google to offer researchers, students and academics desktop delivery of millions of full text scholarly research articles, via Google Scholar links to the British Library's document delivery service, British Library Direct.

'Digital Divide' Rapidly Closing

According to a just-released Pew report, the disparity in Internet usage between whites and ethnic minorities is diminishing, closing the much-feared 'digital divide' in America.  A summary of the findings of this survey of American adults indicates that 74 percent of whites use the Internet, as do 61 percent of African-Americans and 80 percent of English-speaking Hispanics.  According to the New York Times, a similar 1998 Pew report found that 42 percent of whites used the Internet, while only 23 percent of blacks did so.  According to the report, the greatest growth in Internet access and use was among young people.

The report includes many quantitative observations, making it possible for each reader to form his/her own individual (subjective) interpretation.  While the surveyors’ interpretations are interesting, it is the detailed numbers that make the report credible and therefore valuable.
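Taking the quoted percentages at face value, and ignoring any differences in survey method between 1998 and today, a few lines of arithmetic show what “closing” means here: the gap between whites and African-Americans narrowed from 19 to 13 percentage points, and African-American usage rose from about 55% to about 82% of the white rate.  The labels "1998 report" and "current report" are shorthand introduced for this sketch.

```python
# Internet use among US adults (percent), as quoted above from the Pew reports.
usage = {
    "1998 report": {"whites": 42, "African-Americans": 23},
    "current report": {"whites": 74, "African-Americans": 61},
}

def gap(report: str) -> int:
    """Percentage-point difference between white and African-American usage."""
    row = usage[report]
    return row["whites"] - row["African-Americans"]

def relative_rate(report: str) -> float:
    """African-American usage as a fraction of white usage."""
    row = usage[report]
    return row["African-Americans"] / row["whites"]

for report in usage:
    print(f"{report}: gap {gap(report)} points, relative rate {relative_rate(report):.0%}")
# 1998 report: gap 19 points, relative rate 55%
# current report: gap 13 points, relative rate 82%
```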

The importance of reporting the numbers, rather than only interpretations, was brought home to me about 15 years ago by a “flaming liberal” interpretation of IBM Research professional performance appraisals.  The numbers, based on too small a sample to justify the conclusions the consultants emphasized, showed that Asian-Americans had received somewhat higher appraisals than Caucasians, and that other minorities (mostly Spanish-Americans and African-Americans) had received somewhat lower appraisals.  The consultants’ conclusion: the other minorities were not receiving enough coaching or other help, whereas Asian-American staff members were not being promoted as quickly as they deserved!

Search

Finding information is easier than ever before.  Google’s business success seems to have stimulated renewed interest in search technology in many organizations, including university computer science departments.  This, together with a huge reservoir of articles about information retrieval,[20] leads us to expect rapid and significant improvements to today’s already excellent information retrieval tools.  I will try to puzzle out what improvements to expect, and publish this speculation in DDQ if it seems plausible.  Meanwhile, if the topic interests you, you might watch on your own for developments of special value to you.

Just as DDQ 5(1) was being readied for publication, the April 2006 number of the Communications of the ACM arrived in my snail-mail.  It features articles on Supporting Exploratory Search.

Orphan Works

Current US copyright law regarding reproduction by libraries and archives hampers software preservation projects, such as those within the Computer History Museum (CHM), creating risks associated with orphan works.[21]  A U.S. Copyright Office report on orphan works recommends that Congress change the law.  However, notwithstanding some Congressional interest in the report, the prospects for prompt action are remote.  (Existing efforts, such as the so-called Preservation of Orphan Works Act, have not been successful.)

The copyrights on some software that the CHM would like to preserve are probably owned by corporations that have acquired business units from other corporations, after other similar changes over the years.  If the software is not part of a currently viable offering, it is unlikely that the copyright owner in fact knows of its ownership rights.

A practical tactic is to trace the history of organizations that might have owned the copyright at issue, and to persuade each of these to grant to a museum whatever rights it might hold, without anyone ever determining who the actual owner is.  However, even this tactic costs too much effort for more than a small number of especially interesting software packages.

Video Camera Helps Organize Paper Documents

In 1987, I participated in what I believe was the very last of several “paperless office” task forces mounted within IBM Research.  In recognition that paper is here to stay, a University of Washington team has designed a video-camera-based system to track physical documents on a desk and automatically link them to appropriate electronic documents.  The researchers have constructed a pair of prototypes that track paper documents and sort photos without the use of special tags, paper or marks.[22]

The paper-tracking system allows users to pinpoint the location of a given document within a stack of documents on the desktop.  They can find a document using keywords, document appearance, or by how recently a paper was moved.  A user can ask, for example, "Where is my W-2 form?"  The photo-sorting application allows users to sort digital photographs using printouts of the photos.

The system infers the structure of a stack of papers from video images as a user moves papers from stack to stack.  The system parses the video into individual movements and then interprets each event to determine how the documents were reorganized.  It can begin with a desk that is already full of documents; it will gradually index documents as a person moves them. 
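The bookkeeping behind a query such as "Where is my W-2 form?" can be sketched once the hard parts (parsing the video into movements and recognizing each sheet) are assumed to be done.  The Python class below is a hypothetical model, not the University of Washington implementation: each recognized event moves one document onto a stack, and the model answers where a document lies and which documents were moved most recently.  A document that was on the desk before tracking began simply enters the index the first time it is moved.

```python
from typing import Dict, List, Optional, Tuple

class DeskModel:
    """Hypothetical model of paper stacks, updated from recognized move events.

    Each stack is a list ordered from bottom to top.  Recognizing which
    document moved (from video) is assumed to happen elsewhere.
    """

    def __init__(self) -> None:
        self.stacks: Dict[str, List[str]] = {}
        self.last_moved: Dict[str, int] = {}
        self.clock = 0  # counts events; stands in for a timestamp

    def move(self, doc: str, src: Optional[str], dst: str) -> None:
        """Record that doc left stack src (None if not yet tracked) and now tops dst."""
        if src is not None and doc in self.stacks.get(src, []):
            self.stacks[src].remove(doc)
        # A doc absent from src was there before tracking began; it is indexed now.
        self.stacks.setdefault(dst, []).append(doc)
        self.clock += 1
        self.last_moved[doc] = self.clock

    def locate(self, doc: str) -> Optional[Tuple[str, int]]:
        """Return (stack name, depth), where depth 0 is the top sheet; None if never seen."""
        for name, pile in self.stacks.items():
            if doc in pile:
                return name, len(pile) - 1 - pile.index(doc)
        return None

    def most_recent(self, n: int = 1) -> List[str]:
        """The n documents moved most recently, newest first."""
        return sorted(self.last_moved, key=lambda d: self.last_moved[d], reverse=True)[:n]

desk = DeskModel()
desk.move("W-2 form", None, "left")
desk.move("phone bill", None, "left")
print(desk.locate("W-2 form"))  # ('left', 1): one sheet lies on top of it
```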

Piccolo Toolkit for Graphical Application Development

The Univ. of Maryland HCI Laboratory has released Piccolo, a toolkit for 2D structured graphics user interfaces.[23]  It is intended to help Java and C# developers to build graphical applications for different platforms.  Piccolo uses the "scene-graph" model often found in 3D environments.  Versions include PiccoloJava, built on Java2; Piccolo.NET, built on the .NET Framework; and PocketPiccolo.NET, built on the .NET Compact Framework, for rendering applications for PocketPCs and Smartphones.

The infrastructure provides efficient repainting of the screen, bounds management, event handling and dispatch, picking (determining which visual object the mouse is over), animation, layout, and more.  It also provides for Zoomable User Interfaces.  A ZUI presents a huge canvas of information on a small display surface by letting the user smoothly zoom in, to get more detailed information, and zoom out for an overview.  Using a hierarchical structure of objects and cameras, it allows the application developer to orient, group and manipulate objects in meaningful ways.
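The scene-graph idea, and how picking and zooming follow from it, fits in a few lines.  In the Python sketch below each node is positioned and scaled relative to its parent, picking maps the mouse point down the hierarchy and tests the topmost children first, and a camera's zoom keeps the point under the cursor fixed.  The class and method names are hypothetical and do not reproduce Piccolo's Java or C# API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    """Hypothetical scene-graph node: a rectangle placed in its parent's coordinates."""
    name: str
    x: float = 0.0        # offset of this node's origin within its parent
    y: float = 0.0
    scale: float = 1.0    # uniform scale applied to this node and all its children
    width: float = 0.0    # size of this node's own rectangle, in local coordinates
    height: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach child; later children are drawn on top of earlier ones."""
        self.children.append(child)
        return child

def pick(node: Node, px: float, py: float) -> Optional[Node]:
    """Return the topmost node under (px, py), given in node's parent coordinates."""
    lx, ly = (px - node.x) / node.scale, (py - node.y) / node.scale  # into local coords
    for child in reversed(node.children):  # topmost children first
        hit = pick(child, lx, ly)
        if hit is not None:
            return hit
    if 0 <= lx <= node.width and 0 <= ly <= node.height:
        return node
    return None

@dataclass
class Camera:
    """Maps world to screen as screen = (world - origin) * zoom."""
    zoom: float = 1.0
    origin_x: float = 0.0
    origin_y: float = 0.0

    def to_world(self, sx: float, sy: float) -> Tuple[float, float]:
        return self.origin_x + sx / self.zoom, self.origin_y + sy / self.zoom

    def zoom_about(self, factor: float, sx: float, sy: float) -> None:
        """Zoom by factor while keeping the world point under screen point (sx, sy) fixed."""
        wx, wy = self.to_world(sx, sy)
        self.zoom *= factor
        self.origin_x = wx - sx / self.zoom
        self.origin_y = wy - sy / self.zoom

# The root layer sits at the world origin with scale 1, so world coords are its parent coords.
layer = Node("layer", width=1000, height=1000)
folder = layer.add(Node("folder", x=100, y=100, width=200, height=200))
folder.add(Node("note", x=10, y=10, scale=0.5, width=40, height=20))
camera = Camera()
camera.zoom_about(2.0, 0, 0)
print(pick(layer, *camera.to_world(230, 230)).name)  # note
```

Because every node stores only its placement relative to its parent, moving or scaling a group moves everything inside it, which is what makes grouping and smooth zooming cheap.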

OfficeObjects Intelligent Content Manager

Inspection of the specifications of an offering from Poland, the OfficeObjects® Intelligent Content Manager, suggests that some DDQ readers would find it of interest.  It is a product to assist in creating content and knowledge management applications.  It can be used as a platform for creating an information portal or as an integral part of systems requiring advanced services for managing documentary information collections.

This package emphasizes the perspective of a bureaucratic user community with complex workflow and with access to several repositories into which a user might need to insert parts of an object or retrieve related objects.  A high-level architecture and a conference presentation are available.

Reading Recommendations

It is time to unmask the computing community as a Secret Society for the Creation and Preservation of Artificial Complexity.
Edsger Dijkstra, 1997.

John von Neumann, The Computer and the Brain

An invited lecture series[24] represents the views of one of the greatest mathematicians of the twentieth century on the analogies between computing machines and the living human brain.  John von Neumann concludes that the brain operates in part digitally, in part analogically, but uses a peculiar statistical language unlike that of man-made computers.  The book was written for laymen; half a century later its ideas have been extended, but not supplanted.

Freeman Dyson, The Darwinian Interlude

Anything written by Freeman Dyson is likely to be profound and worthy of attention by anyone interested in a topic on which he chooses to comment.  When he comments in glowing terms about the speculative work of another scientist, his comments deserve widespread attention.  That is the case for the reproduction of his March 2005 comments on the work of Carl Woese.

Copyright Tutorial in Comic Book Form: Bound by Law

Duke University's Law School has just published Bound by Law, a comic book designed to teach copyright law and fair use.[25]  It uses misconceptions about documentary film making to elucidate fair use doctrine generally.  A goal was to produce a high quality resource for classes from high school to graduate school, from film school to social studies, as well as for the general public.

Andrew Roberts, Salisbury: Victorian Titan

Concern about technology impairing authenticity is hardly new.  Salisbury, three times Prime Minister of the United Kingdom, was an enthusiastic tinkerer.

Salisbury quickly appreciated the political implications an increase in the speed of communications might have.  When he was staying at the home of one of his MPs, Sir William Forwood, in February 1893 …, the house was connected to the chamber of the Commons.  'I can hear someone talking about Uganda,' Salisbury announced delightedly.  He later told his host: 'I hate political functions; but this was a very different occasion.  It was one of the most interesting twenty-four hours I have passed.'  He did not trust the telephone altogether, however, and told the Queen's assistant private secretary that he disapproved of it as a medium for transacting official business, 'as there was nothing to vouch for its genuineness'.[26]
Roberts, Salisbury, p. 112

As Foreign Secretary, Robert Cecil, Lord Salisbury, dominated the 1878 negotiations that carved the Balkan states from the waning Turkish empire.  In separate, almost private communications with Andrássy, the Austrian Foreign Minister, with Bismarck, the German Chancellor, with Schouvaloff, the Russian Ambassador to Great Britain, and with the by-then toothless Turkish government, Salisbury specified the Balkan boundaries and other terms to end the Russian attack on Turkey.

“Although the details of the settlement, especially the exact borders of the two new Bulgarian states, had to be fine-tuned over four weeks of tough but generally good-humoured negotiations, the outlines were all already there.  The Congress was an opportunity for the men who would be controlling the destiny of Europe, and increasingly of Africa and Asia as well, to meet and entertain one another incessantly.  …

“The issues at Berlin were discussed in descending order of difficulty, with Bulgaria coming first. Highly detailed negotiations requiring close knowledge of the frontiers, rivers, populations and especially the defensive mountain ranges saw the well-briefed Salisbury, ably advised by a fortifications expert General Simmons, winning important points off the Russians.  It was at this stage that the laziness and ignorance of some other negotiators came to the surface. When Bismarck suggested that in the southern Bulgarian province the Sultan should only employ Christian troops, thus allowing the Tsar to retain his favoured soubriquet of 'Liberator of the Balkans', Salisbury's response was that it was an admirable idea, only … the Sultan did not have any, as Christians were excluded from military service in the Ottoman Empire.

“On 17th June the Congress got down to the meat of the Bulgarian question, with Salisbury proposing that Bulgaria be split into an autonomous principality north of the Balkan mountain range, whilst the territory south of it would be called the province of Eastern Roumelia and left under the military and political control of the Sultan, with protective guarantees for non-Muslim minorities.”
Roberts, Salisbury, pp. 196-201

When the Congress ended, Balkan boundaries were pretty much as Salisbury had outlined in his negotiations that preceded the Congress.

Bio-Optic Organized Knowledge Device

If you have not already heard about the Bio-Optic Organized Knowledge Device (a.k.a. “BOOK”), perhaps you would be amused to look at one of the ~12,000 Web pages describing how it works.

Miscellany from Physics Today

The MODIS imager aboard NASA's Terra orbiter views every square kilometer of Earth's surface every two days.  Using MODIS infrared data, scientists at the Hawaii Institute of Geophysics and Planetology have produced MODVOLC, a continuously updated interactive map of lava fields, volcanic eruptions, and other global hot spots.

George Washington University’s National Security Archive makes available declassified US documents obtained through the Freedom of Information Act.  One of the archive's projects, "China and the Bomb," covers the history of the Chinese nuclear weapons program and U.S. policy regarding it.

Lawrence Berkeley National Laboratory has developed the Home Energy Saver—an online calculator that estimates how much energy your household could save through improvements in efficiency.  

Popular ACM On-line Courses

A recent ACM tip sheet identifies the following as the most popular of its on-line courses in the named areas.[27]

Programming/Applications/Web Development: OOAD: Introduction to Object-Oriented Concepts

Databases: OOAD: Unified Modeling Language (UML) 2.0

Systems/Networking/Security/Web Services: UNIX Shell Programming Part 1: Bourne Shell (Bash)

Graphic Design: Macromedia Dreamweaver MX 2004 - Learning Dreamweaver Basics

Desktop Literacy: Adobe GoLive 6 Fundamentals

Business Skills: Managing IT Projects: Project Initiation and Fundamentals

Home Computing

Hidden Features of Google's Gmail Service

A Web page teaches hidden Gmail features that make this e-mail client especially appealing.

1)   Labelling

2)   Linked conversations

3)   Searching with regular expressions

4)   Advanced search

5)   Import into search clients

6)   Rich formatting for outgoing messages

7)   SSL security

8)   Viewing attachments

9)   Spell check

10) Forwarding and using POP access

A related tabulation teaches little-known Google features, such as Google's Desktop optimization, instant Gmail message notifications, and how to browse Froogle's directory of online products.

Internet Television Station Directory

Intervision provides a directory of broadband Internet TV stations that are free to view.

Ingenious

A museum consortium, the British National Museum of Science & Industry (NMSI), connects information in novel ways to create insights into science and culture.  It draws on over 30,000 images to illustrate about 30 topics and debates.

Preserving Your Aging Color Prints and Slides

BusinessWeek’s “Out of the Attic and into the Hard Drive”[28] suggests scanners and procedures for digitizing home photographs before their colors fade irretrievably.  Creating digital versions is likely to bring neglected collections into more frequent use.  (I last used my slide projector more than 10 years ago.)

For a home hobbyist, the scanners are somewhat expensive (approximately $600 for one suited to 35 mm slides) for what is likely to be a one-time project.  Wouldn’t it be great if such devices were attached to publicly available computers?  If you buy one, perhaps the thing to do, once your own conversions and those of your family and friends are complete, is to donate it to your public library.

Handy Utility Programs 

The following suggestions are PC Magazine selections[29] that appeal to me, both for their functionality and their low prices (sometimes free, at the cost of seeing some advertisements). 

Semi-automatic Completion of Web Forms

ROBOFORM fills Web forms and manages your many passwords.  It memorizes each username and password the first time you log into a site, then automatically supplies them when you return.  All you have to remember is one master password to decrypt your data.  I find the free version convenient for home computer use.
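RoboForm's own encryption scheme is not described here.  As a generic illustration of how a single master password can unlock many stored logins, the sketch below derives a key from the master password with PBKDF2 (Python's standard hashlib.pbkdf2_hmac) and stores only a random salt and a verifier, never the password itself.  The function names and the iteration count are hypothetical choices for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Hypothetical work factor; each product picks its own.
ITERATIONS = 100_000


def derive_key(master_password: str, salt: bytes) -> bytes:
    """Stretch the master password into a 32-byte key (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", master_password.encode("utf-8"),
                               salt, ITERATIONS)


def enroll(master_password: str) -> Tuple[bytes, bytes]:
    """Return (salt, verifier); only these are stored, never the password."""
    salt = os.urandom(16)
    # The verifier is a hash of the key, so storing it does not reveal the key.
    verifier = hashlib.sha256(derive_key(master_password, salt)).digest()
    return salt, verifier


def unlock(master_password: str, salt: bytes, verifier: bytes) -> Optional[bytes]:
    """Return the derived key if the password matches, else None."""
    key = derive_key(master_password, salt)
    if hmac.compare_digest(hashlib.sha256(key).digest(), verifier):
        return key  # in a real tool, this key would decrypt the stored logins
    return None


salt, verifier = enroll("one master password")
print(unlock("one master password", salt, verifier) is not None)
print(unlock("a wrong guess", salt, verifier))
```

The deliberately slow key derivation is what makes guessing the master password expensive for anyone who copies the stored file.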

Traveler’s Access to Home Computer

With LOGMEIN FREE you can securely log in from any other Internet-connected computer to run programs and access data.  You'll have to upgrade to the Pro version to get file transfer and synchronization, but LogMeIn Free lets you copy and paste between the local and remote systems or remotely control your e-mail to send files.

Inexpensive PDF Creation

With an inexpensive tool such as PDF995, turning a document into stable, noneditable PDF format is as easy as printing it.  If even $9.95 seems too steep, you can pay by viewing an advertisement each time you use PDF995.

Network Vulnerability Profiling

Gibson Research's ShieldsUP!! can scan your public IP address for common ports or for all ports.  Excellent tutorials on the site help you understand the results.  LeakTest is a simple program that attempts to connect to GRC's Web server.  It's intended to test whether your firewall will block unknown outbound connections that could "leak" data out of your network.  It's free to use and worth trying out.
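ShieldsUP!! probes your machine from the Internet side.  The underlying idea, a TCP "connect" check that counts a port as open if a connection is accepted, can be sketched with Python's standard socket module.  This is not GRC's implementation: the function name and default timeout are hypothetical, a check run against your own machine shows what is listening rather than what your firewall exposes, and you should only probe machines you own.

```python
import socket


def open_ports(host: str, ports, timeout: float = 0.5) -> list[int]:
    """Return the ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception
            if s.connect_ex((host, port)) == 0:
                found.append(port)
    return found


if __name__ == "__main__":
    # Check a few common service ports on this machine.
    print(open_ports("127.0.0.1", [21, 22, 23, 25, 80, 139, 443, 445]))
```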

Though GFI's LANguard Network Security Scanner is primarily intended for corporate networks, running it on your home network could prove eye-opening.  It will scan your entire local network for hundreds of known vulnerabilities and produce a comprehensive report for each computer or device it discovers.  It shows missing patches, open ports, and any security vulnerabilities it finds.

TCPView, from Sysinternals, shows all the network connections on your computer.

Who owns that domain?  How is my traffic routed across the Internet?  Answer such questions with DNSstuff.com, together with other utilities that let you look up domains and trace routes.
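The simplest lookup, finding which addresses a domain name resolves to, can also be done locally with the standard resolver.  The sketch below uses Python's socket.getaddrinfo; it covers only forward lookup, not the ownership (whois) records or route tracing that DNSstuff offers.  The helper name is hypothetical.

```python
import socket


def resolve(name: str) -> list[str]:
    """Return the distinct IPv4/IPv6 addresses the resolver reports for name."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(name, None):
        ip = sockaddr[0]  # the address is the first element for IPv4 and IPv6
        if ip not in addresses:  # getaddrinfo repeats addresses per socket type
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    print(resolve("localhost"))
```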

PC Backup

From a mind-boggling array of software for network backup, PC Magazine recommends Cobian Backup.  It makes security copies of the files and folders you select, as often as you schedule them.  It doesn't use proprietary file formats and won't compress files unless you tell it to.  But it can compress using standard ZIP algorithms and encrypt backed-up files.  It will do full, incremental, or differential backups.

When designing network backup, you must decide where you want the backup software to run—on the individual PCs or on a backup server.  Cobian can see any network folder that has file- and print-sharing turned on.
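The three backup modes mentioned above differ only in which files each run copies: a full backup copies everything, a differential backup copies files changed since the last full backup, and an incremental backup copies files changed since the last backup of any kind.  The sketch below expresses that rule with file modification times; it is a generic illustration with hypothetical names, not Cobian's code.

```python
def files_to_copy(mtimes: dict[str, float], mode: str,
                  last_full: float, last_any: float) -> list[str]:
    """Select the files for one backup run.

    mtimes    -- path -> last-modification time
    last_full -- time of the most recent full backup
    last_any  -- time of the most recent backup of any kind
    """
    if mode == "full":
        cutoff = float("-inf")   # copy every file
    elif mode == "differential":
        cutoff = last_full       # everything changed since the last full backup
    elif mode == "incremental":
        cutoff = last_any        # only what changed since the previous run
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(path for path, t in mtimes.items() if t > cutoff)


# A full backup ran at time 100 and an incremental at time 200.
mtimes = {"letters.doc": 50, "budget.xls": 150, "photo.jpg": 250}
print(files_to_copy(mtimes, "full", 100, 200))          # ['budget.xls', 'letters.doc', 'photo.jpg']
print(files_to_copy(mtimes, "differential", 100, 200))  # ['budget.xls', 'photo.jpg']
print(files_to_copy(mtimes, "incremental", 100, 200))   # ['photo.jpg']
```

The trade-off shows up at restore time: a differential scheme needs the last full backup plus only the latest differential, while an incremental scheme needs the last full backup plus every incremental made since.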

Price Watch

Item | Model and specifications | Price | Per unit
Flat panel display | Envision 17” | $195 | each
Flat panel display | No brand named, 17” | $160 | each
Flat panel display | No brand named, 19” | $218 | each
Color laser printer | Samsung CLP-510, 1200 DPI, 64 MB, 25 ppm B/W, 6 ppm color | $275 | each
Color laser printer | Minolta 2400W, 400 DPI, 20 ppm B/W, 5 ppm color | $195 | each
HDD, internal | Hitachi 250 GB | $55 | $0.22/GB
HDD, external | Seagate 250 GB, USB 2.0, with backup software | $152 | $0.61/GB
NAS HDD | Maxtor 600 GB, RAID 0, USB 2.0 and FireWire, with backup software | $470 | $0.78/GB
NAS HDD | Anthology 1 TB, RAID, USB 2.0 and FireWire, with backup software | $700 | $0.70/GB
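The per-gigabyte figures for the drives are simply price divided by capacity, taking 1 TB as 1000 GB as drive makers do.  The sketch below recomputes them from the listed prices; the helper name is hypothetical.

```python
def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Dollars per gigabyte of storage."""
    return price_usd / capacity_gb


# (model, capacity in GB, price in US dollars) from the Price Watch listing
drives = [
    ("Hitachi 250 GB internal", 250, 55),
    ("Seagate 250 GB external", 250, 152),
    ("Maxtor 600 GB NAS", 600, 470),
    ("Anthology 1 TB NAS", 1000, 700),
]

for model, gb, price in drives:
    print(f"{model}: ${cost_per_gb(price, gb):.2f}/GB")
```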

Practical effects of the price reductions reported in 4 years of DDQ Price Watch reports, and projected for commodity HDDs, are illustrated by the following example.  A 250 GB HDD in a portable USB enclosure, together with excellent backup software[30] and weighing about 2.5 lb., can today be acquired for about $85.  Shipping a 1 TB data collection over the Internet is slow and can be troublesome, so a 2002 “SneakerNet” recommendation suggests that a better approach would be to load the data into a suitably equipped PC and ship the machine by parcel post.  In 2002 such a 50 lb. machine could be purchased for about $1300.[31]  Four years later, four of the $85 drive kits (10 lb. of hardware in all) accomplish the same for about $340!
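The comparison rests on simple scaling: a 1 TB collection needs four 250 GB drive kits.  The sketch below works through that arithmetic using only the figures quoted in the paragraph, again taking 1 TB as 1000 GB.

```python
import math

# Figures quoted in the text
KIT_CAPACITY_GB = 250      # one drive in a USB enclosure
KIT_PRICE_USD = 85
KIT_WEIGHT_LB = 2.5
PC_2002_PRICE_USD = 1300
PC_2002_WEIGHT_LB = 50

collection_gb = 1000       # a 1 TB collection
kits = math.ceil(collection_gb / KIT_CAPACITY_GB)

print(f"{kits} drive kits: ${kits * KIT_PRICE_USD}, {kits * KIT_WEIGHT_LB} lb")
print(f"2002 PC: ${PC_2002_PRICE_USD}, {PC_2002_WEIGHT_LB} lb")
print(f"Cost ratio: {PC_2002_PRICE_USD / (kits * KIT_PRICE_USD):.1f}x")
```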

My 5-year-old laptop recently expired, and reviving it would be a poor investment.  While I dithered about choosing a replacement, I traveled with an $85 external drive like the one just described.  Wherever I went, I was permitted to use a PC containing much of the software that I regularly use.  So, at least for the time being, I will not replace the dead laptop.  (Being without a computer while traveling is not a problem; instead, it provides time for reading.)

 



[1]     The NIST remit includes engaging in R&D of value to the United States manufacturing industries.

[2]     Supplying cultural repositories is not an attractive business opportunity, perhaps because they have not tried to show it to be one.

[3]     For more information: Nationaal Archief of the Netherlands, Remco Verdegem; Koninklijke Bibliotheek, Jeffrey van der Hoeven.

[4]     H.M. Gladney, Principles for Digital Preservation, Comm. ACM 49(2), 111-116, February 2006.

      H.M. Gladney and R.A. Lorie, Trustworthy 100-Year Digital Objects: Durable Encoding for When It's Too Late to Ask, ACM Trans. Info. Sys. 23(3), 299-324, July 2005.

[5]     B. Ford, VXA: A Virtual Architecture for Durable Compressed Archives, 2006, http://arxiv.org/abs/cs/0603073 (viewed 5-Apr-06).

[6]     R.A. Lorie and R.J. van Diessen, A Universal Virtual Computer for Long-term Preservation of Digital Information, IBM Research Report RJ 10338, February 2005.

[7]     The figure is from H.M. Gladney and R.A. Lorie, Trustworthy 100-Year Digital Objects: Durable Encoding for When It's Too Late to Ask, ACM Trans. Info. Sys. 23(3), 299-324, 2005. 

[8]