Digital Document Quarterly Perspectives on Trustworthy Information
Volume 5, Number 3, 3Q2006
HMG Consulting
© 2006, H.M. Gladney ISSN: 1547-8610
The European Union has funded a new initiative, Preservation and Long-term Access through Networked Services (PLANETS). The 4-year project started in June 2006 and has 16 partners.
The
It is almost trite to mention that achieving pervasive software and data interoperability is among the greatest challenges to truly convenient and effective digital archiving. Within this topic, the slowest progress is likely to be in achieving widely adopted metadata standards, both because the subject is highly technical, with objectives emphasized differently by different users and professional communities, and because obtaining the consensus needed for widespread adoption of an international standard is a slow process. Sometimes a large institution can achieve it, and perhaps influence other communities. Beth Goldsmith and Frances Knudsen describe choices made for the large digital collection maintained by the Los Alamos National Laboratory.
Among international initiatives focused on preserving digitally-represented information, the U.S. National Digital Information Infrastructure and Preservation Program (NDIIPP)[1] is the largest concerted effort. Recent articles[2],[3],[4] inform readers about NDIIPP history, remind us that the funded program has reached its mid-point, and suggest that it is time for evaluation so that potential mid-course changes can be considered. DDQ 5(3) is shorter than most DDQ numbers because what might have appeared here is more suitable for, and has recently been submitted to, D-Lib Magazine, where it is still under acceptance review. With the title Digital Preservation in a National Context: Questions and Views of an NDIIPP Outsider, its abstract reads:
A solution is known in principle for every difficult technical problem of digital preservation, including all those identified in NDIIPP publications. Other authors correctly assert that non-technical preservation challenges are greater than technical ones, but do not pay enough attention to applying technology to reduce non-technical obstacles. The current article focuses on technical preservation measures and their potential to mitigate non-technical challenges identified in NDIIPP plans.
Thinking about what document users will want leads us to focus on information contributors and readers instead of repository employees, on document representations and interchange protocols instead of repository design, and on institutional autonomy instead of collaboration. A consequence of good technical decisions is that some apparent preservation challenges are seen not to be research challenges after all. Others apparently belong to initiatives for which NDIIPP funding would be inappropriate.
This article draws attention to missed technical opportunities which, if pursued, would significantly accelerate NDIIPP progress towards the objectives called for by the U.S. Congress. It also identifies concerns about apparent content scope limitations of the NDIIPP plan.
For a broad update about international digital preservation activities, see the Joint US-UK Digital Preservation Workshop report published in August.
Terrorist events have stimulated inquiries into Internet vulnerabilities to massive disruption. In the September number of Scientific American, Tom Leighton suggests that the Internet’s deepest exposures relate to the engines implementing its basic protocols, especially its addressing using the Domain Name System.[5] His article begins:
Even casually savvy computer users these days know to beware of security threats such as viruses, worms, Trojan horses and other malicious bits of code. What few in the public realize, however, is that the Internet is vulnerable to much deeper levels of fraud that exploit fundamental security gaps in the network protocols themselves. These attacks represent a growing menace to personal, corporate and national security that the federal government needs to address urgently.
Consider the defenselessness of the domain name system (DNS), the Internet's version of 411 information. When you type a "www."-style name into your browser software, the browser converts it to an IP address, a string of digits that is the equivalent of a phone number. It does so by contacting a local name server, typically operated by your Internet service provider. Unlike telephone numbers, however, which may be valid indefinitely, IP addresses are valid only for a few seconds, hours or days.
Instead of storing copies of data, Cleversafe Information Dispersal uses a cryptographic key to partition each at-risk file into individually unintelligible pieces which are stored separately. For reliability, algorithms creating these data slices ensure that the file can be reassembled with any majority of the slices. The software is available under the GPL Open Source license.
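The idea of slices that are individually unintelligible yet sufficient in any majority can be illustrated with a threshold scheme. The sketch below is not Cleversafe's algorithm, which the description above does not specify; it swaps in Shamir's secret sharing over the prime field of 257 elements. The function names and the 3-of-5 parameters are assumptions chosen for illustration.

```python
import secrets

PRIME = 257  # smallest prime above 255, so every byte value fits in the field


def split_byte(value, n, k):
    """Return n shares (x, y) of one byte; any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the byte.
    coeffs = [value] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_byte(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def split(data, n=5, k=3):
    """Split bytes into n slices, keyed by slice number 1..n."""
    slices = {x: [] for x in range(1, n + 1)}
    for byte in data:
        for x, y in split_byte(byte, n, k):
            slices[x].append(y)
    return slices


def recover(slices):
    """Rebuild the original bytes from a dict holding at least k slices."""
    xs = sorted(slices)
    length = len(slices[xs[0]])
    return bytes(
        recover_byte([(x, slices[x][i]) for x in xs]) for i in range(length)
    )


s = split(b"at-risk file", n=5, k=3)
print(recover({1: s[1], 3: s[3], 5: s[5]}))  # any three of the five slices
```

In this sketch each slice is as large as the file itself and its values range up to 256, so it trades storage for simplicity; a production dispersal scheme would likely choose a more space-efficient encoding.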
It will eventually be possible for almost anybody to maintain his/her own sophisticated personal digital library and Web service. Efforts and tools targeting this possibility are described in the following works:
Ø Greg Sennema’s Creating an Internal Content Management System.
Ø Gordon Bell’s CyberAll project to encode, store, and easily retrieve all of one person's information use, which was described in 2001 in the Communications of the ACM.[6]
Ø Emerging tools for creating personal Websites, such as NetVibes for creating a web-based home page which you can access from anywhere.
In 2003, DDQ asserted a “simple solution” to the Russell paradox (“Consider the set that contains all sets that do not contain themselves. Is this set a member of itself or not?”), viz.,
A premise of language is that every element―every sign―should be a symbol that denotes something that can be pointed out or illustrated. “The set of all sets that do not contain themselves” does not satisfy this premise.
Recently a colleague pointed out[7] that this assertion was both incorrect and ungrounded. There is no theory of language that includes such a premise.
How should I interpret the fact that it took over 3 years for someone to advise me of the error? Perhaps the few people who noticed it decided it was not worth anybody’s time to correct the mistake.
A replacement “simple solution” (illustrated by the M.C. Escher drawing excerpts below) follows in colloquial form:
That one can depict or talk about something is not evidence that it exists.
and formally as:[8]
The source of the paradox is the axiom of (some versions of) set theory which asserts that for every proposition P(x), there exists a set of x's for which P is true.
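The formal statement can be made concrete with a short derivation. The symbols $R$, $A$, and $R_A$ are introduced here for illustration; the first schema is the unrestricted comprehension axiom just described, and the fourth is the separation schema that Zermelo-Fraenkel set theory uses in its place.

```latex
\begin{align*}
&\text{Unrestricted comprehension:} && \exists S\,\forall x\,\bigl(x \in S \leftrightarrow P(x)\bigr) \\
&\text{Choose } P(x) :\equiv x \notin x: && \exists R\,\forall x\,\bigl(x \in R \leftrightarrow x \notin x\bigr) \\
&\text{Instantiate } x := R: && R \in R \leftrightarrow R \notin R \quad \text{(contradiction)} \\
&\text{Separation, for a given set } A: && \exists R_A\,\forall x\,\bigl(x \in R_A \leftrightarrow (x \in A \wedge x \notin x)\bigr) \\
&\text{Instantiate } x := R_A: && R_A \in R_A \leftrightarrow (R_A \in A \wedge R_A \notin R_A) \;\Longrightarrow\; R_A \notin A
\end{align*}
```

Under separation the argument no longer yields a contradiction; it shows only that $R_A$ lies outside $A$, and hence that no set contains every set.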

Every information revolution year seems to need its “hot” topic, perhaps because marketing people need something to hype and/or the trade press needs an easily comprehended “innovation”. It seems that this year the topic is “virtualization”, as implemented by hypervisors.[9] Even Microsoft is getting into the act. Of course, the idea is not new. It was used by IBM at least as early as 1980 and will reappear whenever a significant number of users want applications that run only in different computing environments and individual hardware systems are powerful enough to run these concurrently.
IBM's new XML-powered database server is likely to change the face of database storage. Read Inside IBM DB2 Viper and related product descriptions.
In an August 6 article headed Deal Maker Details the Art of Greasing the Palm, the New York Times detailed what amounts to bribery to obtain government contracts that included work on digital document storage. This scandal provides a peephole into practices that in the current fiscal year divert $64B of taxpayers’ money into Congressmen’s “pork barrel” projects, now euphemistically alluded to as “earmarked appropriations”. Excerpts from the NYT article, which is freely available, include:
The Cunningham scandal set off alarms about the proliferation of Congressional earmarks — money for pet projects inserted anonymously in spending bills — which critics say pervert public policy, encourage cronyism and waste federal money. The 12,000 earmarks in this year’s spending bills amount to $64 billion.
Offering a rare insider’s view, Mr. Wilkes described the appropriations process as little more than a shakedown. He said that lobbyists close to the committee members unceasingly demanded campaign contributions from entrepreneurs like him. Mr. Wilkes and his associates have given more than $706,000 to federal campaigns since 1997, according to public records, and he said he had brought in more as a fund-raiser. Since 2000, Mr. Wilkes’s principal company has received about $100 million in federal contracts.
Mr. Wilkes described the system bluntly: “Lowery would always say, ‘It is a two-part deal,’ ” he recalled. “ ‘Jerry will make the request. Jerry will carry the vote. Jerry will have plenty of time for this. If you don’t want to make the contributions, chair the fund-raising event, you will get left behind.’ ” ...
The culture of the House Appropriations Defense Subcommittee is one of great power and little scrutiny. Mr. Wilkes said every member appeared to have a personal allowance of millions of dollars to disburse without public disclosure. Lawmakers, though, sometimes boast about money being spent in their districts.
In the spending bill for this fiscal year, each member took credit for an average $27 million in earmarks, with the chairman, Representative C. W. Bill Young, Republican of Florida, claiming about $125 million, according to Taxpayers for Common Sense, a nonpartisan group that tracks earmarks.
The photo-sharing website Flickr has added services with which users assign a location to a photo and search for pictures on a map, an activity called "geotagging", potentially changing Web search, travel, and local news. Other websites already allowing users to add location information to their pictures and to search geographically include Zooomr, a photo-sharing site; Mappr, which maps Flickr photos; and Platial, an online atlas.
Connecting information to maps will have many applications. For instance, a search for designer jeans might include the picture of a local boutique with a sale that day. For travel, location-based search could attach news about roadway construction projects. To tag a picture with a location, a user simply drags the image from a panel to a map location. For a Yahoo map, within about a minute, internal search-engine technology updates the photo and tag database.
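The two operations described here, attaching a location to a photo and searching by map area, can be sketched with a small in-memory index. This is an illustrative assumption, not Flickr's or Yahoo's implementation; the names GeoTag, tag_photo, and photos_in_box are hypothetical, the coordinates are approximate sample values, and the search ignores map areas that cross the 180° meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTag:
    """A photo identifier paired with a latitude and longitude in degrees."""
    photo_id: str
    lat: float
    lon: float


def tag_photo(index, photo_id, lat, lon):
    """Record (or overwrite) the location of a photo in the index."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    index[photo_id] = GeoTag(photo_id, lat, lon)


def photos_in_box(index, south, west, north, east):
    """Return the sorted ids of photos inside a rectangular map area."""
    return sorted(
        tag.photo_id
        for tag in index.values()
        if south <= tag.lat <= north and west <= tag.lon <= east
    )


index = {}
tag_photo(index, "golden-gate", 37.8199, -122.4783)
tag_photo(index, "big-ben", 51.5007, -0.1246)
print(photos_in_box(index, 37.0, -123.0, 38.5, -121.5))  # ['golden-gate']
```

A linear scan is enough for a sketch; a service with millions of photos would presumably rely on a spatial index instead.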
Privacy will, however, be a big concern for geotagging.
In an August experiment, pigeons outfitted with cell-phone and sensor backpacks were released in a demonstration project for monitoring air pollution. Although this was part of a
This project did stimulate the ire of People for the Ethical Treatment of Animals, which wrote letters of protest to UC-Irvine and ZeroOne San Jose. But then, in recent years almost any unusual public activity attracts protest.
“By comparing … satellite images taken [weeks apart], interferometric synthetic aperture radar [InSAR] can measure terrestrial displacements as small as a centimeter.” [10]
A Google Book Search web page identifies great novels that some people have tried to ban and books about attempts to ban these books from libraries and school curricula.
Pam Samuelson provides interesting insight into the historical relationship between commercial and open-source software in a history of IBM’s software marketing.[11]
DDQ recommends especially Chapter 9, which sketches the influence of Kant on the development of 20th-century philosophy and the rift between Continental and Analytic philosophers. While Continental philosophy dominates in Western European universities, Anglophone philosophy departments are predominantly Analytic, with the Continental persuasion “banished” to departments of literature. This organization is a result of 1930s German Nazism, because so many of the leading European philosophers and scientists happened to be Jewish.
The book ends with a strong recommendation to read Cassirer.
“[H]is synthetic and conciliatory approach to both philosophical and political questions make Cassirer a much less striking and dramatic figure than either Carnap or Heidegger. Those interested in finally beginning a reconciliation of the analytic and continental traditions, however, can find no better starting point than the rich treasure of ideas, ambitions, and analyses stored in his astonishingly comprehensive body of philosophical work.”
Published in 1930, this is “A Memorable History of England, comprising all the parts you can remember, including 103 Good Things, 5 Bad Kings, and 2 Genuine Dates.” Its compulsory preface begins,
“HISTORIES have previously been written with the object of exalting their authors. The object of this History is to console the reader. No other history does this.
“History is not what you thought. It is what you can remember. All other history defeats itself.
“This is the only Memorable History of England, because all the History that you can remember is in this book, which is the result of years of research in golf clubs, gun-rooms, green-rooms, etc.
“For instance, two out of the four Dates originally included were eliminated at the last moment, a research done at the Eton and
For several years, I have maintained notebooks for topics that interest me―notebooks that record and partially organize ideas, publications, and Websites. One such covers XML and its applications. My Notes about XML, XML Tools and Related Stuff: SGML, XSL, XSLT, ... is now available on-line.
“Top 10” lists are perennially popular as reminders of topics that individually might merit longer discussions than the lists themselves provide. Recent lists that might interest DDQ readers include:
Ø Ten True Things about Technology captures a pundit’s selection of truisms.
Ø Top 10 Stupid Things that Smart IT Pros Still Do describes injudicious behavior with accompanying cartoons.
Ø 10 things you should know about building a PC from scratch suggests the planning required if you want to build your own PC, which sad experience teaches me is an activity only for dedicated hobbyists.
Ø The 14 best ways to protect your computers suggests relatively inexpensive security measures.
DDQ 5(2) reported a practical automatic scanner suitable for many library books. In July, Newsweek reported a fit companion, the Espresso Book Machine, whose current model can print the text for a 300-page book, with a color paperback cover—and bind it—in just three minutes and for only a penny per page. It will retail for less than $100,000. With suitable software, these machines might be coupled to make physical book copying almost as easy as e-book copying!
After seeming to stagnate for about six months, commodity computing component prices have resumed improving.[12] The best offers I have seen include:
Laptop computer | No brand named, AMD Sempron 3000+, 256 MB, 15” screen, 40 GB HDD, CD-RW/DVD-ROM, Win/XP Home | $490 | each
Compact PC | HP Slimline S7500N, AMD Sempron 3300+, 512 MB, 200 GB HDD, Double Layer DVD RW, Win/XP Home | $425 | each
Desktop PC | Compaq Presario SR1900NX, Intel Celeron 3.2 GHz, 512 MB, 533 MHz FSB, 120 GB HDD, CD-RW/DVD-ROM, 17” CRT | $325 | each
Flat panel display | Emprex 17” | $119 | each
Flat panel display | Samsung 20” | $196 | each
Color laser printer | Samsung CLP-510, 1200 DPI, 64 MB, 25ppm B/W, 6ppm color | $380 | each
Color laser printer | Minolta 2400W, 400 DPI, 20ppm B/W, 5ppm color | | each
HDD portable | Wolverine 2.5” w/enclosure, 100 GB | $94 | $0.94/GB
HDD external | Seagate 300 GB, USB 2.0, 7200rpm, with backup software | $152 | $0.51/GB
HDD NAS | Anthology 1 TB, RAID, USB 2.0 and Firewire, w/backup SW | $600 | $0.60/GB
HDD for laptop | Fujitsu 120 GB, 2.5”, 5400rpm, 8 MB buffer | $130 | $1.08/GB
DVD Writer | Hi-Val 16x +R/-R, 8x Double Layer +R | $33 | each
Digi-cam memory | SD or CF, 2 GB | $28 | $14/GB
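The per-gigabyte figures in the storage rows follow from dividing price by capacity. A quick check, with prices and capacities copied from the table and 1 terabyte taken as 1000 gigabytes (an assumption that the $0.60 figure implies); the dictionary keys and the function name are labels chosen for this sketch:

```python
# (price in dollars, capacity in gigabytes), copied from the table above
storage = {
    "Wolverine portable HDD": (94, 100),
    "Seagate external HDD": (152, 300),
    "Anthology NAS": (600, 1000),  # 1 terabyte assumed to be 1000 gigabytes
    "Fujitsu laptop HDD": (130, 120),
    "SD or CF memory card": (28, 2),
}


def dollars_per_gb(price, capacity_gb):
    """Unit price rounded to the nearest cent."""
    return round(price / capacity_gb, 2)


for name, (price, capacity_gb) in storage.items():
    print(f"{name}: ${dollars_per_gb(price, capacity_gb):.2f} per gigabyte")
```

The computed values agree with the table: disk storage costs between about $0.50 and $1.10 per gigabyte, while flash memory is more than ten times as expensive.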
The cover and body of the latest PC Magazine[13] focus on “63 hot new products that you just gotta have!” Most of these are entertainment and convenience items needed by nobody―not even technophiles. My general impression is that, if you are tempted and can exercise self-control, waiting for one to two years would be prudent, as many of the offerings belong to the bleeding edge (in contrast to the leading edge) and will soon be replaced by refined and cheaper competitors.
[1] Laura E. Campbell, Envisioning Future Challenges in Networked Information, JISC/CNI Conference, July 2006.
[2] Abby Smith, Distributed Preservation in a National Context, D-Lib Magazine 12(6), June 2006.
[3] Guy Lamolinara, “What if NDIIPP knew what NDIIPP knows?” NDIIPP Website, 2006. Also ARL briefing, October 2005.
[4] Deanna B. Marcum, The Future of Preservation,
[5] Tom Leighton, The Net’s Real Security Problem, Scientific American 295(3), August 21, 2006.
[6]
[7] Paula Newman, private communication.
[8] John Sowa, private communication. The axioms of Zermelo-Fraenkel set theory are designed to avoid the difficulty.
[9] A hypervisor is a computer operating system whose application processes are other operating systems, such as versions of Linux or Microsoft Windows. See XEN Virtual Machine Monitor, for instance.
[10] Matthew B. Pritchard, InSAR, a tool for measuring Earth’s surface deformation, Physics Today 59(7), 68-9, July 2006.
[11] Pamela Samuelson, IBM’s Pragmatic Embrace of Open Source, Comm. ACM 49(10), 21-25, October 2006.
[12] Prices include sales taxes