Digital Document Quarterly
Perspectives on Trustworthy Information
Volume 4, Number 1, 1Q2005
HMG Consulting
© 2005, H.M. Gladney. ISSN: 1547-8610
“In less than a decade, Internet search engines have completely changed how people gather information. No longer must we run to a library to look up something; rather we can pull up relevant documents with just a few clicks … [O]nline search engines are poised for a series of upgrades that promise to further enhance how we find what we need.” Scientific American, February 2005[1]
Today’s most effective information discovery infrastructure is Internet catalogs and search engines—no longer the catalogs of research libraries. Of course, research library catalogs are part of this.
ACM SIGIR[2] has tracked the vast search literature for about four decades. Three IEEE journals provide online numbers on recent technical developments.[3] IEEE MultiMedia looks at the growing amount of visual information available electronically, and asks, "Is It Time for a Moratorium on Metadata?" IEEE Intelligent Systems examines searching from cell phones. IEEE Distributed Systems Online addresses personalization and asks, "What's Next in Web Search?" The Digicult Thematic Issue 6 treats the topic from the perspective of cultural heritage enthusiasts.
The number of offerings is bewildering, a circumstance likely to persist for some time before simplification sets in. This is driven by the potential for advertising revenue. When I planned the current DDQ number in September, there was an upsurge of offerings. Journalists noticed the trend, so now it is old news. What follows is therefore a synopsis, a readers’ digest, organized to help readers make sense of the frenzy of news releases. It also identifies tools that I find particularly helpful.[4]
More Precise Search Results (Context Sensitivity): Tools similar to Web-search tools are appearing for individual consumers’ local collections—PC files and electronic mail—and to limit search to Internet subsets, e.g., Google Scholar. Their usefulness is enhanced by quality ratings for periodicals.[5] Vendors also offer extensions to enterprise-confidential files, databases, and communications.
Tools are being tested for filtering and prioritization according to personal interest profiles and prior search history, and also for dynamic generation of search term refinements.[7]
Improved aggregation services and access to standard reference sources (dictionaries, thesauri, encyclopediae, …) will appeal to many users. A current favorite is Refdesk.com. Of interest also is the Web Reference Shelf, which is part of The Extreme Searcher's Internet Handbook.
User Interface Convenience and Information Visualization: The simplest search results are classes (sets of object identifications), as illustrated by Google and U.C. Melvyl result deliveries, and quite imaginatively by NewsMap, which frequently updates its current news feed. The next simplest results are sets of pairs (binary relations). The most complex that I see frequently used are sets of triples (ternary relations). These can be depicted as graphs;[8] see KartOO and TouchGraph.
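As a rough illustration of the three result shapes just described, the following minimal sketch represents a class as a set, a binary relation as a set of pairs, and a ternary relation as a set of triples, then turns the triples into a labeled graph of the kind that visual tools draw. The document and author names and the helper `triples_to_graph` are hypothetical, invented only for this example; they do not come from any of the tools mentioned.

```python
from collections import defaultdict

# A class: a plain set of object identifications (hypothetical names).
result_class = {"doc-A", "doc-B", "doc-C"}

# A binary relation: a set of pairs, here (document, keyword).
binary_relation = {("doc-A", "search"), ("doc-B", "metadata")}

# A ternary relation: a set of triples, here (subject, relation, object).
triples = [
    ("doc-A", "cites", "doc-B"),
    ("doc-A", "authored-by", "author-X"),
    ("doc-B", "authored-by", "author-X"),
]


def triples_to_graph(triples):
    """Group triples by subject so each node lists its labeled outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


graph = triples_to_graph(triples)
for node, edges in graph.items():
    for label, target in edges:
        print(f"{node} --{label}--> {target}")
```

Each triple becomes one labeled edge, which is why a set of triples maps naturally onto a drawn graph while a plain class does not.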
The Pluck browser add-in illustrates services that monitor search results dynamically, keeping users up-to-date when new results appear for a prior query.
Grokker E.D.U. is intended for student access to special libraries and proprietary databases. It selects from several search result sets collected by other search tools, which can include MSN Search, Yahoo, and Google, categorizes the results, and delivers them in visual maps that show their relationships.
Geographic Searching and Sensitivity: These topics have been discussed in the daily press, e.g., a review of MSN Search, praising the Google Maps website, which can be coupled with a GPS device. Amazon has added street-level photographs to its business directory.
Excellent aerial photographs and U.S. Geological Survey topographic maps are available on Terraserver, a prototype the Microsoft Bay Area Research Center is using for developing database technology. Nothing approaching this level of coverage or detail seems to be available for any other region of the world, according to U.C. Berkeley Earth Sciences and Map Library.
Of the street guide and route planning services that cover Europe, I particularly like Map24, which also covers the
Favorite tools: When I search for scholarly work, I first exhaust what is easily found with Google, then use the
For your local environment, consider Yahoo Desktop for full-text search in files and e-mail, and Google Picasa to browse and organize your digital photographs and other images.
To help me know where I am driving and find where I’m going, I recently acquired the Delorme Earthmate GPS™ receiver and its coupled Street Atlas USA™, on sale for a mere $75!
Research and Future Prospects[9]
“Currently, all search engines fail to capture the bulk of the ‘invisible Web’: resources locked up in databases and inaccessible by the engines' indexing crawlers. These include regulatory filings at the U.S. Securities and Exchange Commission, detailed reports on charities at GuideStar and complete archives of most newspapers.” New York Times, 26th March
The Bielefeld Academic Search Engine (BASE, at
In view of the current business and scholarly interest in information discovery, and the immense literature that has not been systematically exploited, we expect many more practical enhancements.[10] These will include combining the best features of current separate offerings. Research groups are also investigating adding semantics to search engines that currently use only document keys and syntactic features.
Individual researchers (or, more realistically, small interest groups) will find it easy and affordable to construct search databases better suited to their particular interests than those that libraries provide. Automatic means could keep such databases up-to-date. This suggests possible restructuring of how and where information functionality is laid out in the Internet. In a decade or two, libraries might be neither the most used repositories nor the preferred search providers. They will still have a critical role in scholarly activities and in preservation of the cultural record, but that role could well differ from the one they have today.
“[T]he number and variety of resources on the World Wide Web has made … resource description … central to discussions about the efficiency and evolution of this medium. The inappropriateness of traditional schemas of resource description for web resources has encouraged … web-compatible schemas named "metadata". While conceptually old for library and information professionals, metadata [will take a] more significant and paramount role than ever before …” [11]
However, the work to define adequate metadata schemas that can be used within the time and effort that writers, publishers, and libraries are willing to invest, and the many published recommendations and debates about various schemas, have not been matched by practical uptake.[12] Bulterman asks, “Is It Time for a Moratorium on Metadata?” [13]
JSR 170 is a proposal that specifies an application programming interface for content repositories in Java 2.
“JSR 170 works on two levels. Level 1 … governs access … at the content element level …
“With comprehensive repository functionality, Level 2 … permits complex applications to exchange data … and provides definitions for future, mature repository developments, emphasizing:
· Read/write access: … for bi-directional interaction of content elements. Procedure is not only checked at the document level, but also at the “properties” level, …
· Versioning: … transparent version control within the whole content repository, [with] … easy access to various versions … [and] also problem-free modification of versions.
· Full text search and filtering: [targeting] the entire non-binary content of a repository … [with] search … that controls the specific or sub-string search method respectively.
· Object classes: [with] limitations … within which an applications developer can concentrate on specific content object types …
“Standardization of the methods for handling binary and text-based, as well as structured, semi-structured and unstructured data, is being examined, in addition to event monitoring, namespaces and standard properties, linking, locking and concurrency.” [14]
An IETF RFC (Internet Engineering Task Force Request for Comments) proposes simplification of resource identifiers.[15] Its authors intend it only for information assets. However, many resource management applications, including library, archive, and museum catalogs, need to include descriptions of material and property assets. Happily, the details of the proposal work for all kinds of assets.
XOP (XML-binary Optimized Packaging) specifies efficient serializing of XML Infosets. A XOP package places a serialization inside an extensible packaging format (such as MIME Multipart/Related). Selected content portions that are base64-encoded binary data can be extracted and re-encoded (i.e., the data is decoded from base64) and placed into the package.
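The motivation for moving binary data out of base64 can be seen with a little arithmetic. The sketch below is not an implementation of XOP; it only uses Python's standard `base64` module on an arbitrary, made-up payload to show how much larger the base64 text is than the raw bytes that a package could carry directly.

```python
import base64

# An arbitrary 3072-byte binary payload (made up for this illustration).
raw = bytes(range(256)) * 12

# What an XML document would carry inline: the base64 text form.
encoded = base64.b64encode(raw)

# What a packaging format can carry instead: the decoded raw bytes.
decoded = base64.b64decode(encoded)

print(f"raw bytes:     {len(raw)}")
print(f"base64 bytes:  {len(encoded)}")
print(f"size ratio:    {len(encoded) / len(raw):.2f}")
```

Base64 maps every 3 bytes to 4 characters, so the inline form is about one third larger than the raw data.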
XFDU, a draft specification from the originators of OAIS, is for encoding and encapsulating the metadata and content of the AIPs, SIPs, and DIPs that OAIS calls for.[16]
These proposals are too new for DDQ comment, even on their relationship (compatible? conflicting?), except to say that XFDU seems compatible with the preservation document structure proposed by Evidence After Every Witness is Dead.[17]
W3C (the World Wide Web Consortium) is closer to adopting a multi-vendor standard for XQuery. Jerry King, general manager for DataDirect's XML products, predicts:
· Moving Beyond SQL: SQL pre-dates many software development cornerstones, making applications difficult to implement using current technologies. XQuery will help with XML content management applications, XML reporting, native XML programming, data integration and Web message processing.
· Access Relational Databases as XML: That XQuery can use XML views to query relational databases the same way that it queries XML will greatly ease developers’ jobs.
· Access Non-Relational Data as XML: Because most data formats can easily be translated to XML, XQuery will become popular for data integration.
· Access Distributed Data Sources: Because XQuery provides built-in facilities for loading and querying data sources anywhere on the Internet, it will be used to join, integrate, share and manipulate data on the Internet as though it were on the local file system.
· Standards-Based Programmatic Data Access: The XQuery API for Java (XQJ), the XML equivalent to JDBC or
King also says that skills and tools for XSLT and XML Schema will be in much demand.
“Experts are both in awe and in frustration about the state of the internet. They celebrate search technology, peer-to-peer networks, and blogs; they bemoan institutions that have been slow to change. … The experts are startled that educational institutions have changed so little, …” Fox et al.[18]
"… digital is not generally viewed as a suitable long-term preservation archival surrogate for print. It is currently regarded more as an access medium. As a preservation medium, [it was seen] as unstable, experimental, immature, unproven on a mass scale and unreliable in the long-term." [19]
The second quotation needs careful attention to its context. Whose perspective is represented? What questions were the speakers asked to address? It is from a poll of the directors of 16 major libraries—mostly people with a liberal arts background,[20] apparently without any technical experts. They were asked only about digital surrogates for content already held in older formats (on paper and other media), and only about current practice, not about how means and controlling social conventions (including legal constraints) might evolve in either the near or the distant future.
I am reminded of an intellectual property attorney who told an IBM research staff audience, "You need to be careful which question you ask an attorney. You might ask either, 'What problems might I encounter if I do X?' or 'If I choose to do X, how should I proceed to stay out of trouble?'
"Well, we attorneys are professionals, and as professionals will answer the specific question you ask. The answer to the second is likely to be very different from the answer to the first, and much more useful."
It seems to me that the literature from research librarians and information scientists predominantly treats digitally-represented information as a problem, rather than as an immense opportunity.[21] I wonder whether this impression is reasonable and, if so, why their views are pessimistic. I would appreciate views from the digital heritage community.
The trusted personal computer hardware platform, which runs a secure environment rather than relying on software-only solutions, is emerging as a powerful new tool to improve enterprise data protection and user authentication. Industry offers many PCs and motherboards equipped with a Trusted Platform Module, a dedicated microchip enabled for security-specific capabilities. Specifications have been developed and promoted by an industry standards organization called the Trusted Computing Group.
In contrast to the criticism that appears in Trust, Trusted, Trustworthy in DDQ 1(2), the word ‘trusted’ in the paragraph above is not misleading. The critical distinction is that, in this case, the trusting entity is known; it is an operating system that depends on information from the Trusted Platform Module.
While no one owns the Internet, it cannot function without ICANN (Internet Corporation for Assigned Names and Numbers)—the not-for-profit corporation that manages the Internet addressing system. For several years ICANN has been attacked by international organizations that say the United States holds too much control over the Internet’s core functions. ICANN CEO Paul Twomey has explained how his organization has become a lightning rod for criticism and why he thinks it is undeserved.
Since the beginning of the year, British citizens have been able to request information at any time and expect an answer unless an exemption applies.[22] The 30-year rule has disappeared. Over 50,000 files less than 30 years old have been released by The National Archives.
Stylistic examples for prospective authors!
... from secondary school essays:
1. His thoughts tumbled in his head, making and breaking alliances like underpants in a dryer without Cling Free.
2. He spoke with the wisdom that can only come from experience, like a guy who went blind because he looked at a solar eclipse without one of those boxes with a pinhole in it and now goes around the country speaking at high schools about the dangers of looking at a solar eclipse without one of those boxes with a pinhole in it.
3. She grew on him like she was a colony of E. coli and he was room-temperature Canadian beef.
4. Her vocabulary was as bad as, like, whatever.
5. He was as tall as a six-foot-three-inch tree.
6. The revelation that his marriage of 30 years had disintegrated because of his wife's infidelity came as a rude shock, like a surcharge at a formerly surcharge-free ATM.
7. The little boat gently drifted across the pond exactly the way a bowling ball wouldn't.
8. McBride fell 12 stories, hitting the pavement like a Hefty bag filled with vegetable soup.
9. The scene had an eerie, surreal quality, like when you're on vacation in another city and Jeopardy comes on television at 7:00 p.m. instead of 7:30.
10. The hailstones leaped from the pavement, just like maggots when you fry them in hot grease.
11. John and Mary had never met. They were like two hummingbirds who had also never met.
12. He fell for her like his heart was a mob informant and she was the
13. Even in his last years, Grandpappy had a mind like a steel trap, only one that had been left out so long, it had rusted shut.
14. Shots rang out, as shots are wont to do.
The electronic proceedings from the Virtual Reference Desk 2004 Conference are available online, as are all the conference papers at the Electronic Publishing conferences from 1997 to 2004.
The Pew Foundation has made available a study on the future of the Internet, briefly profiled in the New York Times on 11th January.
To many people with a European or North American cultural tradition, Muslim political behavior must be puzzling, since many of its manifestations seem contrary to the best interests of their perpetrators and countrymen. Books and film suggest that distrust and hatred are important beyond Western experience.
A Lawrence of Arabia scene shows an Arab League meeting quickly degenerating from co-operation to violent tribal jealousies that weakened all the participants in their dealings with the English and French. This allowed the latter to establish vassal state governments that Arabs have ever since hated.[23]
Chapter 24 of Landes’ Wealth and Poverty of Nations[24] begins:
“No-one can understand the economic performance of Muslim nations without attending to the experience of Islam as faith and culture. … By the time Europeans entered the Indian Ocean by sea (1498), Islam had planted itself in parts of
“This explosion of passion and commitment was the most important feature of Eurasian history in … the thousand years from the fall of the western Roman empire … to the overseas expansion of Christian Europe. In this sense, it anticipates the potency of the later European imperial sweep, …
“The critical difference between the two rushes of power is the place of technology. The Muslim rested on old ways but new men, on the fighting zeal of fast-moving, horse-mounted warriors who were convinced that God and history were on their side. … The European push was based on superior firepower and moved by profit: loot yes, but above all, continuing, sustainable profit.”
Landes’ ideas are elaborated in Bernard Lewis’ more focused and shorter historical account, What Went Wrong? Western Impact and Middle Eastern Response, a book that I believe should be on everybody’s short list of social history.[25]
A partial explanation is suggested by early chapters of Leon Uris’ novel, The Haj.[26] If its description of a boy’s education by his father accurately depicts common behavior, from a very young age Muslim youth are trained to distrust and loot from anyone—even family and tribe members. Similar suggestions were the core of a recent editorial,[27] which included:
“Americans are still puzzled over why well-off Islamic fundamentalists crashed planes into skyscrapers and now send mercenaries to the Sunni triangle to slaughter us as we sponsor democracy. Yet since Sept. 11, 2001, we have grasped that Muslim fascists understood that the course of American-led world history (democracy and globalized capitalism) was leaving them behind. Thus they strike the
“…
“The
I am reminded of a Russian tale in which only one villager owned a cow. A genie appeared to grant a wish to another villager, but was surprised at the choice made: “Kill that guy’s cow!”
“Crime is now organized on the Internet. Operating in the anonymity of cyberspace, the Shadowcrew and Web mobs like it threaten the trust companies have spent years trying to build with customers, online. Here's how one cybercrime network uses administrators, vendors and forums to traffick in millions of credit card accounts and Social Security numbers.” John McCormick and Deborah Gage[28]
To learn how one identity-theft business worked, read McCormick and Gage’s account.
You might find the TCP/IP and TCPDUMP Pocket Reference helpful.
TechRepublic points out that, for small businesses and home offices, Linksys™ routers are popular targets for hackers. It recommends a 10-step procedure to secure such a router.
Lockergnome reviews this package for screen shot presentations, graphic artwork, photos, and Web comics favorably. PC users who have not already invested in screen capture and graphics software should consider this $30 package.
http://www.word-answers.com/ helps with Microsoft Word. It points to over 900 articles in over 100 topic areas, addressing MS Word versions 6 through 2003.
WinAudit reports a PC’s hardware and software configuration, complementing BelArc Advisor. It details installed software, licenses, peripherals, memory usage, processor model, network settings, etc.
WinAudit is free and works with all Windows versions since Windows 95. It requires no installation, and fits easily onto a floppy disk, enabling quick computer inspections with minimal effort.
With Firefox open, enter about:config in the URL box; then enter network.http in the browser's filter field. In the line identified as network.http.pipelining, change the setting from "false" to "true" by double-clicking on the line. Do the same in the line identified as network.http.proxy.pipelining. In the line identified as network.http.pipelining.maxrequests, double-click on the line and a window will open. Change the value to 20.
These changes enable Firefox to use network connections more efficiently and should somewhat speed up Web page retrievals.
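The same three settings can also be written down as text instead of being clicked through. The sketch below is only an illustration: it renders the preference names and values given above in the `user_pref(...)` line format that Firefox reads from a `user.js` file in the profile directory. The helper name `render_user_js` is hypothetical, and the sketch assumes a Firefox version that still honors these pipelining preferences.

```python
import json

# The three preferences and values named in the tip above.
PIPELINING_PREFS = {
    "network.http.pipelining": True,
    "network.http.proxy.pipelining": True,
    "network.http.pipelining.maxrequests": 20,
}


def render_user_js(prefs):
    """Render preferences as user_pref(...) lines, one per setting."""
    lines = []
    for name, value in prefs.items():
        # json.dumps yields the JavaScript literals needed: "name", true, 20.
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines) + "\n"


print(render_user_js(PIPELINING_PREFS), end="")
```

The script only prints the lines; copying them into a profile's user.js is left as a manual step so that nothing on disk is changed unintentionally.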
Technology Review senior editor Wade Roush purchased a new PC that he knew wouldn't be a fancy machine. But it cost only $278. He chose it because it was without any Microsoft software whatsoever. Instead, it came with Linspire 4.5, a commercial open-source Linux version. Plugged in, the machine revealed a glamorous new desktop screen and sophisticated help menus and audio tutorials. Software giving Linux the look, feel, and functions of a Windows PC is increasingly available both in free, unsupported versions and in enhanced commercial versions.[30]
| USB 2.0 Memory Key | 128 Mbyte | $20 | $160/Gbyte |
| PC Memory | 512 Mbyte PC3200 DDR | $35 | $70/Gbyte |
| Digital camera storage | 1 Gbyte compact flash card | $50 | $50/Gbyte |
| Mobile drive | USB connect, 2.2 Gbyte | $80 | $36/Gbyte |
| Serial-ATA HDD | 120 Gbyte internal | $50 | $0.42/Gbyte |
| DVD-R disks | 8x | $0.07 | each |
| DVD-R disks | 4x | $0.05 | each |
| DVD-ROM drive | 16x | $20 | each |
| DVD writer | 8x Dual ±R / ±RW | $50 | each |
| Wireless Router | Airlink 54Mbps Wireless-G cable/DSL wireless router | $19 | each |
| PC Wireless Adapter | Airlink 54Mbps Wireless-G laptop PC or PCI adapter | $15 | each |
| Flat panel LCD display | 17” .264 mm pitch, 450:1 contrast ratio | $200 | each |
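The per-Gbyte figures for the storage items in the table above follow from dividing price by capacity. The short sketch below reproduces that arithmetic; it assumes 1 Gbyte = 1024 Mbyte (so 128 Mbyte is 0.125 Gbyte), which is the convention that matches the table's $160/Gbyte figure. The function name and the shortened item labels are illustrative only.

```python
# (item, price in USD, capacity in Gbyte), taken from the storage rows of the table.
STORAGE_ITEMS = [
    ("USB 2.0 memory key", 20.00, 0.125),    # 128 Mbyte, assuming 1024 Mbyte per Gbyte
    ("PC memory, PC3200 DDR", 35.00, 0.5),   # 512 Mbyte
    ("Compact flash card", 50.00, 1.0),
    ("USB mobile drive", 80.00, 2.2),
    ("Serial-ATA HDD", 50.00, 120.0),
]


def dollars_per_gbyte(price_usd, capacity_gbyte):
    """Unit cost of storage: price divided by capacity."""
    return price_usd / capacity_gbyte


for name, price, capacity in STORAGE_ITEMS:
    print(f"{name}: ${dollars_per_gbyte(price, capacity):.2f}/Gbyte")
```

The spread is striking: by this calculation, flash memory in a USB key costs several hundred times as much per Gbyte as an internal hard disk.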
Critique by and discussions with John Bennett, Tom Gladney, and John Swinden have helped create this DDQ number. Their help is gratefully acknowledged.
[1] Mostafa, Javed. Seeking Better Web Searches, Scientific American 292(2), 67-73, 2005.
[2] ACM Special Interest Group for Information Retrieval
[3] From What's New @ IEEE In Computing 5(12), December 2004.
[4] However, I do expect my preferred tools to be a different set a year from now.
[5] For instance, see Mylonopoulos, N.A. and Theoharakis, V. Global perceptions of IS journals, Comm. ACM 44(9) 29-33, Sept. 2001. Ruth Bolotin Schwartz and Michele C. Russo, How to Quickly Find Articles in the Top IS Journals, Comm. ACM 47(2) 98-101, 2004. See also http://63.151.43.10/csaunders/rankings.htm.
[6] Wang, Roland.
[7] For instance, see the Google Labs service test at http://labs.google.com/personalized/.
[8] A recent Museums and the Web conference paper introduces graphical presentation tools. See Addis et al.,
[9] Just as I started the final preparation of DDQ 4(1) for release, the ACM News Service announced availability of David Southgate’s Powerful Query Technology Will Optimize Knowledge Management for Project Managers, TechRepublic, March 2005.
[10] For a longer discussion, see Asadi, S. Jamali, M.H.R. Shifts in search engine development: A review of past, present and future trends in research on search engines, Webology 1(2), Article 6, 2004.
[11] Safari, Mehdi. Metadata and the Web, Webology 1(2), December 2004.
[12] Greenberg, J. Spurgin, K. Crystal, A. Final Report for the AMeGA (Automatic Metadata Generation Applications) Project, submitted to the Library of Congress, February 2005.
[13] Bulterman, Dick C.A. Is It Time for a Moratorium on Metadata? IEEE Multimedia 5(12), Dec. 2004.
[14] Cadoff, Dave. Java Standard 170, ServerWorld Magazine, 2003.
[15] Van de Sompel, Herbert, et al. The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces, IETF RFC, January 2005.
[16] CCSDS 650.0-R-2, Reference Model for an Open Archival Information System (OAIS), July 2001.
[17] Gladney, H.M. Trustworthy 100-Year Digital Objects: Evidence After Every Witness is Dead, ACM Trans. Info. Sys. 22(3), 406-436, July 2004. See especially its Figures 2 and 3.
[18] Fox, Susannah. Anderson, Janna Quitney. Rainie, Lee. The Future of the Internet, Pew Internet and American Life Project, January 2005. See also PC World commentary.
[19] Anonymous from the British Library, Digital versus print as a preservation format – expert views from international comparator libraries, 2005.
[21] See, for instance, Beebe, Linda. Meyers, Barbara. The Unsettled State of Archiving, Journal of Electronic Publishing 4(4), June 1999.
[22] National Archives, Release of over 50,000 files to mark the full implementation of the Freedom of Information Act, January 2005.
[23] See, for instance, The Sykes-Picot Agreement: 1916.
[24] Landes, David S. The wealth and poverty of nations: why some are so rich and some so poor, W.W. Norton, 1998. ISBN: 0-393-04017-8. Chapter 24, History Gone Wrong.
[25] Lewis, Bernard. What Went Wrong? Western Impact and Middle Eastern Response,
[26]
[27] Hanson, Victor Davis. They Hate Us For Who We Are, Not What We Do. To Terrorists,
[28] McCormick, John. Gage, Deborah. Shadowcrew: Web Mobs, Baseline Magazine, March 28, 2005.
[29] Adapted from J. Teems’ Neat Net Tricks, 31st March 2005.
[30] Adapted from Technology Review. See http://www.technologyreview.com/articles/04/09/roush0904.asp?trk=nl