Ray Denenberg
Library of Congress
ray@rden.loc.gov
October 1996
A brief historical overview of Z39.50 is provided as context for a
discussion of recent Z39.50 developments and future prospects.
Although the historical events leading to the development of Z39.50 are
sometimes traced back to the 1960s, momentum to standardize an information
retrieval protocol began to build in the early 1980s with the Linked Systems
Project (LSP), whose implementation began in 1982 and which became
operational in 1985. The participants were the Library of Congress, RLG, and
OCLC.
The essence of LSP was the Authorities application: the establishment
and maintenance of a nationwide database of name authority records. Two
application level protocols were developed: Record Transfer and Information
Retrieval. The primary function of the Authorities application was the
transfer of authority records between systems, supported by the Record
Transfer protocol. A background function was the intersystem searching of
authority records, supported by the Information Retrieval protocol.
Both the Record Transfer and Information Retrieval protocols were
developed initially to support authority record exchange, but they were
intended to support record exchange and intersystem searching regardless of
record type.
In 1983 the LSP participants submitted both protocols, Record Transfer
and Information Retrieval, for consideration as American National Standards.
For Record Transfer, attempts to standardize were eventually abandoned (and
ultimately, the Record Transfer protocol itself was replaced by FTP).
There was, however, substantial interest within the U.S. in standardizing
an information retrieval protocol. The LSP Information Retrieval protocol
was submitted to ANSI/NISO, which formed a committee to prepare it for
ballot. It was balloted in 1984 under the designation "Z39.50", by which it
is known today. (NISO was formerly named Z39, and continues to use that
designation for its standards.) The 1984 ballot failed within NISO, primarily
because the protocol was not yet sufficiently well developed; the details are
beyond the scope of this paper. There was significant further development
over the next three years. Z39.50 was re-balloted in 1987, this time
successfully, and was approved by ANSI in 1988.
Independently, in 1984, a work item was approved in ISO for a "Search
and Retrieve" protocol, called SR. There were several drafts of the SR
standard between 1984 and 1991, when it was finally approved. As difficult as
it was to achieve consensus on Z39.50 in the U.S., it was more difficult to
achieve international consensus on SR, because of the various conflicting
national interests represented. Of course the U.S. input was influenced by
Z39.50, which was not entirely stable during the period of SR development. The
result was that several incompatibilities remained between SR and the 1988
version of Z39.50.
GILS Profile
GILS, the Government Information Locator Service, is a response to the need for users to identify and locate publicly available Federal information resources. The GILS Profile provides the specifications for the overall GILS application, including the GILS "Core" data elements that comprise a GILS record describing an information resource, and the use of Z39.50 to search and retrieve GILS records.
ATS Profile
The "Author-Title-Subject" profile aims to improve the reliability of
Z39.50 search results. When a client requests, for example, an author search,
the intent of the ATS profile is that the server will execute the search based
on its concept of author. If the server does not support an author search, it
should not re-cast the search, substituting some attribute other than author,
without the client's knowledge and consent. Neither should the server treat
the inability to perform a search as a successful search with no results.
The profile specifies the use of bib-1 within a type-1 query for
searching by author, title, or subject, to provide basic search access to
bibliographic databases.
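The server behaviour the ATS profile asks for can be illustrated with a
minimal sketch. It assumes a hypothetical in-memory server: the names
AtsServer, Record, and SearchFailed are invented for illustration and are not
part of the profile or of any Z39.50 toolkit. The numeric values are the
bib-1 Use attributes for title (4), subject heading (21), and author (1003),
and bib-1 diagnostic 114 (unsupported Use attribute).

```python
from dataclasses import dataclass

# bib-1 Use attribute values for the three ATS access points.
USE_TITLE, USE_SUBJECT, USE_AUTHOR = 4, 21, 1003

# bib-1 diagnostic 114: "Unsupported Use attribute".
DIAG_UNSUPPORTED_USE = 114


class SearchFailed(Exception):
    """Hypothetical stand-in for a search response carrying a diagnostic."""

    def __init__(self, code, addinfo):
        super().__init__(f"diagnostic {code}: {addinfo}")
        self.code = code
        self.addinfo = addinfo


@dataclass
class Record:
    author: str
    title: str
    subject: str


class AtsServer:
    """Toy server that indexes only the access points named in `supported`."""

    FIELDS = {USE_AUTHOR: "author", USE_TITLE: "title", USE_SUBJECT: "subject"}

    def __init__(self, records, supported):
        self.records = list(records)
        self.supported = set(supported)

    def search(self, use, term):
        # Fail explicitly instead of re-casting the search to a different
        # attribute or reporting a successful search with zero results.
        if use not in self.supported or use not in self.FIELDS:
            raise SearchFailed(DIAG_UNSUPPORTED_USE, str(use))
        field_name = self.FIELDS[use]
        needle = term.lower()
        return [r for r in self.records
                if needle in getattr(r, field_name).lower()]


records = [
    Record("Melville, Herman", "Moby Dick", "Whaling"),
    Record("Dickens, Charles", "Bleak House", "Law"),
]
# This server has no author index, so author searches must fail visibly.
server = AtsServer(records, supported={USE_TITLE, USE_SUBJECT})
title_hits = server.search(USE_TITLE, "moby")
```

A client that calls server.search(USE_AUTHOR, "Melville") against this server
receives SearchFailed with code 114, so it can tell "author search is not
supported" apart from "no records by this author".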
WAIS Profile
The WAIS (Wide Area Information Servers) profile specifies rules for access to WAIS servers supporting Z39.50 version 2.
Collections Profile
In August, 1995, the Library of Congress convened a team of
representatives from several institutions to develop a Z39.50 profile for
access to digital libraries. Participating organizations included Getty,
Berkeley, University of Michigan, University of California, OCLC, LC, RLG,
Chemical Abstracts Service, IBM, FCLA, TRW, Knight Ridder, SilverPlatter, as
well as consultants and liaisons.
The scope was narrowed to navigation of digital collections, and the
resulting profile was named the Z39.50 Profile for Access to Digital
Collections (Collections Profile). The larger problem of access to digital
libraries was left to the province of other profiling efforts, including CIMI
and the Digital Library Object profiles described below. Other groups were
initiating independent efforts to develop profiles aimed at specific types of
objects and collections. The intention was to coordinate these efforts so
that the latter profiles would be developed as compatible extensions or
subsets of the Collections profile.
The profile aims to address the problem faced by libraries and other
institutions that create collections, organized thematically -- by subject,
creator, historical period, etc. -- with numerous, diverse objects, both
digital and physical. These collections are often organized hierarchically and
distributed across servers. Significant resources may be invested in
digitization and in the intellectual efforts of aggregation, organization, and
description of the information in a collection. Yet to a remote user or
client, the collection may appear to be simply an accumulation of objects and
undifferentiated data, because there are no agreed-upon semantics for
navigating the collection to locate and retrieve objects of interest.
Coherent organizational structures, imposed on the data, are necessary to
provide a view that supports navigation.
A key obstacle to effective navigation is the inability to distinguish
content from description. A primary goal of navigation is to locate and
retrieve objects of interest; a vital step in that process is to locate
relevant descriptive information. Thus it is useful to navigate among
descriptive information as well as content, and consequently, to be able to
distinguish content from description.
The profile exploits organizational structures to allow a client to
navigate through structured information. A coherently defined set of
descriptive data is used to manage and navigate collections of otherwise
undifferentiated data. These organizational structures allow the data to be
viewed as distributed, hierarchical collections.
The objectives of the profile are to:
CIMI Profile
The Consortium for the Computer Interchange of Museum Information (CIMI)
has supported the development of a Z39.50 Profile as part of its current
Project CHIO (Cultural Heritage Information Online), for access to museum
information.
Museum information includes a variety of physical and electronic
objects, including physical artifacts and electronic derivatives, descriptive
records designed for collection management, full-text documents, and online
tools such as thesauri and authoritative lists of artists' names.
A digital collection of museum information needs to address not only the
heterogeneous nature of the information objects but also the fact that such a
collection will draw upon repositories of museum information distributed
around the world.
CIMI initiated Project CHIO as a demonstration project to investigate a
standards-based approach for searching and retrieving cultural heritage
information from disparate and distributed information systems containing
museum information. Project CHIO consists of two interrelated demonstration
projects -- CHIO Structure and CHIO Access -- to show respectively the utility
of SGML and Z39.50, to enhance electronic access to cultural heritage museum
information in a distributed, networked environment.
"CHIO Structure" uses SGML to mark up museum objects including (text)
exhibition catalogues and wall text, and makes them available for electronic
access. "CHIO Access" demonstrates the utility of Z39.50 to access digitized
museum objects.
Digital Library Objects
The Z39.50 Profile for Access to Digital Library Objects (DL
Profile) addresses functional and user requirements for search and retrieval
of information in digital library collections, specifically the Library of
Congress digital library collections and similar collections.
The profile provides a general and flexible model for the structure of
a digital object. In the model, a digital object may consist of constituent
parts, any of which may in turn consist of constituent parts, and so on.
Consider, for example, a single digital object consisting of several images
(e.g. photos or text images). Although the set of images comprises a single
digital object, each must be distinctly representable and the object must
convey the fact that there are distinct images, how many, and their individual
characteristics. Thus they are represented as separate elements of a Z39.50
record.
Next suppose that the digital object not only includes a number of
images, but also additional constituent parts, further structured; for
example, each such constituent part may consist of several images. This
introduces an intermediate level of aggregation. The model of a digital object
adopted by the DL profile assumes arbitrary levels of aggregation and is
represented as a tree, where each non-leaf node has an arbitrary number of
subtrees and/or leaves, and leaf nodes represent data.
Every node, whether a leaf or non-leaf node, may have metadata attached,
including description, date of creation, terms and conditions, etc.
This model will support, for example, a digital object representing 10
boxes, each with 20 folders, each with 30 photos. Z39.50 string tags such as
'box', 'folder', and 'photo' could be used to convey the type of element. As
a more complex example, a folder might include a variety of photos, maps,
items of correspondence, etc., and perhaps each item of correspondence
consists of several sequential digitized pages.
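The tree model and the boxes/folders/photos example above can be sketched as
follows. This is a minimal illustration, not the record syntax defined by the
profile: the Node class, the 'archive' root tag, and the metadata keys are
assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    tag: str                                        # e.g. 'box', 'folder', 'photo'
    metadata: dict = field(default_factory=dict)    # may be attached to any node
    children: list = field(default_factory=list)    # empty for leaf nodes
    data: Optional[bytes] = None                    # only leaves carry data

    def leaves(self):
        """Yield every leaf node below (or at) this node, depth first."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


def build_archive(boxes=10, folders=20, photos=30):
    """Build the example object: boxes of folders of photos."""
    return Node("archive", {"description": "example object"}, [
        Node("box", {"number": b}, [
            Node("folder", {"number": f}, [
                Node("photo", {"number": p}, data=b"")
                for p in range(1, photos + 1)
            ])
            for f in range(1, folders + 1)
        ])
        for b in range(1, boxes + 1)
    ])


def count_by_tag(node, counts=None):
    """Count nodes of each tag in the tree rooted at `node`."""
    counts = Counter() if counts is None else counts
    counts[node.tag] += 1
    for child in node.children:
        count_by_tag(child, counts)
    return counts


archive = build_archive()
photo_count = sum(1 for _ in archive.leaves())   # 10 * 20 * 30 leaves
counts = count_by_tag(archive)
```

Because children is an ordinary list of Node values, the more complex case (a
folder that mixes photos, maps, and multi-page correspondence) needs no
change to the structure, only different tags and deeper nesting.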
CIP Profile
CIP, the Catalogue Interoperability Protocol, addresses the ability to
effectively exploit earth observation and associated data resources. That
capability is impeded by the lack of homogeneity in services and interfaces
offered by various data providers. CIP is being developed by the Protocol Task
Team within the Committee on Earth Observation Satellites (CEOS).
CEOS provides coordination between international Earth observation
missions and encompasses various national (civil) agencies involved in Earth
Observation satellite programmes: the European Space Agency, NASA, DLR
(Germany), NASDA (Japan), DDRS (Canada), BNSC (UK), and CEO (Centre for Earth
Observation).
The objective of CIP is to enable users to logically search physically
distributed data catalogues, without separately querying each and
merging/correlating result sets, effectively allowing the various data
archives to appear to be a single database. It includes a data dictionary to
specify the common attributes that describe the primary objects within a
catalogue system.
CIP models collections, permitting complex hierarchical groupings of
data organized thematically over multiple databases, where both the
collections and the individual collection members (objects and subcollections)
have item descriptors, roughly analogous to the descriptive records defined by
the Collections profile.
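The idea of several physically distributed catalogues appearing as a single
database can be sketched as below. This is only an illustration of the
concept, not of CIP itself: the classes, the provider names, and the
attribute names ('id', 'sensor', 'region') are invented for the example and
are not taken from the CIP data dictionary.

```python
class Catalogue:
    """Hypothetical stand-in for one data provider's catalogue."""

    def __init__(self, provider, entries):
        self.provider = provider
        self.entries = entries    # dicts that use the shared attribute names

    def search(self, **criteria):
        return [e for e in self.entries
                if all(e.get(k) == v for k, v in criteria.items())]


class FederatedCatalogue:
    """Presents several catalogues to the user as one logical database."""

    def __init__(self, catalogues):
        self.catalogues = catalogues

    def search(self, **criteria):
        merged = {}
        for cat in self.catalogues:
            for entry in cat.search(**criteria):
                # Correlate on a shared identifier so that an item held by
                # two archives appears once, with both providers recorded.
                rec = merged.setdefault(entry["id"], {**entry, "providers": []})
                rec["providers"].append(cat.provider)
        return list(merged.values())


archive_a = Catalogue("archive-a", [
    {"id": "scene-1", "sensor": "SAR", "region": "arctic"},
    {"id": "scene-2", "sensor": "optical", "region": "arctic"},
])
archive_b = Catalogue("archive-b", [
    {"id": "scene-1", "sensor": "SAR", "region": "arctic"},
    {"id": "scene-3", "sensor": "SAR", "region": "tropics"},
])

# The user issues one query; the fan-out and merge happen behind the facade.
results = FederatedCatalogue([archive_a, archive_b]).search(sensor="SAR")
```

The sketch depends on every catalogue describing its entries with the same
attribute names, which is the role the text assigns to the common data
dictionary.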
Cataloging Profile
A service named WORLD 1 will be offered by the National Library of
Australia to replace the current Australian Bibliographic Network and Ozline
services. The technical infrastructure to operate the WORLD 1 Service is being
developed as a joint venture by the National Library of Australia and the
National Library of New Zealand under the banner of the National Document and
Information System (NDIS) Project.
The plan is to use union catalogues as tools for identifying resources
and their locations within a geographic area. The premise is that union
catalogues with good coverage and authority control remain an attractive
concept because of the limitations of multi-target searches: performance
degrades when a search spans several targets, results are not well
integrated, duplicate records appear, and headings (e.g. author and subject)
occur in multiple versions.
Libraries contributing to a union catalog would require a cataloging
system to update both their own local catalog and the union catalog in a
single operation, and the project proposes to integrate the "cataloguing
protocol" with Z39.50. To this end, they propose to use Z39.50 both for search
and update, and they are profiling the Z39.50 Update Extended Service.
ZSTARTS
The Z39.50 Profile for STARTS (ZSTARTS) stems from the
Stanford Protocol for Internet Search and Retrieval (STARTS), an
initiative of the Stanford Digital Library Project. The STARTS project brought
together a number of commercial companies to develop requirements for
distributed searching and ranked retrieval. The ZSTARTS profile is a Z39.50
solution to these requirements.
The STARTS model assumes document databases. A client sends a query to
multiple servers, where the query includes a filter and a ranking expression.
The filter is analogous to the Z39.50 type-1 query (i.e. a boolean query),
while the ranking expression supplies guidance for the server to rank
results; the client may assign weights to individual terms. The STARTS model
calls for the merging of the ranked results from the various servers.
Search results include document metadata: title, publication date, size,
score (assigned to the document for the given search), occurrence information
(pertaining to the terms in the query), and a pointer (URL) to the document for
subsequent retrieval.
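The query model described above (a boolean filter plus a weighted ranking
expression, with results merged across servers) can be sketched as follows.
This is a toy illustration under stated assumptions: ToyServer, Hit, and
merged_search are invented names, the scoring rule (a weighted count of term
occurrences) is a placeholder rather than anything specified by STARTS or
ZSTARTS, and the merge simply sorts on raw scores, which presumes that all
servers score on a comparable scale.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    url: str      # pointer for subsequent retrieval
    score: float  # assigned to the document for this search


class ToyServer:
    """Hypothetical document server holding {url: (title, text)}."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, required, weights):
        hits = []
        for url, (title, text) in self.docs.items():
            words = text.lower().split()
            # Filter: a boolean AND over the required terms.
            if not all(term in words for term in required):
                continue
            # Ranking expression: client-supplied weights per term.
            score = sum(w * words.count(t) for t, w in weights.items())
            hits.append(Hit(title, url, score))
        return hits


def merged_search(servers, required, weights):
    """Send the same query to every server and merge the ranked results."""
    hits = [h for s in servers for h in s.search(required, weights)]
    return sorted(hits, key=lambda h: h.score, reverse=True)


server_a = ToyServer({
    "http://a.example/1": ("Whale biology", "whale whale ocean"),
    "http://a.example/2": ("Ship logs", "ship ocean"),
    "http://a.example/3": ("Desert flora", "cactus sand"),
})
server_b = ToyServer({
    "http://b.example/1": ("Whaling history", "whale ship ship ocean"),
})

results = merged_search(
    [server_a, server_b],
    required=["ocean"],
    weights={"whale": 2.0, "ship": 0.5},
)
```

Here the document without "ocean" is excluded by the filter, and the
remaining three are ordered by weighted score regardless of which server
returned them.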
Type-102 RLQ
The Type 102 Ranked List Query (RLQ) was originally intended to be developed as a natural language query, but it was deemed impossible to design a query that adequately supports all of the natural language search methodologies. Type 102 RLQ has instead been designed for the ranked searching technologies used by large-scale commercial information providers and information industry software vendors, several of whom have participated in the development of this query, including:
Proposed SQL Query
The Distributed Database Unit at CRC for Distributed Systems Technology Centre at the Department of Computer Science, University of Queensland, is proposing changes to Z39.50 to support SQL databases. Their proposal includes: