The Library of Congress >> Especially for Librarians and Archivists >> Standards
HOME >> MARC Development >> Proposals List
DATE: June 15, 2009
REVISED:
NAME: Encoding URIs for controlled values in MARC 21 records
SOURCE: RDA/MARC Working Group
SUMMARY: This paper discusses use of a new subfield $1 (Controlled value URI), defined across the formats and fields, for encoding URIs for controlled values in MARC records, with an alternate technique using attributes for MARCXML.
KEYWORDS: Subfield $1 (all formats); URIs; Controlled values; MARCXML
RELATED: 2009-DP01/1
STATUS/COMMENTS:
06/18/2009 - Made available to the MARC 21 community for discussion.
07/12/2009 - Results of the MARC Advisory Committee discussion - The committee decided that further consideration and discussion would be necessary before this discussion paper could go forward as a proposal. Some participants were not in favor of having proposed subfield $1 derive its meaning from its order of placement, in relation to other subfields, within the field. Some suggested that implementing it as a provisional subfield in MARC would allow the community to judge its value. Others argued that specific cases of the subfield's use and benefits need to be more clearly articulated and explored before such a mechanism is implemented. Another paper will be presented for ALA Midwinter 2010 with the following options: 1) Adding subfield $1 as outlined in this paper 2) Proposing a new field modeled on field 880 (Alternate graphic representation) where URIs are included in appropriate subfields where the value would otherwise be recorded and linked to the field that contains the literal values. An additional paper is needed to discuss whether MARCXML and MARC 2709 need to be kept in alignment.
RDA indicates that different elements may use different types of value strings, such as text, codes, or URI for controlled values. The use of a URI instead of plain text is particularly applicable to situations where the value of the particular element comes from a controlled vocabulary, which could be an authority list or formal thesaurus (e.g. a name from the LC Name Authority File or a topic for an LCSH heading) or any other list of controlled codes or terms (e.g. the MARC Code List for Languages). Although URIs have not been made available for values in the aforementioned controlled vocabularies, work is underway to provide them. LC’s Network Development and MARC Standards Office is developing a registry service for controlled lists and in so doing is establishing URIs both for the list itself and for each value on the list. In the future these will be available at http://id.loc.gov and will include the MARC language codes, MARC country codes, MARC relator codes, MARC geographic area codes, and ISO 639-2 language codes. Other agencies are also developing URI lists. OCLC's terminologies service is one instance. RDA has many vocabularies and a DCMI Task Group is establishing URIs that identify each value or concept in those RDA vocabularies.
RDA also allows for other vocabularies controlled outside of RDA to be used with RDA elements (as specified) and assumes that either URIs or textual values will be recorded. This paper discusses only URIs for controlled vocabularies; URIs for elements are outside its scope.
This issue was discussed during the Midwinter 2009 MARBI meeting in 2009-DP01/1. The MARC Advisory Committee considered whether to use the applicable URI in the appropriate subfield in place of the value or to define a new subfield for the URI. The preference was to define subfield $1 (one) across fields and formats to enable the encoding of a URI that would replace or supplement the textual value.
Subfield $1 could be defined in all variable fields in the MARC 21 formats as “Controlled value URI” with the following description:
“A Uniform Resource Identifier (URI) for a value from a controlled vocabulary. Examples are a URI for a language code that represents a language entity or a URI for an authority or bibliographic record that contains a controlled textual value string (such as the Authority record 1XX). The vocabulary list itself may also be referenced via a URI, since a controlled vocabulary element needs an indication of the authorized list from which a value came. A URI provides standard syntax for locating an object using existing Internet protocols.”
Subfield $1, which links to a vocabulary value, is different from subfield $u (Uniform Resource Identifier), in that the latter is defined in several fields in the MARC formats to link to a bibliographic entity that is the resource described in the record, a related resource, or supplemental information to that recorded in the field (such as a table of contents to 505 or an abstract to 520).
Like subfields $6 and $8, subfield $1 will be added to all variable fields in all formats.
Subfield $1 is most likely to be used in:
Since fixed fields do not have subfields, $1 does not apply to them. In cases where it is necessary to record a URI, an alternate variable field should be used. For example, for codes in 008/20 (Format of music) use field 300 $a (Physical Description); for codes in 007/04 (Maps, Globe, Nonprojected graphic) use 340 $e (Medium / Support) or 300; and for 008/35-37 (Language) use 041 (Language Code).
Definition and rules for subfield $1 would be documented in the Control subfields appendix of the formats.
It is important to consider that experimentation is needed to see what systems might do with these URIs and how recording them affects record sharing, display, indexing, and other functions. Since controlled vocabularies that are part of RDA are being assigned URIs, it may be desirable to have a mechanism to encode them either instead of or in addition to the value. However, it is possible to have RDA-compliant records without the use of URIs.
If implementing a mechanism for recording URIs for controlled values in MARC 21, another option is to prefer MARCXML attributes for carrying a URI for controlled values. As an exchange format, MARC 21 can be expressed in “classic” MARC (MARC 2709) syntax or in MARCXML, which allows for a lossless record in terms of its ability to carry full MARC 21 data. XML itself provides a mechanism to identify a URI (“xlink”, see below in section 5.2). It would be possible to revise the MARCXML schema to allow for use of a URI for controlled values, which could be converted to MARC 2709 using the $1 subfield, although some ambiguity might be introduced.
The following gives proposed rules for how to record a URI for a value from a controlled vocabulary using subfield $1 in MARC 2709.
Rule 1: If the URI represents the same entity as is recorded in a specific subfield in the field, it will follow that subfield. In this case both the value and the URI are given. If there is more than one subfield with a controlled vocabulary value, the subfield $1 follows each appropriate subfield.
700 1# $aSmith, Elsie, $d1900-1945, $eillustrator $1http://id.loc.gov/vocabularies/relators/ill
Or
700 1# $aSmith, Elsie, $d1900-1945, $4ill $1http://id.loc.gov/vocabularies/relators/ill
Rule 2: If the URI replaces the entity that would otherwise be recorded in a specific subfield, that subfield is given with the text “uri”, followed by subfield $1 containing the URI. So that a system can recognize that “uri” should not be displayed, the text could be enclosed within parentheses as a convention (one that is used in MARC 21 in other places).
700 1# $aSmith, Elsie, $d1900-1945, $e(uri) $1http://id.loc.gov/vocabularies/relators/ill
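As an illustration only, a receiving system could apply Rules 1 and 2 along the following lines. This Python sketch assumes the field has already been decoded into (code, value) pairs. The function name pair_uris, the dictionary layout, and the case-insensitive match on "(uri)" are assumptions made for this example and are not part of MARC 21.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)

# Rule 2 placeholder text; matching it case-insensitively is an assumption.
PLACEHOLDER = "(uri)"


def pair_uris(subfields: List[Subfield]) -> List[dict]:
    """Attach each $1 to the subfield immediately before it (Rules 1 and 2)."""
    paired: List[dict] = []
    for code, value in subfields:
        if code == "1" and paired and paired[-1]["uri"] is None:
            # Rule 1 / Rule 2: the URI belongs to the preceding subfield.
            paired[-1]["uri"] = value.strip()
            continue
        if code == "1":
            # No preceding subfield is free to take this URI; keep it on its own.
            paired.append({"code": "1", "text": None, "uri": value.strip()})
            continue
        # Rule 2: the placeholder is not display text, so store no text for it.
        text = None if value.strip().lower() == PLACEHOLDER else value
        paired.append({"code": code, "text": text, "uri": None})
    return paired


rule1 = pair_uris([
    ("a", "Smith, Elsie,"),
    ("d", "1900-1945,"),
    ("4", "ill"),
    ("1", "http://id.loc.gov/vocabularies/relators/ill"),
])
rule2 = pair_uris([
    ("a", "Smith, Elsie,"),
    ("d", "1900-1945,"),
    ("e", "(uri)"),
    ("1", "http://id.loc.gov/vocabularies/relators/ill"),
])
```

In both cases the last entry carries the relator URI. Under Rule 1 it also keeps the coded value "ill", while under Rule 2 its text is empty, so a display routine would have nothing to show unless it resolves the URI.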
Rule 3: If the URI represents the basic content of the field, it is encoded first. For communications purposes it is highly recommended that the value in the specific subfield/field be recorded in addition to the URI in subfield $1.
Note that issues would need to be worked out for cases where both the URI and the textual form of the data are recorded. If one form is changed (e.g. the URI) and the other is not, the two alternative forms fall out of sync.
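One narrow case of the synchronization problem can be checked mechanically. The sketch below assumes that a URI for a coded value ends in the code itself, as the hypothetical relator and language URIs in this paper do. That pattern is an assumption and is not guaranteed for any real vocabulary service. The function name uri_matches_code is likewise invented for illustration. Textual terms such as “illustrator” cannot be checked this way without resolving the URI.

```python
def uri_matches_code(code_value: str, uri: str) -> bool:
    """Return True when the last path segment of the URI equals the coded value."""
    last_segment = uri.rstrip("/").rsplit("/", 1)[-1]
    return last_segment.lower() == code_value.strip().lower()


# $4ill paired with the relator URI for "ill": the two forms agree.
in_sync = uri_matches_code("ill", "http://id.loc.gov/vocabularies/relators/ill")

# $4prf paired with the relator URI for "cnd": the two forms disagree.
out_of_sync = uri_matches_code("prf", "http://id.loc.gov/vocabulary/relators/cnd")
```

A check of this kind only flags a mismatch. Deciding which of the two forms is authoritative remains a policy question.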
Note that some of the following examples have hypothetical URIs where final decisions have not yet been made as to what to use. The subfields in bold are ones for which URIs for the values are also in the field.
700 1# $aGalway, James. $1http://lccn.loc.gov/n81042545 $4prf $1http://id.loc.gov/vocabulary/relators/prf $4cnd $1http://id.loc.gov/vocabulary/relators/cnd
110 2# $1http://lccn.loc.gov/n86041077 $aUniversity of Texas. $bDept. of Anthropology. $0n86041077 $4spn $1http://id.loc.gov/vocabulary/relators/spn
650 #0 $1http://id.loc.gov/authorities/sh95000541 $aWorld Wide Web. $0sh95000541
583 1# $awill transform digitally $c20031104 $iOCR $zqueued for digitization, Nov. 4, 2003 $2pda $1http://id.loc.gov/vocabulary/sources/pda $5NIC $1http://id.loc.gov/vocabulary/organizations/nic
338 ## $aaudio disc $1http://RDVocab.info/termList/RDACarrierType/1004 $2rdacarrier
Note: the source code for “RDA Carrier” has not been assigned, so this is for illustration only.
Or
338 ## $1http://RDVocab.info/termList/RDACarrierType/1004 $a(URI)
041 1# $aeng $1http://id.loc.gov/vocabulary/languages/eng $hger $1http://id.loc.gov/vocabulary/languages/ger $hswe $1http://id.loc.gov/vocabulary/languages/swe
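The field-level and subfield-level placements in the examples above can be told apart by position alone. The following Python sketch parses the display form used in this paper (tag, indicators, then "$"-delimited subfields) and labels each $1 as applying to the whole field (Rule 3) or to the preceding subfield (Rules 1 and 2). The function names and the returned labels are assumptions for illustration. The parser also assumes that "$" never occurs inside a subfield value, and it does not handle real MARC 2709 byte streams.

```python
from typing import List, Optional, Tuple


def parse_field(line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split a display-form field such as '650 #0 $aText $1uri' into its parts."""
    head, _, rest = line.partition("$")
    tag, indicators = head.split()
    subfields = [(chunk[0], chunk[1:].strip()) for chunk in rest.split("$") if chunk]
    return tag, indicators, subfields


def classify_uris(subfields: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Label each $1 as describing the field (Rule 3) or one subfield (Rules 1-2)."""
    results: List[Tuple[str, str]] = []
    previous: Optional[str] = None  # code of the last subfield other than $1
    for code, value in subfields:
        if code == "1":
            scope = "field" if previous is None else "subfield $" + previous
            results.append((scope, value))
        else:
            previous = code
    return results


example = (
    "110 2# $1http://lccn.loc.gov/n86041077 $aUniversity of Texas. "
    "$bDept. of Anthropology. $0n86041077 $4spn "
    "$1http://id.loc.gov/vocabulary/relators/spn"
)
tag, indicators, subfields = parse_field(example)
labels = classify_uris(subfields)
```

For the 110 example, the first URI is labelled as applying to the field and the second as applying to subfield $4. The sketch has no way to express a URI that covers $a and $b but not $e, which is the difficulty raised in question 6.3.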
In the following examples, the xlink mechanism, as defined by the World Wide Web Consortium, is used. This attribute would need to be defined in the MARCXML schema if this approach is followed. Alternatively, one or more special attributes could be defined in the MARCXML schema for this purpose, such as “link” or “recordLink” or “vocabularyLink”.
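As a rough sketch of what the attribute approach might look like, the Python below builds a MARCXML datafield in which a subfield carries an xlink:href attribute. Placing the attribute on the subfield element is an assumption made for this illustration, since the MARCXML schema does not currently define it. The helper name add_subfield is also invented, and the relator URI is one of the hypothetical URIs used elsewhere in this paper.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"   # MARCXML namespace
XLINK_NS = "http://www.w3.org/1999/xlink"    # W3C XLink namespace

ET.register_namespace("marc", MARC_NS)
ET.register_namespace("xlink", XLINK_NS)


def add_subfield(parent, code, text, uri=None):
    """Append a subfield; if a URI is given, carry it as xlink:href (assumed placement)."""
    attrs = {"code": code}
    if uri is not None:
        attrs[f"{{{XLINK_NS}}}href"] = uri
    element = ET.SubElement(parent, f"{{{MARC_NS}}}subfield", attrs)
    element.text = text
    return element


# 700 1# $aSmith, Elsie, $d1900-1945, $eillustrator, with the URI as an attribute.
field = ET.Element(f"{{{MARC_NS}}}datafield", {"tag": "700", "ind1": "1", "ind2": " "})
add_subfield(field, "a", "Smith, Elsie,")
add_subfield(field, "d", "1900-1945,")
add_subfield(field, "e", "illustrator",
             uri="http://id.loc.gov/vocabularies/relators/ill")

xml_text = ET.tostring(field, encoding="unicode")
```

Because the attribute sits on the element whose value it identifies, this form avoids relying on subfield order. Converting it to MARC 2709 would still require choosing where to place the resulting $1.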
6.1. Should a mechanism be provided now (i.e. for initial RDA implementation) to encode a URI for a value from a controlled vocabulary in MARC 21? Should it be limited to expressions using MARCXML syntax or be provided for both MARCXML and classic (2709) MARC 21?
6.2. If yes, in terms of rule 2 above, is there another less ambiguous way to signal that a given URI pertains to a particular subfield in the field rather than to the field as a whole when only the URI is encoded?
6.3. What is the best way to show that a URI pertains to multiple subfields in the field, but not to all of them (e.g. a corporate body and its subordinate unit in $a and $b, but not $e)?
6.4. When both the textual form and URI are encoded, how will they be kept in sync in case the data changes?
6.5. What will systems do when they encounter a URI, especially without the equivalent textual data? How will they index the data? What will be displayed to the user? Will the URI be resolved?
6.6. Is there additional information needed to allow for a system to bring back understandable information to the user? Should there be an explicit indication as to what is being linked to?
The Library of Congress >> Especially for Librarians and Archivists >> Standards ( 12/21/2010 ) |