The Library of Congress >> Especially for Librarians and Archivists >> Standards

MARC Standards

HOME >> MARC Development >> Proposals List


MARC PROPOSAL NO. 2017-08

DATE: May 16, 2017
REVISED:

NAME: Use of Subfields $0 and $1 to Capture Uniform Resource Identifiers (URIs) in the MARC 21 Formats

SOURCE: PCC Task Group on URIs in MARC

SUMMARY: This proposal outlines a method to capture URIs in the MARC 21 Formats in a manner that clearly differentiates between URIs that refer to a Record describing a Thing and URIs that refer directly to the Thing itself.

To that end, the paper proposes restricting the use of the $0 to URIs and control numbers that refer to Records describing Things, and defining the $1 to include URIs that directly refer to the Thing.

Note: Standard vocabulary terms from controlled lists, such as MARC lists, are not generally considered Authority ‘records’; however, when those terms are represented as SKOS concepts and assigned actionable/dereferenceable URIs, they do carry with them ‘record’-like data in a particular vocabulary scheme. The latter are referenced in this paper as Authority ‘records’ in conjunction with more traditional Authorities in a record format.

FIELDS AFFECTED: (Bibliographic format) 033, 100, 110, 111, 130, 240, 337, 338, 340, 344, 345, 346, 347, 370, 518, 600, 610, 611, 630, 647, 648, 650, 651, 654, 655, 656, 657, 662, 752, 753, 754, 800, 810, 811, 830

(Bibliographic and Authority formats) 034, 043, 336, 348, 370, 377, 380, 381, 382, 385, 386, 388, 700, 710, 711, 730, 751, 880, 883, 885

(Authority format only) 260, 360, 368, 372, 373, 374, 376, 500, 510, 511, 530, 548, 550, 551, 555, 562, 580, 581, 582, 585, 672, 673, 682, 747, 748, 750, 755, 762, 780, 781, 782, 785

(Holdings format) 337, 338, 347, 561, 883

(Classification format) 034, 043, 700, 710, 711, 730, 748, 750, 751, 754, 880, 883

(Community Information format) 043, 100, 110, 111, 600, 610, 611, 630, 648, 650, 651, 654, 656, 657, 700, 710, 711, 730, 880, 883

KEYWORDS: Authority record control number or standard number (All formats); Subfield $0 (All formats), Subfield $1 (All formats); Uniform Resource Identifier (All formats); URI (All formats); Real World Object (All formats)

RELATED: 2007-06/1; 2009-DP01/1; 2009-DP06/1; 2010-DP02; 2010-06; 2015-07; 2016-DP04; 2016-DP18; 2016-DP19; 2017-DP01

STATUS/COMMENTS:
05/16/17 – Made available to the MARC community for discussion.

06/24/17 – Results of MARC Advisory Committee discussion: Approved, with the amendment that subfield $1 will be made repeatable; Field 257 will be added to the list of fields in scope for inclusion of $1.

08/07/17 - Results of MARC Steering Group review - Agreed with the MAC decision.


Proposal No. 2017-08: Use of $0 and $1 to Capture URIs

1. BACKGROUND

In the MARC 21 format, the $u generally designates URIs that serve as web addresses for documents (commonly called a URL or Uniform Resource Locator). The $0 contains an “Authority record control number or standard number” which may be in the form of a URI. To date, these subfields and URI distinctions (document location vs. control/standard number) have sufficed to meet library needs. As the library community moves into a linked data environment, however, new use cases arise that necessitate the refinement of existing subfield definitions and implementations, and/or the introduction of new subfields. Such evolution is exemplified by the recent refinement of $0 to remove the parenthetical prefix ‘(uri)’ in order to more easily facilitate dereferencing of HTTP format URIs [see 2016-DP18].
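
As an illustration of the $0 refinement mentioned above, the following Python sketch strips a legacy ‘(uri)’ prefix from a $0 value so that the remaining string can be dereferenced directly. The function name and the exact prefix pattern (case-insensitive, optional following space) are assumptions made for this example and are not taken from 2016-DP18.

```python
import re

# Assumed legacy form: "(uri)" at the start of the value, optionally
# followed by whitespace, then the URI itself.
_LEGACY_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def strip_legacy_uri_prefix(value: str) -> str:
    """Return a $0 value without a leading '(uri)' prefix, if one is present.

    Values that carry other parenthetical prefixes (for example an
    organization code before a control number) are returned unchanged.
    """
    return _LEGACY_URI_PREFIX.sub("", value.strip())


cleaned = strip_legacy_uri_prefix(
    "(uri) http://id.loc.gov/authorities/names/n2008054754"
)
```

Only the ‘(uri)’ prefix is removed; other $0 content is left alone so that traditional control numbers are not altered.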

Experiments by the PCC URI Task Force and others in converting MARC 21 to linked data suggest that there are major benefits to storing URIs in MARC 21. That said, Resource Description Framework (RDF), the recommended encoding for linked data, requires more semantic precision than MARC 21 currently contains. This paper asserts that the use of different MARC 21 subfields for URIs that refer to different types of entities is an important prerequisite for the conversion to linked data, a proposal that is illustrated with a refinement for the definition of $0 and a new definition of $1.

A scope note. The Uniform Resource Locator (or URL) is another important type of URI, which provides addresses for human-readable websites, documents, or web pages. But since the focus of this paper is linked data designed for machine consumption, document URLs are out of scope. URLs and the use of $u (described above) to capture them are not part of the proposal.

2. DISCUSSION

2.1. URIs and the Semantic Web

According to linked data design principles [Cool URIs, https://www.w3.org/TR/cooluris/], the semantic web infrastructure relies on the unique identification of entities—or, in semantic web terms, ‘Real World Objects’ (RWOs); or, even more colloquially, ‘Things.’ For example, a Person and a MARC 21 Authority record about the person are different, and in the RDF representation, each needs to be uniquely identified with distinct URIs for semantic clarity. In Section 2 of this document, the argument for greater machine-understandable semantic clarity will be presented using pseudo-RDF data to ease the burden on the human reader.

RDF statements about a living person may include lifespan dates or a home address, which would be accessible from a URI that functions somewhat like a Social Security number. But an authority record is fundamentally different because it is an information object that may contain a description of a person, as well as a revision history and other facts about the record itself. Although this difference may seem pedantic, it is important for making precise statements about library resources. When we state in RDF that “William Shakespeare is the author of Hamlet,” we want to ensure that a machine understands the reference is to the person who lived from 1564 to 1616, and not to an authority record or similar document. In short, a person can be an author, but a record cannot.

Unfortunately, this distinction is easily lost when URIs are recorded in MARC using current conventions. For example, a common pattern on the semantic web is to say that an Authority (modeled as skos:Concept) has a focus of the Thing (using the property foaf:focus), as in the example below:

<URI for an Authority for Some Person> <foaf:focus> <URI for Some Person> .

Alternatively, a Person can also be linked back to the Authority using a different property (e.g. madsrdf:isIdentifiedByAuthority):

<URI for Some Person> <madsrdf:isIdentifiedByAuthority> <URI for an Authority for Some Person>.

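The two linking patterns above can be sketched in Python as plain N-Triples strings. This is a minimal illustration, not a recommended converter: the helper names are hypothetical, and the sample URIs are borrowed from the authority-format examples in Section 3 of this paper.

```python
# Full IRIs for the two properties named in the text.
FOAF_FOCUS = "http://xmlns.com/foaf/0.1/focus"
MADS_IDENTIFIED_BY = "http://www.loc.gov/mads/rdf/v1#isIdentifiedByAuthority"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one statement with IRI subject, predicate, and object."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def link_authority_and_thing(authority_uri: str, thing_uri: str) -> list[str]:
    """Return both directions of the Authority/Thing link as triples."""
    return [
        # The Authority has the Thing as its focus.
        triple(authority_uri, FOAF_FOCUS, thing_uri),
        # The Thing is identified by the Authority.
        triple(thing_uri, MADS_IDENTIFIED_BY, authority_uri),
    ]


statements = link_authority_and_thing(
    "http://id.loc.gov/authorities/names/n95004729",
    "http://id.loc.gov/rwo/agents/n95004729",
)
```
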
If the distinction between Authorities and RWOs is not made when URIs are added to MARC records, RDF converters will not have enough information to produce the correct relationships between entities. For example, the following MARC field should ideally translate into the RDF statement below it:

100 1# $a Last name, First, $e author. $0 <Some URI>
<Some Resource> <authoredBy> <Some Person> .

However, when we reference the author’s LC/NACO NAF authority record URI in the MARC 21 100 $0 subfield, an automated conversion to RDF might result in a triple stating that the author of the Resource is the authority record rather than the Person, as exemplified below:

100 1# $a Last name, First, $e author. $0 URI for an Authority record about the Person
<Some Resource> <authoredBy> <Authority record about the Person> .

Instantiated with real data, this pattern takes us back to the introductory discussion, asserting that Hamlet was authored by the Authority record about William Shakespeare.

In RDF, if you say that two entities are the same using the common RDF property owl:sameAs, then everything stated about one entity is also true of the other. This can lead to messy data if the two things are not in fact the same. For instance, two authority records from different national authority files describing the same person are not the same resource. Each authority record has unique traits: different dates of creation and/or of modification, different sources of information, different processes asserted on them, etc. Therefore, rather than asserting that the two authority records are owl:sameAs, we want to assert that the focus of each authority record is the same Person, which is identified by the URI for the Person/RWO. URIs that directly identify a Person provide a bridge between different authority records focusing on the same Person.
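
The bridging role of a Thing URI can be sketched as follows. Given pairs of (authority record URI, focus URI), the records are grouped by their shared focus instead of being declared owl:sameAs. All URIs in the sample data are hypothetical placeholders under example.org, and the function name is an assumption of this sketch.

```python
from collections import defaultdict


def authorities_by_focus(focus_pairs):
    """Group authority record URIs by the Thing URI each one focuses on.

    focus_pairs is an iterable of (authority_uri, thing_uri) tuples, i.e.
    the subject and object of a foaf:focus statement.
    """
    grouped = defaultdict(list)
    for authority_uri, thing_uri in focus_pairs:
        grouped[thing_uri].append(authority_uri)
    return dict(grouped)


# Hypothetical data: two national authority files describe the same Person.
pairs = [
    ("http://example.org/fileA/authority/123", "http://example.org/person/p1"),
    ("http://example.org/fileB/authority/987", "http://example.org/person/p1"),
    ("http://example.org/fileA/authority/456", "http://example.org/person/p2"),
]
bridged = authorities_by_focus(pairs)
```

The two records for p1 remain distinct resources with their own creation dates and provenance; they are related only through the Person they describe.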

The following diagram illustrates the semantic differences in RDF between Records about Things and Things/RWOs.

In the above example, it may appear that the Virtual International Authority File (VIAF) URIs in upper right and center are identical, but the URI in the top right with the slash at the end refers to a foaf:Document (which serves the role of an authority record), while the center URI without the slash refers to a schema:Person.
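
The trailing-slash convention described above is easy to miss by eye but simple to test mechanically. The sketch below encodes only the convention as stated in this paper for VIAF URIs; it assumes nothing about other services, and the function name and return labels are hypothetical.

```python
from urllib.parse import urlparse


def viaf_uri_kind(uri: str) -> str:
    """Classify a VIAF URI using the trailing-slash convention in the text.

    Returns "record" for a URI ending in a slash (the foaf:Document),
    "rwo" for one without it (the schema:Person), and "unknown" for any
    URI that is not under http(s)://viaf.org/viaf/.
    """
    parsed = urlparse(uri.strip())
    if parsed.netloc != "viaf.org" or not parsed.path.startswith("/viaf/"):
        return "unknown"
    return "record" if parsed.path.endswith("/") else "rwo"
```

A cataloging tool could use a check of this kind to warn when a URI with a trailing slash is about to be stored as a Thing URI.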

2.2. Physical and Abstract Things as RWOs

The above examples demonstrate use of an RWO URI to represent a person, which likely reflects a commonsense understanding of RWOs since people have physicality, i.e., they are real, and are each unique. However, just as we need a distinct referent to make RDF statements about a physical Thing and to relate different Authorities for the same person, we also need a referent to make RDF statements about intangible Things and to relate different Authorities for the same Concept. During the ALA Midwinter 2017 review of 2017-DP01, the discussion paper that preceded this proposal, some concern was raised about the relevance of RWO URIs for subject headings. This section attempts to clarify the issue by arguing that topical terms should also be modeled as RWOs.

According to the seminal Cool URIs document [https://www.w3.org/TR/cooluris/], RWOs include “abstract ideas and non-existing things like a mythical unicorn.” In addition, the December 2016 document issued by the authors of the IFLA-sponsored Library Reference Model identifies the need for a class that “includes both material or physical things and conceptual objects. Everything considered relevant to the bibliographic universe, the universe of discourse in this case, is included.” DBpedia and Wikidata also create RWO graphs for canonical abstract Things. This recognition of concepts as RWOs from both the Web and library communities extends the realm of RWOs to topical subject headings. While the library community does not currently create RWO URIs for Conceptual Things, for fully controlled subject thesauri, such as MeSH and FAST, implementation of RWO URIs is a realizable possibility.

Consider, for example, the DBpedia graph for the RWO concept 'Neoplasm' (see Note below). It is very rich and includes:

In a more library-centric ecosystem, if we were to create a fictitious RWO graph for the MeSH topical descriptor ‘Neoplasms’ it might include the following data describing it as a Thing:

Note: The DBpedia RWO graph for Neoplasm can be viewed by entering the RWO URI http://dbpedia.org/resource/Neoplasm in the URI box of the RDF Translator at http://rdf-translator.appspot.com/ and choosing an output serialization.

Hence, conceptual Things modeled as RWOs are really more similar to physical Things modeled as RWOs than may be readily apparent.

2.3. Current Use of $0 URIs in MARC 21 and Conversion to RDF

Libraries have a strong history of creating authority records and controlled lists of terms, and of adding identifiers from international agencies to bibliographic records (e.g., ISNI). The $0 was added to the MARC 21 Bibliographic format to capture the control number of the relevant authority record or standard number for individual fields. In the last few years, the phrase “standard number” has been interpreted to include URIs. The current definition of $0 does not differentiate between the URI for the Record and the URI for the Thing the record is about, so either type or both types of URI may be captured in a single MARC 21 field. Within the MARC 21 context, where semantics are less exacting, the difference has little or no impact until the MARC data is converted to other formats, such as RDF.

The URIs stored in $0 are ambiguous because they may refer either to Things or to records or documents about Things. That ambiguity makes it difficult for an automated conversion process to correctly parse the semantics of the URIs in RDF. In the following example, the 100 field contains two $0s. The first, from LC/NACO NAF, records the URI for the authority record of the author Michelle Obama; the second, from VIAF, records an RWO URI referencing Michelle Obama as a Person (note: in VIAF, URIs without the trailing slash represent the RWO):

100 1# $a Obama, Michelle, $d 1964- $e author $0http://id.loc.gov/authorities/names/n2008054754 $0http://viaf.org/viaf/81404344

Ideally, the above MARC field should convert to two RDF triple statements:

<SomeWork> <wasAuthoredBy> <Michelle Obama> .
<Michelle Obama> <isDescribedBy> <an LC/NACO NAF authority record> .

In order to do so, a MARC to RDF converter would have to discern that the two $0 subfields contain URIs with different meanings and should be parsed into distinct, non-overlapping RDF statements:

Hence a critical step in the process of preparing MARC 21 for RDF conversion is to designate a subfield that can be used throughout MARC 21 that contains URIs for Things, and is separate from a subfield with a similar distribution that captures URIs for Records or Authorities about the Thing. This paper proposes that $0 be used for Record or Authority URIs, while a newly defined $1 subfield should be used for Thing URIs. This was put forward at the ALA Midwinter meeting 2017 in 2017-DP01; MAC mostly supported the return of this discussion paper as a proposal.
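
The ambiguity discussed in this section can be made concrete with a small parsing sketch. The parser below is a simplification written for the display form of the examples in this paper (a ‘$’ followed by a one-character code); it is not a reader for binary MARC, and its names are hypothetical. Applied to the Michelle Obama field from this section, it shows that both URIs arrive under the same subfield code.

```python
import re

# A subfield is "$" plus one lowercase letter or digit, then its value up
# to the next "$". This assumes values never contain a literal "$".
SUBFIELD = re.compile(r"\$([a-z0-9])\s*([^$]*)")


def parse_subfields(field_text: str) -> list[tuple[str, str]]:
    """Return (code, value) pairs in the order they occur in the field."""
    return [(code, value.strip()) for code, value in SUBFIELD.findall(field_text)]


field = (
    "$a Obama, Michelle, $d 1964- $e author "
    "$0http://id.loc.gov/authorities/names/n2008054754 "
    "$0http://viaf.org/viaf/81404344"
)
zero_values = [value for code, value in parse_subfields(field) if code == "0"]
# Both URIs carry code "0"; nothing in the field itself says which one
# identifies the authority record and which one identifies the Person.
```

A converter working only from this structure would need source-specific knowledge, such as the VIAF trailing-slash convention, to tell the two URIs apart.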

2.4. Proposed strict interpretation of $0 for “Authority record control number or standard number”

The current definition of $0 is “Authority record control number or standard number.” The historic practice for adding URIs to $0s has focused primarily on URIs for Authority records, including the modeled RDF data discussed in this paper as well as the web document URLs that are not in its scope. To take advantage of increasingly sophisticated RDF datasets now being published in the library community, it makes strategic sense to limit the scope of $0 in the MARC 21 formats to store control numbers or standard numbers (including URIs) that refer to Records about Things, as defined above, keeping the definition as close to the traditional library authority files as possible. As a result, URIs appearing in $0 should provide access to strictly machine actionable or parseable data from Authority records, SKOS Concepts, and other Record-like entities. In other words, this paper proposes that the definition of $0 be restricted to exclude traditional document URLs (generally found in $u) and RWO/Thing URIs.

2.5. Proposed addition of $1 in parallel with $0

In parallel with the $0 strict interpretation described above, we propose the use of $1 to hold URIs that refer directly to a Thing or RWO (Person, Place, Thing, Concept, etc.), i.e., the actual Thing that is the focus of a $0 resource. For each of the MARC fields that provision URIs to link to Authority Records using $0, we propose also defining $1 to capture URIs for the corresponding RWO/Thing. However, $1 and $0 do not need to co-occur; they can appear singly or combined in a MARC 21 field. The $1 is appropriate for this function because it is currently undefined in MARC 21, and it allows the freedom to easily add RWO URIs anywhere in the MARC 21 formats.
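
A minimal sketch of how a converter might use the $0/$1 split follows. It reuses the pseudo-properties wasAuthoredBy and isDescribedBy from the pseudo-RDF in Section 2.3; the function name, the display-form parser, and the decision to emit nothing when no $1 is present are assumptions of this example, not part of the proposal.

```python
import re

# Display-form subfields: "$" + one-character code + value up to next "$".
SUBFIELD = re.compile(r"\$([a-z0-9])\s*([^$]*)")


def field_to_pseudo_triples(resource: str, field_text: str) -> list[tuple[str, str, str]]:
    """Map one name field to pseudo-RDF triples using $0 and $1.

    $1 values are treated as Thing URIs and $0 values as Authority
    identifiers. A field with no $1 yields no triples in this sketch.
    """
    subfields = [(c, v.strip()) for c, v in SUBFIELD.findall(field_text)]
    authorities = [v for c, v in subfields if c == "0"]
    things = [v for c, v in subfields if c == "1"]

    triples = []
    for thing in things:
        # The resource is authored by the Thing, never by a record.
        triples.append((resource, "wasAuthoredBy", thing))
        for authority in authorities:
            # Each Authority describes the Thing.
            triples.append((thing, "isDescribedBy", authority))
    return triples


result = field_to_pseudo_triples(
    "SomeWork",
    "$a Obama, Michelle, $d 1964- "
    "$0 http://id.loc.gov/authorities/names/n2008054754 "
    "$1 http://viaf.org/viaf/81404344",
)
```

Because the subfield code now carries the distinction, the converter needs no knowledge of how individual URI publishers shape their identifiers.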

The following diagram illustrates how the semantic differences in RDF between Records and Things/RWOs would map to the $0/$1 proposal:

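To summarize, two arguments motivate the distinction between Record and Thing URIs: a distinct URI for the Thing is needed to make precise RDF statements about it (a person can be an author, but a record cannot), and a Thing URI provides a bridge between different Authorities that describe the same Thing.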

Nevertheless, the implementation of separate RWO/Thing and Authority Record URIs is still in flux. Two issues have been identified for further study:

Expanding the descriptions accessible from real-world-object URIs. Descriptions accessible from RWO URIs published by the Library of Congress and other institutions in the library community are populated with data extracted from library authority files. Thus, RWO descriptions for Persons typically contain lifespan dates, professions, and associated places or organizations, while the Authority descriptions contain a fuller range of headings and provenance information. However, many RWO URIs provide no information beyond the preferred heading, and even the fullest descriptions may be insufficient for merging, matching, and authoritative fact collection about the important entities recognized by the library community. They also fail to acknowledge the current activity devoted to advancing the models for entities beyond crosswalks from legacy standards, such as the model of organizational identifiers developed by ISNI.

RWOs in Authority descriptions. A solution for the 024 field defined in the LC/NACO NAF that is semantically consistent with the $0/$1 recommendation needs to be proposed. 024 fields are currently populated with the same range of document URLs, document URIs, RWO URIs, and legacy control codes that appear in $0 in bibliographic records. But this issue is beyond the scope of the current effort. We support the work of the Work URIs task force, which offers a partial solution in the DP submitted in the current review cycle.

3. EXAMPLES

In each of the examples below the $0s represent what we would consider Authorities, and $1s represent Things (RWOs). Repeated $0s represent different authorities describing the same Thing (RWO). Repeated $1s represent different URIs for the same Thing (RWO). As noted in Section 3 of 2017-DP01, the visual difference between the machine-readable URI and the human-readable URL is often very subtle and easily overlooked if one is not aware of it.

Bibliographic Format

Pattern: $0 {Authority URI} $1 {Thing URI}

100 1# $a Obama, Michelle, $d 1964- $0 http://id.loc.gov/authorities/names/n2008054754 $1 http://viaf.org/viaf/81404344

100 0# $a Santa Claus $0 http://id.loc.gov/authorities/names/no2015039717 $1 http://dbpedia.org/resource/Santa_Claus

700 1# $a Stipe, Michael, $d 1960- $0 http://id.loc.gov/authorities/names/n91125827 $1 http://www.bbc.co.uk/things/3aeaa474-ad77-4eb0-a6ba-69f1af33b7f4#id

386 ## $a Filipinos $0 http://id.loc.gov/authorities/demographicTerms/dg2015060630 $1 http://www.wikidata.org/entity/Q4172847 $2 lcdgt

600 00 $a Zeus $c (Greek deity) $0 http://id.loc.gov/authorities/names/no2014048635 $1 http://viaf.org/viaf/308237987

650 #0 $a Kindness $0 http://id.loc.gov/authorities/subjects/sh85072376 $1 http://dbpedia.org/resource/Kindness $1 http://www.wikidata.org/entity/Q488085

651 #0 $a Greenwich Village (New York, N.Y.) $0 http://id.loc.gov/authorities/names/n97020733 $1 http://vocab.getty.edu/tgn/7015857-place

830 #0 $a Oxford history of art. $0 http://id.loc.gov/authorities/names/n96099923 $1 http://viaf.org/viaf/184384669 $1 http://www.wikidata.org/entity/Q24039213
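
Since repeated $1s in one field are different URIs for the same Thing, a processing tool can pair them up. The sketch below lists every pair of $1 URIs in a field written in the display form used above. Whether such pairs should then be expressed as owl:sameAs statements is left open here; that choice, like the function name, is an assumption of this example and not part of the proposal.

```python
import re
from itertools import combinations

# Display-form subfields: "$" + one-character code + value up to next "$".
SUBFIELD = re.compile(r"\$([a-z0-9])\s*([^$]*)")


def thing_uri_pairs(field_text: str) -> list[tuple[str, str]]:
    """Return every unordered pair of $1 URIs found in one field."""
    things = [v.strip() for c, v in SUBFIELD.findall(field_text) if c == "1"]
    return list(combinations(things, 2))


kindness_pairs = thing_uri_pairs(
    "650 #0 $a Kindness "
    "$0 http://id.loc.gov/authorities/subjects/sh85072376 "
    "$1 http://dbpedia.org/resource/Kindness "
    "$1 http://www.wikidata.org/entity/Q488085"
)
```

A field with a single $1, or with none, yields an empty list.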

Authority Format

Pattern: $0 {Authority URI} $1 {Thing URI}

500 1# $i Founder: $a Jemison, Mae, $d 1956- $0 http://id.loc.gov/authorities/names/n95004729 $1 http://id.loc.gov/rwo/agents/n95004729

511 2# $i Successor: $a South African Book Fair $0 http://id.loc.gov/authorities/names/no2015047845 $1 http://viaf.org/viaf/315601042 $w r

370 ## $c New Zealand $0 http://id.loc.gov/authorities/names/n79021322 $1 http://sws.geonames.org/2186224 $2 naf

374 ## $a Astronauts $0 http://id.loc.gov/authorities/subjects/sh85008988 $1 http://dbpedia.org/resource/Astronaut $2 lcsh

377 ## $a bul $0 http://id.loc.gov/vocabulary/languages/bul $1 http://lexvo.org/id/iso639-3/bul $1 http://glottolog.org/resource/languoid/id/bulg1262 $1 http://dbpedia.org/resource/Bulgarian_language

380 ## $a Horror films $0 http://id.loc.gov/authorities/genreForms/gf2011026321 $1 http://www.wikidata.org/entity/Q200092 $2 lcgft

750 #7 $a Kindness $0 http://id.loc.gov/authorities/subjects/sh85072376 $1 http://dbpedia.org/resource/Kindness $1 http://www.wikidata.org/entity/Q488085 $2 lcsh

Note: Currently, MARC-based authority files do not tend to make use of the 7XX to link to external vocabularies, but 7XXs with $0s could prove useful when an authority file wants to link to LCSH, FAST, MeSH, etc. For example, a local authority file may be motivated to create a local entity, but also want to link to an already existing term in an established vocabulary.

4. BIBFRAME DISCUSSION

The recommendations proposed in this document will facilitate the conversion of MARC 21 records to BIBFRAME. The distinction between Thing and Authority URIs is consistent with the BIBFRAME 2.0 model.

5. PROPOSED CHANGES

In the MARC 21 Formats, define subfield $1 as follows:

$1 - Real-World Object URI
Subfield $1 contains an HTTP URI identifying an entity modelled as a Real-World Object.
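
A validation tool might check the syntactic part of this definition as sketched below. The check covers only the form of the value; it cannot establish that the URI identifies a Real-World Object. Accepting the https scheme alongside http is an assumption of this example, as is the function name.

```python
from urllib.parse import urlparse


def is_http_uri(value: str) -> bool:
    """Return True if the value parses as an http or https URI with a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Under this check a bare control number such as n95004729 would be rejected as a $1 value, while http://id.loc.gov/rwo/agents/n95004729 would pass.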


( 08/07/2017 )