The Library of Congress >> Especially for Librarians and Archivists >> Standards
MARC Standards
MARC 21 HOME >> Bibliographic Format >> Overlaps Between MARC21 >> MARC21 to PREMIS Mapping

Analysis of Data Overlap Between the MARC21 Bibliographic Format and the PREMIS Data Dictionary

INTRODUCTION

The task of relating the contents of the MARC21 Bibliographic Format to those of the PREMIS Data Dictionary confronts a variety of challenges.

MARC21 was designed primarily as an exchange format for bibliographic data pertaining to a wide variety of materials, a purpose requiring its specification to be understood and applied similarly by users in many libraries.  For some data elements – countries, languages, relators, etc. – code lists ensure uniformity of content across several fields.  However, most fields accept free text, with issues of intelligible display frequently having priority over machine-processable uniformity.  (The issue arose, for example, on the MARC Forum in a discussion about copyright jurisdiction in the proposal establishing field 542, Information Relating to Copyright Status.)  Such consistency as is achieved in the free text fields results from application of a set of cataloging rules employed by the majority of MARC21 users.  These rules are distinct from the format and, in an earlier form, antedated and influenced the creation of the MARC format.

In contrast, PREMIS is intended to facilitate the management of preservation metadata primarily within individual digital archives or repositories.  Data exchange has been a secondary concern.  The specification shows this clearly, allowing each PREMIS instantiation a great deal of latitude in what data are recorded, and in the vocabularies employed.  The phrase “Value should be taken from a controlled vocabulary,” present as a data constraint for many PREMIS semantic units, is indicative of this design philosophy, the implication being that the choice of vocabulary may be a local one.

The result of these differences in purpose is that the two schemes focus, in their concern for particularity, on rather different metadata elements.  Even when the same metadata particular is represented in the two schemes, it is almost never done in a way that would allow the value to be transferred directly or unambiguously from PREMIS to MARC21, or vice versa.  MARC21 is at a disadvantage in the areas on which PREMIS concentrates, because much of the MARC specification was written when preservation of digital resources was a field in its infancy.  The PREMIS Data Dictionary embodies an attempt to codify metadata needs in this area, where standards and best practices are only emerging.  There are no longstanding traditions to guide (or hamper) the effort.

NOTES ON THE ANALYSIS

The present paper examines these two metadata schemes for correspondences in the data which they accommodate.  The findings are expressed in a pair of tables that have the same contents arranged differently.  Table MARC21 vs. PREMIS is ordered by MARC tag and subfield code; table PREMIS vs. MARC21 follows the order of PREMIS semantic units specified in version 2.0 of the Data Dictionary. 

Where feasible, the tables work at MARC subfield and PREMIS subunit levels.  Departures from this principle occur for two main reasons.

  1. There are a number of situations in which a single MARC element represents, implicitly or explicitly, a Type/Value pair of PREMIS subunits.  For succinctness, a single reference to the appropriate container is used in these cases.
  2. Sometimes the specification of a MARC field relates in an uncertain or complex way at the subfield level to the subunits of a PREMIS container.  Creating a table entry at the field/container level expresses the relationship as clearly as seems possible; viz., fields 506 and 542.
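
The first case can be pictured with a small Python sketch.  It is only an illustration under stated assumptions: the TypeValuePair class, the isbn_to_pair function, and the choice of an ISBN in field 020 $a as the example are hypothetical and are not entries taken from the tables.  The point is simply that one MARC value may carry its type implicitly in the tag and subfield code, whereas PREMIS records type and value as separate subunits of one container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeValuePair:
    """A PREMIS-style container holding a Type subunit and a Value subunit."""
    container: str
    type: str
    value: str


def isbn_to_pair(subfield_value: str) -> TypeValuePair:
    """Hypothetical example: treat a MARC 020 $a value as a Type/Value pair.

    The type ("ISBN") is implicit in the MARC tag and subfield code;
    only the value is carried in the data itself.
    """
    return TypeValuePair(
        container="linkingIntellectualEntityIdentifier",
        type="ISBN",
        value=subfield_value.strip(),
    )


# Placeholder value used purely for illustration.
pair = isbn_to_pair(" 9780000000002 ")
print(pair.container, pair.type, pair.value)
```

A table entry in such a case would name only the container, leaving the split into type and value to the implementer.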

The tables have been built assuming that it is preferable to identify possible correspondences that may be tenuous, or even erroneous, than inadvertently to omit a connection of importance.  Two systematic exceptions from the policy of inclusiveness should be mentioned.

  1. Fields from bibliographic format ranges 1XX, 6XX, and 7XX-75X have not been included as potential PREMIS agents.
  2. Numbers based on knowledge classification (fields 050 et al.) have been excluded as possible linkingIntellectualEntityIdentifiers since their primary purpose is to bring materials together rather than to distinguish between them at levels useful for preservation management.
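
The two exclusions can be restated as simple predicates on a MARC tag.  The following Python sketch is an assumption-laden reading rather than part of the analysis: the function names are hypothetical, "7XX-75X" is interpreted here as tags 700 through 759, and only field 050 is listed for classification numbers because the text does not enumerate the others.

```python
# Only field 050 is named in the text; "et al." is deliberately not expanded.
CLASSIFICATION_TAGS = {"050"}


def excluded_as_agent(tag: str) -> bool:
    """True if the tag lies in a range not considered as a PREMIS agent.

    Ranges follow the text: 1XX, 6XX, and 7XX-75X (read here as 700-759,
    which is an interpretation, not a statement from the source).
    """
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"not a three-digit MARC tag: {tag!r}")
    number = int(tag)
    return (100 <= number <= 199
            or 600 <= number <= 699
            or 700 <= number <= 759)


def excluded_as_linking_identifier(tag: str) -> bool:
    """True if the tag is a classification field left out as an identifier."""
    return tag in CLASSIFICATION_TAGS


for tag in ("100", "260", "650", "700", "773", "050"):
    print(tag, excluded_as_agent(tag), excluded_as_linking_identifier(tag))
```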

Users of these tables who are more knowledgeable about their own and others' PREMIS implementations may be able to eliminate some table entries as universally inappropriate or too rare to be worthy of consideration.  For instance, subfield 260$f, manufacturer’s name, is listed as a possible agentName, although the likelihood of that relationship’s being useful seems small.

Certain subfields whose roles in this context are obscure have generally been omitted.  Their evaluation will require examination of individual instances.

  1. Subfield $u when it contains a URI in fields other than 856; i.e., in notes 538, 540, 542, and 583.  In these cases the entity to which the URI points may provide data that might otherwise be present in one or more other subfields, though it is not possible abstractly to predict what data are to be found via the URI link.
  2. Subfield $3, “Materials specified,” can be significant in defining a specific electronic resource within the context of a complex MARC record, but its relation to PREMIS semantic units may be limited at most to construction of a linkingIntellectualEntityIdentifier.
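
A repository working through its own records might flag these subfields for case-by-case evaluation.  The Python sketch below shows one way to do so; the function name and the representation of a subfield as a (tag, code) pair are assumptions made for illustration, and whether the $u value actually contains a URI is not checked.

```python
# Note fields in which subfield $u is mentioned in the text as obscure.
URI_NOTE_TAGS = {"538", "540", "542", "583"}


def needs_individual_review(tag: str, code: str) -> bool:
    """True if the subfield is one the tables generally omit.

    Subfield $3 is flagged in any field; subfield $u is flagged only in
    the note fields listed above, so 856 $u is not flagged.
    """
    if code == "3":
        return True
    return code == "u" and tag in URI_NOTE_TAGS


for tag, code in (("540", "u"), ("856", "u"), ("506", "3"), ("260", "f")):
    print(tag, code, needs_individual_review(tag, code))
```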

In general, there are no table entries for semantic units whose names begin with the element “linking,” because their type and value subunits inherit data from similarly named units lacking the “linking” prefix.  There are exceptions.

  1. Entries have been provided for linkingAgentRole (2.6.3 and 4.1.8.3) because role is not a semantic unit of an Agent entity.
  2. Entries have been provided for linkingIntellectualEntityIdentifier because there is no intellectualEntityIdentifier semantic unit in PREMIS. 

Finally, this analysis examined the question of MARC21 data that a repository might find appropriate to include in its PREMIS database, but that appear to fit only in the “Extension” containers.  Sometimes it is unclear whether these marginal data would be appropriate to “Note” subunits.  In the end, only the rightsExtension (4.2) appears in the tables; viz., fields 355 and 506.

The process of deciding what kinds of data are expected in any subfield of interest, particularly in free text note fields, was guided by the examples provided in the full online MARC21 Bibliographic Format document.  No attempt was made to search for additional examples in library databases.

The investigator is aware of his tendency to think of metadata comparisons of this sort in terms of conversions from one scheme to another.  That is a matter more limited than the general question of data overlap, and one that the character of these two schemes renders impracticable.  Nevertheless, that mindset has crept occasionally into the language used to talk about the relationships between certain data elements and semantic units.  This fact should not seriously affect interpretation of the findings, but it seems wise to mention it.

Prepared for the Library of Congress
by Charles W. Husbands
14 December 2009


( 12/20/2010 )