The Library of Congress >> Especially for Librarians and Archivists >> Standards
MARC Standards
MARC 21 HOME >> Bibliographic Format >> Overlaps Between MARC21 >> MARC21 to MIX Mapping

Analysis of Data Overlap Between the MARC21 Bibliographic Format and the MIX Data Dictionary (ANSI/NISO Z39.87-2006)

INTRODUCTION

The present paper examines the MARC21 Bibliographic Format and the MIX Data Dictionary for correspondences in the data which they accommodate.  Though the data dictionary is actually defined by ANSI/NISO Z39.87-2006, it will, for convenience, be called MIX throughout this report, after the XML schema, maintained by the Library of Congress, that is widely used to implement the standard.

This analysis, conducted following a similar one relating elements of MARC21 to those of the PREMIS Data Dictionary, invites some comparisons of PREMIS and MIX and their relationships to MARC21. 

The PREMIS and MIX metadata schemes are similar in focusing on the management of collections of digital resources.  In MIX this orientation is made explicit by including in the documentation of each element a section categorizing anticipated use.  “User” use is attributed to only about one third of the elements; more than half of those are elements of GPSData.  The other two MIX categories, “system” and “manager,” are each attributed to more than twice as many elements as the “user” category.  In contrast, MARC has throughout its history been centered on “user” metadata.

PREMIS concentrates on data relevant to managing preservation operations, organizing the data into various related entities (Object, Event, Agent, Rights).  MIX is more limited in the materials it covers, namely still raster images, but provides for description of these objects in significantly greater technical detail.

PREMIS relies heavily on string values drawn from controlled vocabularies defined outside the scheme itself.  MIX, on the other hand, employs a variety of datatypes.  String values from “enumerated” lists, which represent roughly one third of all datatype options in MIX, are akin to PREMIS's use of string values from controlled vocabularies.  MIX also relies on the simple string datatype, which is the second most common datatype in MIX, and on several numeric datatypes, such as “positive integer,” “non-negative real,” and “rational.”
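The datatypes named above can be illustrated with a minimal Python sketch.  It is not taken from Z39.87-2006 or from the MIX schema; the function names are hypothetical, the validation rules are a plain reading of the datatype names, and the enumerated list passed in by the caller is a placeholder rather than a real MIX vocabulary.

```python
from fractions import Fraction
import math


def parse_positive_integer(text: str) -> int:
    """Accept whole numbers strictly greater than zero."""
    value = int(text)  # raises ValueError for "3.5", "abc", etc.
    if value <= 0:
        raise ValueError(f"not a positive integer: {text!r}")
    return value


def parse_non_negative_real(text: str) -> float:
    """Accept finite real numbers greater than or equal to zero."""
    value = float(text)
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"not a non-negative real: {text!r}")
    return value


def parse_rational(numerator: str, denominator: str = "1") -> Fraction:
    """Keep a ratio exact instead of rounding it to a float."""
    return Fraction(int(numerator), int(denominator))


def parse_enumerated(text: str, allowed: set[str]) -> str:
    """Accept only strings found in a caller-supplied list (assumed values)."""
    if text not in allowed:
        raise ValueError(f"{text!r} is not in the enumerated list")
    return text
```

A free-text MARC21 note can hold any of these values as a string, but only a typed reading such as the one above would confirm that, say, a recorded width is a usable positive integer.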

The string datatype and the various numeric types offer potentially better correspondence between the ways MIX and MARC21 record data than is the case between PREMIS and MARC21, specifically with reference to the latter’s use of free text notes.  (Whether the apparent preference in MIX for text values over codes in elements of the enumerated type is helpful in this regard is harder to evaluate.)  However, most of the data collected by MIX are of a nature so technical and specialized that only rarely have provisions been made for them in MARC21.

Findings of this investigation are expressed in a pair of tables that have the same contents arranged differently.  Table MARC21 vs. MIX is ordered by MARC tag and subfield code; table MIX vs. MARC21 follows the order of MIX data elements of the Data Dictionary as presented in Z39.87-2006.  Though the current version (2.0) of the MIX schema includes the conversion of a few of the Z39.87-2006 elements to Containers with new, more specific elements defined within them, these elaborations occur only in areas that are not accommodated in the MARC21 format; hence the findings are equally applicable to MIX versions 1.0 and 2.0.
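The idea of one set of findings presented in two orders can be sketched in Python as follows.  This is only an illustration: the class and function names are hypothetical, the MIX identifiers and names are placeholders, and the pairings are invented purely to exercise the sort keys; none of them is a finding of this report.

```python
from typing import NamedTuple


class Correspondence(NamedTuple):
    marc_tag: str       # three-character MARC tag
    marc_subfield: str  # subfield code, or "" for a field-level entry
    mix_id: str         # dotted numeric identifier (placeholder values)
    mix_name: str       # placeholder name, not a real MIX element


def marc_key(row: Correspondence) -> tuple[str, str]:
    """Order by MARC tag, then by subfield code."""
    return (row.marc_tag, row.marc_subfield)


def mix_key(row: Correspondence) -> tuple[int, ...]:
    """Order dotted identifiers numerically, so 3.9 precedes 3.10."""
    return tuple(int(part) for part in row.mix_id.split("."))


# Invented rows: real MARC tags paired with placeholder MIX entries.
ROWS = [
    Correspondence("534", "e", "3.2", "placeholder container B"),
    Correspondence("300", "c", "3.10", "placeholder container A"),
    Correspondence("034", "", "3.9", "placeholder element C"),
]

marc_vs_mix = sorted(ROWS, key=marc_key)
mix_vs_marc = sorted(ROWS, key=mix_key)
```

Sorting the MIX identifiers as tuples of integers rather than as strings matters because a plain string sort would place 3.10 before 3.9.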

NOTES ON THE ANALYSIS

Certain elements are noted in MIX documentation as having been “drawn from the PREMIS data element set.”  In the tables, these elements are flagged by an asterisk following their numeric identifiers.
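Anyone processing the tables by program would need to separate the asterisk from the identifier.  A minimal Python sketch, assuming identifiers appear as dotted numbers with an optional trailing asterisk (the function name and the sample identifier in the usage note are hypothetical):

```python
def split_premis_flag(identifier: str) -> tuple[str, bool]:
    """Return the bare identifier and whether it carried the asterisk flag."""
    identifier = identifier.strip()
    if identifier.endswith("*"):
        return identifier[:-1].rstrip(), True
    return identifier, False
```

For example, `split_premis_flag("7.3*")` yields the bare identifier together with `True`, while an unflagged identifier is returned unchanged with `False`.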

Where feasible, the tables work at MARC21 subfield and MIX element levels.  Departures from this principle occur for two main reasons.

  1. There are situations in which a single MARC21 field or subfield represents, implicitly or explicitly, a Type/Value pair of MIX elements.  For succinctness, a single reference to the appropriate container is used in these cases.
  2. Sometimes the specification of a MARC21 subfield relates in an uncertain or complex way to the elements of a MIX Container.  Creating a table entry at the Container level expresses the relationship as clearly as seems possible; see, for example, fields 300$c and 534$e.

The tables have been built on the assumption that it is preferable to identify possible correspondences that may be tenuous, or even erroneous, than to omit a connection of importance inadvertently.  Users of these tables who are more knowledgeable about their own or others’ MIX implementations may be able to eliminate some table entries as universally inappropriate or too rare to be worthy of consideration.  For instance, the data in field 007 positions for computer files may be deemed useless in contrast to the detail afforded by MIX.

One situation that deserves comment is the relationship between MIX GPSData and MARC21 fields 034, 255 and 007 of the remote sensing image type (i.e., 007/01=r).  On the surface, the MARC fields for recording geospatial data might seem to be good fits with MIX GPSData, but MIX's GPSData and MARC's 034 and 255 fields are not trying to record quite the same information.  Typically, GPS data records the location at which a digital photograph was taken.  MARC 034 and 255 attempt to describe the geographic bounds of a (cartographic) image.
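The difference between a capture point and a geographic extent can be made concrete with a small Python sketch.  The class names, the decimal-degree representation, and the coordinates in the usage note are all assumptions chosen for illustration; they do not reproduce the encoding used by MIX GPSData or by MARC21 fields 034 and 255, and the box test ignores extents that cross the 180-degree meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturePoint:
    """Where the camera was when the image was taken (decimal degrees)."""
    latitude: float
    longitude: float


@dataclass(frozen=True)
class BoundingBox:
    """The geographic area an image depicts (decimal degrees)."""
    west: float
    east: float
    north: float
    south: float

    def contains(self, point: CapturePoint) -> bool:
        # Simplification: assumes the box does not cross the antimeridian.
        return (self.south <= point.latitude <= self.north
                and self.west <= point.longitude <= self.east)

    @classmethod
    def from_point(cls, point: CapturePoint) -> "BoundingBox":
        # A point forced into box form has zero extent in both directions.
        return cls(west=point.longitude, east=point.longitude,
                   north=point.latitude, south=point.latitude)
```

With invented values, a camera at `CapturePoint(40.0, -105.5)` photographing terrain bounded by `BoundingBox(west=-105.3, east=-105.0, north=40.3, south=40.1)` lies outside the area depicted, and `BoundingBox.from_point` produces a box with no extent at all.  Neither record can therefore stand in for the other without loss.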

It should be noted that particular MARC21/MIX correspondences depend on whether one contemplates a MARC record describing the same electronic object as a MIX object or one in single record style that essentially describes the Source of the MIX image.  The tables attempt to cover both cases.

The process of deciding what kinds of data are expected in any subfield of interest, particularly in free text note fields, was guided by the examples provided in the full online MARC21 Bibliographic Format document.  No attempt was made to search for additional examples in library databases.  The many MIX elements for which no MARC21 homes have been identified could, of course, be included in general notes tagged 500.  This seems unlikely, however, because the 500 note is typically displayed in library catalogs, while the technical character of most MIX elements means they would rarely need to be exposed in this way.

The investigator is aware of his tendency to think of metadata comparisons of this sort in terms of conversions from one scheme to another.  That is a matter more limited than the general question of data overlap.  Nevertheless, that mindset may have crept occasionally into the language used to talk about the relationships of certain data elements between MARC21 and MIX.  This should not seriously affect interpretation of the findings.

Prepared for the Library of Congress
by Charles W. Husbands
4 January 2010


( 12/20/2010 )