The Library of Congress >> Especially for Librarians and Archivists >> Standards
MARC Standards
MARC 21 HOME >> Bibliographic Format >> Overlaps Between MARC21 >> MARC21 to TextMD Mapping

Analysis of Data Overlap Between the MARC21 Bibliographic Format and the TextMD Data Element Set

INTRODUCTION

This paper examines the MARC21 Bibliographic format and the TextMD element set for correspondences in the data they accommodate.  TextMD is concerned with textual digital objects.  It is introduced as an extension schema for METS, which explains why TextMD has no element that identifies the object being described.  That omission makes it difficult to understand the statement that the TextMD schema “could also exist as a standalone document.”

The number of TextMD elements is relatively small; the exact count depends on whether attributes are counted and on which version of the schema is examined.  This analysis considered version 3.01, but it applies equally to version 2.2, because the elements newly added in 3.01 have no discernible equivalents, or at least no specific homes, in MARC21.

TextMD values are generally expressed as strings.  The exceptions are byte_size, which requires an integer, and the elements added for version 3.01, which the schema describes as “tokens,” although their enumerated values appear as strings.  Roughly three fourths of the values typed as strings accept free text; the remainder are restricted to the values in enumerated lists.

Findings of this investigation are expressed in a pair of tables with the same contents arranged differently.  Table MARC21 vs. TextMD is ordered by MARC tag and subfield code; table TextMD vs. MARC21 follows the order of TextMD data elements as presented in its XML schema.  The TextMD documentation does not number its data elements, so numbers have been assigned here to facilitate sorting the tables.

NOTES ON THE ANALYSIS

Names of TextMD elements and attributes are italicized in these notes.

With the exception of fields and elements referring to the language of the text object, correspondences between MARC21 and TextMD are fuzzy.  The tables were built on the assumption that it is better to identify possible correspondences that may be tenuous, or even erroneous, than to omit an important connection inadvertently.  Users of these tables who know more about their own or others’ TextMD implementations may be able to eliminate some entries as universally inappropriate or too rare to be worth considering.

The MARC21 fields (041, 546) and TextMD elements (language, alt_language) concerning the language(s) of the text object are all repeatable, which makes it fairly easy to translate the information from one scheme to the other.  In most cases it is necessary only to convert between MARC language codes and ISO 639-2 codes and to supply the necessary authority information in either scheme.  When 041 and 546 are absent, the non-repeatable MARC21 element 008/35-37 can be used in the same way for simple cases.
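The fallback just described, preferring the repeatable 041 codes and using 008/35-37 only when 041 is absent, can be sketched in Python. This is a minimal illustration under stated assumptions, not a method from the analysis: the helper name language_codes and its inputs (a list of 041 $a values and the raw 008 string) are hypothetical, it consults only 041 before falling back (546 is free text and would need separate handling), and it performs neither the MARC-to-ISO 639-2 conversion nor the supply of authority information.

```python
from typing import List, Optional


def language_codes(codes_041a: List[str], field_008: Optional[str]) -> List[str]:
    """Collect three-letter language codes for a text object (hypothetical helper).

    Prefers the repeatable 041 $a codes; falls back to the single,
    non-repeatable code in 008/35-37 only when no 041 codes are present.
    """
    # Keep every non-empty 041 $a code, in record order.
    codes = [code.strip() for code in codes_041a if code.strip()]
    if codes:
        return codes

    # MARC numbers 008 positions from 00, so 35-37 is the Python slice [35:38].
    if field_008 is not None and len(field_008) >= 38:
        code = field_008[35:38]
        # Blanks and fill characters ("|||") are treated here as no usable code.
        if code.strip() and code != "|||":
            return [code]
    return []


if __name__ == "__main__":
    fixed = " " * 35 + "ger" + "  "  # 40 characters; only 35-37 matter here
    print(language_codes(["eng", "fre"], fixed))  # 041 present: ['eng', 'fre']
    print(language_codes([], fixed))              # fallback: ['ger']
```

Which of the returned codes would go to TextMD language and which to alt_language is deliberately left open in this sketch.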

Certain positions in the MARC 007 field of a computer file record may be of limited use in formulating a TextMD QUALITY attribute, especially if that attribute is not expected to be any more precise than the single word “good” shown in the example in the TextMD documentation.

MARC has a field (514, Data Quality Note) whose name suggests relevance to the TextMD QUALITY attribute, but the definitions of the 514 subfields all point toward restricting the field to cartographic resources, which makes it unhelpful for text objects.

The tables show MARC21 field 538 (System Details Note) as relating to TextMD containers (encoding, character_info) and to distinct elements (printRequirements, viewingRequirements), on the basis of examples given in the MARC documentation, but information concerning other elements (e.g. markup_basis, markup_language) might quite possibly also be encountered in 538.  The documentation of 538 does not distinguish between the computer requirements of the system that created the digital object (corresponding to TextMD encoding_platform) and those concerned with viewing or printing the object.

MARC21 field 856 (Electronic Location and Access), true to its title, is apt to offer little information that fits in the TextMD encoding or character_info containers.  Possible exceptions are $c (Compression information), as part of encoding_software, and $r (Settings), for determining byte_size, although the $r expression of the latter has begun to look quaint.
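To illustrate the $r case, the following Python sketch extracts a data-bits count from a settings string. It assumes that $r follows the pattern parity-data bits-stop bits (for example, E-7-1) and that the data-bits figure is what would be carried into TextMD byte_size; the function name byte_size_from_856r is hypothetical, and values that do not fit the assumed pattern are returned as None rather than guessed.

```python
import re
from typing import Optional

# Assumed shape of an 856 $r value: parity-data bits-stop bits, e.g. "E-7-1".
SETTINGS = re.compile(r"([A-Za-z])-(\d+)-(\d+)")


def byte_size_from_856r(settings: str) -> Optional[int]:
    """Return the data-bits count from an 856 $r string, or None if it does not fit."""
    match = SETTINGS.fullmatch(settings.strip())
    if match is None:
        return None
    # Group 2 holds the data-bits figure under the assumed pattern.
    return int(match.group(2))


if __name__ == "__main__":
    print(byte_size_from_856r("E-7-1"))  # 7
    print(byte_size_from_856r("N-8-1"))  # 8
    print(byte_size_from_856r("8 bit"))  # None
```

Returning None for anything unexpected keeps the mapping conservative, in keeping with the fuzzy correspondences described in these notes.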

On the other hand, 856 may contain, in various subfields, data appropriate to TextMD printRequirements and/or viewingRequirements, because those elements are specified as free-text strings.

The tables cite field 500 as the possible MARC21 home for data that would appear in the catch-all TextMD elements processingNote and textNote, but, depending on the content of individual instances of those elements, somewhat more specific 5xx notes may occasionally be more appropriate.

Data in new TextMD version 3.01 elements (e.g. representationSequence, lineLayout, lineOrientation, characterFlow) might also be reflected in somewhat different form in MARC21 note fields such as 500 and 546, particularly when the text of the object is written in non-Latin scripts.

The process of deciding what kinds of data are expected in any MARC21 subfield of interest, particularly in free text note fields, has been guided by the examples provided in the full online MARC21 Bibliographic Format document.  No attempt was made to search for additional examples in library databases.

The investigator is aware of his tendency to think of metadata comparisons of this sort in terms of conversions from one scheme to another.  That is a matter more limited than the general question of data overlap.  Nevertheless, that mindset may have crept occasionally into the language used to talk about the relationships of certain data elements between MARC21 and TextMD.  This should not seriously affect interpretation of the findings.

In carrying out this task, the investigator noted that re-examining some features may serve the scheme well in the long run.  Here are some things to consider.

Prepared for the Library of Congress
by Charles W. Husbands
21 January 2010



( 12/20/2010 )