Proposal 2008-05/1: Encoding RDA: Introduction and Principles
1. BACKGROUND
At the Midwinter 2008 MARBI Meetings the MARC Advisory Committee considered Discussion Paper No. 2008-DP04, which introduced issues related to encoding data formulated according to Resource Description and Access (RDA) in MARC records. Prepared by members of the Joint Steering Committee (JSC), the paper listed a number of areas where changes might be needed to the formats to accommodate RDA data, and participants at the meeting suggested directions for future proposed changes to MARC.
RDA provides a set of guidelines and instructions on formulating data reflecting entity attributes and relationships to support the FRBR user tasks (find, identify, select, and obtain) and FRAD user tasks (find, identify, contextualize, and justify) for resource discovery. RDA is aligned to the FRBR and FRAD conceptual models and strives for the provision of consistent metadata across all types of resources. RDA also functions as a metadata element set, similar to the Dublin Core Metadata Element Set. In addition, the data created using RDA is designed to function within a range of database structures (from "flat file" through to relational/object oriented).
In March 2008, under the auspices of the Library of Congress, Library and Archives Canada, and the British Library, an RDA/MARC Working Group was established to collaborate on the development of proposals for changes to the MARC 21 formats to accommodate the encoding of RDA data. With the implementation of RDA anticipated for late 2009, the Working Group was charged with drafting proposals for review and discussion by the MARC community in June 2008. During this period, the Working Group found that some parts of RDA it needed to work with were still undergoing revision and were not yet available to it. It concluded that it could prepare Proposals based on the available parts of RDA but that for other areas only Discussion Papers could be developed. The expectation is that the Discussion Papers can become Proposals for the January 2009 meetings, if the discussions warrant. The Working Group's efforts have resulted in a series of proposals and discussion papers to be considered by the MARC Advisory Committee in June 2008.
The previous paper, 2008-DP04, lists key background documents that have been used in developing these papers. Consult that source for further information.
Proposal No. 2008-05 (Changes to the MARC 21 Formats to Encode Data using Resource Description and Access (RDA)) consists of a set of proposals suggesting changes to the MARC 21 formats with the following parts:
- 2008-05/1: Encoding RDA: Introduction and Principles
- 2008-05/2: Identifying work and expression records in the MARC 21 Bibliographic and Authority formats
- 2008-05/3: New content designation for RDA elements Content type, Media type, Carrier type
- 2008-05/4: Enhancing field 502 (Dissertation note) in the MARC 21 Bibliographic Format
Discussion Paper No. 2008-DP05 discusses issues concerning accommodating RDA data in MARC 21 in cases where the direction to be taken is not clear and further discussion is needed by the community in order to formulate change proposals. It consists of the following:
- 2008-DP05/1: Using RDA relators between names and resources with MARC 21 records
- 2008-DP05/2: New data elements in the MARC 21 Authority Format
- 2008-DP05/3: Treatment of controlled lists of terms and coded data in RDA and MARC 21
- 2008-DP05/4: Items not requiring MARC 21 format changes for RDA
There are several issues that are not being considered at this time because the RDA developers had not yet made the information available to the RDA/MARC Working Group.
2. RDA AND MARC DATABASE MODELS
RDA was reorganized in October 2007 in an attempt to better support specific relational database models that also underlie the FRBR model. These models may be summarized as follows (scenario numbers from RDA):
- Scenario 1. System identifies data according to the FRBR model and may store the components of a bibliographic description in a number of (or a few, depending on the design) linked files. It requires that the access point data not be stored in the bibliographic resource descriptions, for example, but be pointed to by the components of a description.
- Scenario 2. Records for bibliographic data are supported by authority and holdings records. The access points in the bibliographic records are linked to the authority records via the heading strings and record numbers and these links are exploited by the system.
- Scenario 3. Records for bibliographic data may or may not be supported by records for access points (authority records). This system does not act on or employ links between the bibliographic and authority records. Although the linking data is in the records in the form of access point strings, and record numbers may also be included, the database system does not use them.
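As an illustration only, the difference between the three scenarios can be sketched in a few lines of Python. The class names, the sample heading, and the record identifier "a1" are hypothetical and come from neither RDA nor MARC 21; the sketch assumes a single creator access point per description.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AccessPoint:
    record_id: str
    heading: str


@dataclass
class LinkedDescription:
    """Scenario 1: the description stores only a pointer to the access point."""
    title: str
    creator_id: str


@dataclass
class FlatDescription:
    """Scenarios 2 and 3: the description stores the heading string itself,
    optionally with the number of the supporting authority record."""
    title: str
    creator_heading: str
    creator_record_id: Optional[str] = None


def creator_scenario_1(desc: LinkedDescription, authority: Dict[str, AccessPoint]) -> str:
    # The heading exists only in the linked file, so it must be looked up.
    return authority[desc.creator_id].heading


def creator_scenario_2(desc: FlatDescription, authority: Dict[str, AccessPoint]) -> str:
    # The system exploits the link when it resolves, else uses the stored string.
    linked = authority.get(desc.creator_record_id)
    return linked.heading if linked else desc.creator_heading


def creator_scenario_3(desc: FlatDescription) -> str:
    # The link may be present in the record, but the system ignores it.
    return desc.creator_heading


# Hypothetical sample data.
authority_file = {"a1": AccessPoint("a1", "Twain, Mark, 1835-1910")}
linked = LinkedDescription("Adventures of Huckleberry Finn", "a1")
flat = FlatDescription("Adventures of Huckleberry Finn", "Twain, Mark", "a1")
```

If the authority heading is later revised, the scenario 1 and scenario 2 results change with it, while scenario 3 continues to return the string stored in the bibliographic record.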
These system hypotheses have all been used in various ways over the years of catalog automation. The well-known “Quadraplanar structure” described by staff at the University of Chicago in the early 1970s and implemented in the Washington Library Network system in the late 1970s was an early example of exploiting relational design to facilitate more efficient and accurate cataloging of resources. Attempts to take that design into ILSs were not highly successful, although the WLN system was ported to an early relational database management system, and the Dobis/Libis system and even the Amicus system used at Library and Archives Canada tapped relational database attributes by interrelating record components. The problems encountered were probably due to machine capabilities that did not support rapid retrieval of records at the time. The emphasis on rapid retrieval of large result sets steered most ILS vendors away from deconstructed records that needed to be reassembled on the fly for display. Today many ILSs are based on relational databases, but the relational attributes are used for different purposes.
Today, however, the speed of computer systems and the sophistication of relational database management systems such as Oracle enable enterprise system designs that can exploit the FRBR model to make search and retrieval results more consistent. This is especially relevant because major explorations are under way on the retrieval side of the ILS, where data is reconfigured for new browse, search, and presentation demands.
MARC, however, is a communications format: it needs to exchange encoded data between systems with different database management systems and widely different ingest and use capabilities. This is why the MARC record is rich with data that can feed local configurations and idiosyncrasies. The difference between an internal (enterprise-wide) format and a communications format has become obscured because internal systems found it easier to program a record ingest that stored records looking much like the transport format, at least as far as the parsing of data was concerned. However, most internal record formats that are called “MARC” are not communications MARC but are merely MARC compatible. Since there are database options other than relational (e.g. XML native), MARC communications needs to support the following:
- 1) Resource description records should be complete with the information a receiving system may need to present to the end user (the receiving system often does not want to actually store the record in a local database).
- 2) The record may also include information that links the fields to access point information in external records, in addition to including the access point data in the exchange record.
- 3) The record should include, if possible, information that will support ingest into a variety of database management system types.
The developers of RDA believe that use of scenario 1 for enterprise systems is a future direction which offers a more rigorous basis for representing FRBR entities and relationships. Attributes of each FRBR entity would be stored in separate records and all relationships could be made directly between the records for the related entities. Realizing the benefits of an enterprise implementation of scenario 1 will be complex and will require the collaboration of RDA developers, designers of integrated systems, and MARC developers (to assure the exchange format carries sufficient data to support the database design); as such, it is considered a long-term goal.
3. WORKING PRINCIPLES CONSIDERED
The RDA/MARC Working Group established some principles to assist in deciding what changes are necessary to MARC to accommodate RDA-encoded bibliographic records. These fell into the following categories.
3.1. FRBR entities and RDA data
A diverse community uses the MARC 21 formats and not all will use RDA for its content, just as some users have not used AACR2 in the past. Some may have different underlying models than FRBR and FRAD. For these reasons, the RDA/MARC Working Group established the following principles in mapping RDA to MARC 21:
- MARC 21 needs to maintain neutrality and flexibility concerning which types of records (authority, bibliographic, or holdings), fields, or subfields map to which FRBR entities: work, expression, manifestation, item.
- MARC 21 needs to be neutral and flexible with respect to database designs that ingest MARC data but it should enable enterprise use of any of the three database designs in RDA.
3.2. Parsing of data
The MARC Formats: Background and Principles was established in the 1980s and last updated in 1996. It includes some principles for instances where specific content designation is needed in textual fields:
- Categorical indexing or retrieval is required for the data. The data is used for structured access purposes but does not have the nature of a controlled access point.
- Special manipulation of a specific category of data is a routine requirement. Such manipulation includes special print or display formatting or selection or suppression from display or printed product.
- Specialized structuring of information for reasons other than those given above is needed. For example, support for particular standards of data content.
These principles were considered in looking at the granularity of data in RDA and how much should be included in MARC. The Working Group and the original developer of the above principles recognized that parsing of data at point of data entry can increase cataloging costs, so it should be weighed against enhanced end user experience.
In some cases, RDA appears to specify parsing and identification of data at a level finer than AACR2. If parsing of data is important then either new parsed fields may be established or a mechanism to encode both basic and enhanced data in the same field can be utilized.
The latter technique was established when considering Proposal 92-15, which addressed a similar problem with enhancing field 505 (Formatted contents note). The field originally contained a simple note with author and title information that could be indexed and displayed but did not facilitate special processing. The solution was to define subfields for a parsed version of the note and to distinguish the simple and parsed versions of the field by indicator values for “basic note” and “enhanced note.” Thus, both simple and parsed fields are possible but detailed parsing is not required, supporting a variety of needs. This technique has been used in the treatment of additional subfields for dissertations (Proposal No. 2008-05/4).
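The basic/enhanced technique for field 505 can be sketched as follows. This is a minimal illustration, not a full MARC parser: it assumes a field is already available as a second indicator value plus a list of (subfield code, value) pairs, relies on the 505 second indicator values (blank for basic, 0 for enhanced) and subfields $a, $t, and $r, and uses made-up note content. The function names are hypothetical.

```python
BASIC = " "     # 505 second indicator blank: basic note, text in $a
ENHANCED = "0"  # 505 second indicator 0: enhanced note, parsed into $t, $r, $g


def display_note(subfields):
    """Both forms display the same way: subfield values in record order."""
    return " ".join(value for _code, value in subfields)


def indexable_titles(ind2, subfields):
    """Only the enhanced form identifies titles for categorical indexing."""
    if ind2 != ENHANCED:
        return []
    # Drop trailing ISBD-style punctuation (" /", " --") carried in $t.
    return [value.rstrip(" /-") for code, value in subfields if code == "t"]


# Made-up sample data: the same contents note in both forms.
basic_505 = [("a", "Carbon / A. Author -- Nitrogen / B. Author.")]
enhanced_505 = [
    ("t", "Carbon /"),
    ("r", "A. Author --"),
    ("t", "Nitrogen /"),
    ("r", "B. Author."),
]
```

A system that only needs to show the note can treat both forms identically, while a system that wants title indexing can act on the enhanced form alone; this is the sense in which detailed parsing is supported but not required.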
The Working Group did not encounter other situations where a different technique needed to be considered, since it also took into account the general MARC principles listed above in this section that have been used for many years.
Related to the parsing of data is its identification for labeling in user displays. MARC has a number of fields with display constant controller indicators and/or display text subfields, but other fields do not have these special content designators. Rather than adding a lot of content designation to the format, the Working Group agreed that a general principle could be applied that display labels can be system-generated based on a field tag or a subfield code, as is common already. This enables display information to be generated based on local preferences and in different languages.
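A minimal sketch of system-generated display labels is given below. The label wording, the choice of languages, and the function names are assumptions made for illustration; in practice each system would supply its own table according to local preference.

```python
# Hypothetical local label tables, keyed by language and then by field tag.
LABELS = {
    "en": {"502": "Dissertation note", "505": "Contents"},
    "fr": {"502": "Note de thèse", "505": "Contenu"},
}
DEFAULT_LANGUAGE = "en"


def label_for(tag, language=DEFAULT_LANGUAGE):
    """Return the local label for a tag, falling back to the default
    language and finally to the bare tag when no label is defined."""
    default_table = LABELS[DEFAULT_LANGUAGE]
    table = LABELS.get(language, default_table)
    return table.get(tag, default_table.get(tag, tag))


def display_line(tag, text, language=DEFAULT_LANGUAGE):
    """Prefix field text with a label generated from the tag alone."""
    return f"{label_for(tag, language)}: {text}"
```

Because the label is derived from the tag at display time, nothing needs to be added to the record itself, and the same record can be shown with different labels in different systems.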
3.3. Coded values
MARC contains a number of coded value lists, particularly in the fixed fields. They were developed over time to address needs of the community. Discussion Paper No. 2008-DP05/3 discusses the treatment of controlled term lists in RDA, and their mapping to MARC coded value lists. In many cases the granularity in MARC was compatible with that in RDA, however.
The Working Group took the approach that, since the RDA term lists are open ended and not meant to be closed and complete, exactly aligning MARC with those lists, which will change as RDA and technology develop, would often not be practical. Exact alignment would negatively affect both the consistent retrieval of MARC records and the future flexibility of RDA.
3.4. Items not requiring MARC changes
The RDA/MARC Working Group considered all items presented in the earlier Discussion Paper No. 2008-DP04 and, upon further detailed investigation, concluded that some of the format changes suggested in that paper were not needed. This was mainly due to applying the principles discussed here, as well as to new information that became available from the RDA developers. In many cases there were already existing fields where the data could go. These decisions are documented in Discussion Paper No. 2008-DP05/4.