NAME: Redefinition of Code "m" (Computer file) in Leader/06 in the USMARC Bibliographic Format
SOURCE: Library of Congress
SUMMARY: This paper explores the redefinition of code "m" in Leader/06 (Type of record) to code electronic items for content rather than carrier. It reviews previous discussions of this issue and considers how the proposed redefinition might affect the use of fields 008 and 006. Two options are proposed: one to redefine code "m" to include software, numeric data or a mixture of forms, and the other to redefine it as just computer software. In addition, it proposes opening up the definition of Mixed materials (code "p") to include electronic mixed materials, and changing the name of the Books 008 to "Textual (Nonserial)".
KEYWORDS: Leader/06 (Bibliographic); Type of record; Electronic reproductions; Computer files; Mixed materials
RELATED: 97-7 (February 1997); DP97 (July 1996); DP92 (January 1996); 95-9 (June 1995)
STATUS/COMMENTS:
12/15/96 - Forwarded to USMARC Advisory Group for discussion at the 1997 Midwinter MARBI meetings.
2/20/97 - Results of USMARC Advisory Group discussion - Deferred. Some participants felt that the proposal was not well enough developed to move forward. LC should come back with another proposal that addresses OCLC concerns: 1) that the choice in Leader/06 should be clear; 2) that guidelines need to be consistently applied from one environment to the next; and 3) that there be somewhere in the record to determine whether the record describes an electronic or nonelectronic version. The latter requirement needs to be met in a mandatory field, not an optional field such as 006 or 007. LC should come back with another proposal for the summer meeting that reflects these concerns and that includes examples for all types of materials.
2/26/97 - Results of final LC review - Agreed with the MARBI decision.
PROPOSAL NO. 97-3: Redefinition of Code "m" (Computer file) in Leader/06

1. BACKGROUND

With the completion of the last phase of Format Integration in early 1996, MARC bibliographic records may contain coding for more than one set of characteristics in the new field 006 (Fixed-Length Data Elements--Additional Material Characteristics). Leader/06 (Type of record) contains a code that is used to determine what type of 008 (Fixed-Length Data Elements) is included in the record; the 008 character positions vary in 008/18-34 depending upon the type of material as coded in the Leader. Field 006 includes applicable codes that would otherwise be coded in 008/18-34, so that additional information may be given for other aspects of the item.

A choice must be made as to which form of material the field 008 should be coded for. In terms of description, the decision as to which form is primary and which is secondary does not have much impact, since all characteristics can also be given in the record. However, the Leader/06 code is used for many purposes, particularly for retrieval of records. Format Integration opens up the opportunity to supply more information about the item than in the past, but it also brings up many questions about how to apply this new flexibility.

In our current environment, distinctions between types of material have become blurred. With the advent of the personal computer and the growth of the Internet, it becomes questionable whether categorizing all digital material as a computer file is useful for retrieval and manipulation of bibliographic records. If all digital material were coded as a computer file, the record for a computerized version of, for example, an original photograph would be coded differently from the record for the original (if separate records are created). This may cause problems for retrieval, particularly in systems that separate records by form of material.
Also, because of economic considerations, many users are adding information about the digital item to the MARC record for the original, rather than creating a separate record.

The coding in Leader/06 was discussed on three previous occasions at meetings of the USMARC Advisory Group.

Proposal No. 95-9 (Encoding of Digital Maps in the USMARC Bibliographic Format) was considered by the USMARC Advisory Group in June 1995. It proposed renaming code "e" in Leader/06 from "Printed map" to "Cartographic material" so that all maps, whether digital or print, could be coded the same (there is also a code for "manuscript map"). Because of the increasing number of digital map images becoming available (resulting partly from digital library projects and the Content Standards for Geospatial Metadata), this change was considered necessary for the map community. In many cases the bibliographic record for the paper copy will contain information about the location of the digital image. This paper brought up the issue of coding for content rather than for physical carrier. Although the portion of the proposal concerning Leader/06 was approved, it was suggested that a broader discussion paper be presented.

Discussion Paper No. 92 was presented to the USMARC Advisory Group in January 1996. It explored changing the definition of code "m" in Leader/06 so that it is used only for executable software. There was general agreement that, in cases where the content of the electronic material is clear, identifying the primary record type in Leader/06 by its content rather than its carrier better serves users. These cases include electronic text, music CDs, digital maps, digital photographs, etc. Participants felt that defining code "m" as only executable software was too restrictive for several reasons: the growing existence of hybrids that include pictures, graphics, text, and software; files that don't fit into a category, e.g., survey data; and the fact that defining "m" only as executable software would not allow input of an 006 for computer file characteristics for electronic text, since its secondary characteristic is not an executable. The group thought that it was likely that each constituency would need to issue guidelines. Another discussion paper was requested for the USMARC Advisory Group meeting in July.

Discussion Paper No. 97 was presented in July 1996. The consensus of the group was that it was desirable to change the definition of code m so that one does not have to code everything digital that way. It was requested that a proposal be written to include a redefinition of code m, with the coding of Leader/06 for digital items dependent upon the content of the item rather than how it is represented. It was suggested that two options be presented: one for code m to include executables, data sets, and raw data, and another, narrower one just for executables. The group also recommended that code o (Kit) be considered for multimedia or that its definition be clarified to distinguish it.

2. LEADER/06

Earlier discussions detailed the many uses systems make of the Leader/06. Some of these are: separating databases based on form of material; sorting records; matching records for duplicate detection; and selecting subsets for products distributed. Because of the many uses of this coded element, the decision as to which characteristic to consider primary has great impact on retrieval and manipulation of the record. The 006 can give the additional descriptive information for the secondary form of material, although not all systems currently use it for retrieval in the same way as the 008.

The current definition of Leader/06 code "m" (Computer file) in the USMARC Format for Bibliographic Data is:

m - Computer file
Code m indicates that the content of the record is for a body of information encoded in a manner which allows it to be processed by a computer.
The information in the computer file may be numeric or textual data, computer software, or a combination of these types. Although a file may be stored on a variety of media (such as magnetic tape or disk, punched cards, or optical character recognition font documents), the file itself is independent of the medium on which it is stored.

This definition implies that any electronic item needs to be identified in the Leader/06 as a Computer file. It also implies that a separate record would need to be made for an electronic reproduction, since the record for the original would be coded according to the original carrier/content. Opening up this definition so that the institution is not mandated to categorize all electronic items as computer files allows for more flexibility and was considered generally desirable in previous discussions of this issue.

3. CODING FOR CONTENT

Technological advances have resulted in numerous projects to digitize existing material. Librarians want to provide descriptive information and access to these materials through catalog records. With the definition of Field 856 (Electronic Location and Access), a bibliographic record can be created with a link to the electronic location of the item. Subfield $3 (Materials specified) allows for electronic location information to be given for a subset of the item in the record.

In special collections, particularly in visual materials, institutions have often chosen to include field 856 in the record for the original to give information about the electronic item. In these cases, additional descriptive information about the item in its electronic form has not been needed, but only information about the location of and access to it. This technique has been attractive because of the retrieval problems that arise when the electronic item is cataloged separately as a computer file, and because of the economic considerations of creating a separate record.
It allows for the focus of the record to be on the content of the item, rather than its carrier.

Coding for the content of the item for electronic materials would be consistent with the method used for handling microforms. The USMARC bibliographic format in Leader/06 says the following:

"Microforms, whether original or reproductions, are not identified by a distinctive Type of record code. The type of material characteristics described by the codes take precedence over the microform characteristics of the item."

There is not a separate Leader/06 code for microform, although there is one for computer file. By handling electronic items in Leader/06 as language, cartographic, music, etc., records for digital reproductions would not be separated from the originals, and there would be flexibility for record creation (i.e., using one record and adding a field 856 with location information of the digital reproduction or creating a separate record). The statement above about the treatment of microforms could be revised to include electronic materials.

4. REVISED DEFINITION OF CODE "M"

In order to allow for records for digital items to be coded for their content, or intent, rather than as a computer file, the following definition might be considered:

m - Computer file
Code m indicates that the content of the record is for information that is processed by a computer and whose most significant aspect does not fall into any other Leader/06 category, i.e., the computer file characteristics are the most significant aspect of the item. Computer files that fall under this category include numeric data, computer software, a combination of these types, or a mixture of various Leader/06 categories exclusive of categories o (Kit) and p (Mixed materials), none of which predominates. Although a file may be stored on a variety of media, the file itself is independent of the medium on which it is stored. In case of doubt, consider a computer file.
This definition does not mandate whether or not to use a separate record for the electronic item, but leaves it up to the cataloger.

Alternatively, a narrower definition might be considered:

m - Computer file
Code m indicates that the content of the record is for one or more software programs. This category includes executable software, source code, etc. Any other type of computer-readable file is coded for the most significant form of material observable when processed for display, e.g., files that are primarily textual or numeric are treated as language material. If there are two or more forms and they are judged significant, use code p for mixed materials.

See below for a discussion of code p.

After format integration, since all variable fields are available for all types of material, the Leader/06 code no longer determines field validity. Once the electronic aspects are moved from the Leader/06, the format issues can be divorced from the cataloging rules. Since field 006 can give characteristics of a second form of material, the choice of code in Leader/06 is not dependent upon the choice of AACR2 chapter used. The cataloger still needs to choose which chapter of AACR2 is appropriate for description, which will determine, among other things, which fields are needed in the record. The cataloger is no longer constrained by the format, since any USMARC defined fields will be valid.

5. Fields 006 and 007

Additional information about the computer file aspects may be given in field 006 and/or in field 007. These may be used by systems for limiting or sorting records. Field 006 is used for additional bibliographic characteristics, while field 007 is used for physical characteristics (particularly the specific material designation (SMD)).
A general material designation (GMD) may be given in 245$h to indicate that the physical format is computer file (although this information is optional); it is not necessary for the 008 to agree with the GMD, so the 008 for the content of the item would be given. The 008 character positions specific to computer files (and consequently their 006) contain three defined elements: target audience (CF008/22, also defined in Books, Music, and Visual Materials); type of computer file (CF008/26); and government publication (CF008/28; also defined in Books, Maps, Serials, and Visual materials).

If the record were coded for content in Leader/06 and 008, a computer file 006 may be added. However, generally only the type of computer file would be useful, and in many cases it would be redundant with the value in Leader/06. For example, if the record were for text and coded "a" in Leader/06, adding "d" for document in 006/09 (the same as 008/26) would be redundant. Thus, there may be no additional information to include in the 006 (although some systems currently need the first character position for searching or limiting by material type). The physical characteristics, however, might be more useful and may be included in the computer files 007, such as the specific material designation (e.g., remote).

Note that if the narrower definition for code m were used (i.e., executable software only), an 006 could not be added for many electronic items, since they may not include executables as a secondary characteristic. In this case, an 007 could provide added information and indicate that the carrier of the item is a computer file.

Attachment A shows how each type of computer file for which there is a definition in 008/26 would be coded under both options if code m were redefined to emphasize content rather than carrier.

6. Multimedia items

The question arose in the previous discussions of this issue as to how to code multimedia items, which are becoming increasingly prevalent.
In these cases several kinds of material interact and it is impossible to determine predominance. If the narrower definition of code m is used, a code will need to be found for these items. If the broader definition is used, they could be included in code m, although other options might be considered.

When Discussion Paper No. 97 was discussed, it was suggested that code o (Kit) be considered for multimedia items and that perhaps its definition could be broadened. However, since items that use code o use the 008 for Visual materials, this solution would require changing the 008 for kit, which may not be desirable.

Broadening the definition of code p (Mixed materials) might be considered as an option. At a meeting of U.S. and Canadian archivists in Toronto in October, the archival community expressed a willingness to broaden the definition of Mixed materials to accommodate mixtures other than those that are archival in nature, since they preferred using the Leader/08 to bring out archival material.

The definition of code p is as follows:

p - Mixed materials
Code p indicates that the content of the record is for two or more types of material that are usually related by virtue of their having been accumulated by or about a person or body. No one type of material in the group is emphasized or predominates. The intended primary purpose is other than for instructional purposes (i.e., other than the purpose of those materials coded as o (Kit)). This category includes archival and manuscript collections of mixed types of materials, such as textual materials, photographs, and ephemera.

Note that Proposal No. 97-7 (Coding Leader/06 and Leader/08 for archival material) considers changing this definition to clarify its use further for archival material. It might also be revised so that it is not limited to "types of material that are usually related by virtue of their having been accumulated by or about a person or body".
The definition could also include multimedia works intended to be processed by a computer.

The Mixed Materials 008/18-34 contains only one value, Form of item. It may not provide particularly useful information, but could be enhanced if there were additional characteristics of importance for these types of materials.

7. Impact of changes on systems

Since some systems physically separate records by the type of 008, changing the definition of code m will have an impact on systems. For digital materials, the 008 currently would be that for computer files for all but maps, unless a separate record had not been created and field 856 had been added to the record for the original. Even if separation of record types is not an issue, integrated systems would have many records with computer files 008s that should have other 008s. In RLIN, textual computer file serials are in a computer files file with a computer file 008 and a serial 006. It may be necessary to move these. Non-textual serials that would no longer be classified as computer files may also need to be moved.

If the narrower definition of code m were chosen, then a computer file 006 would no longer be appropriate for much electronic material that would be coded for content, since code m would be restricted to executable software. In this case, the computer file 007 would have to be used to code computer file characteristics and used by the system for searching by material type. RLIN already uses the CF 007 as one of the characteristics examined when qualifying a search by material type, but OCLC and WLN do not currently use the CF 007 for sorting and limiting searches. If this proposal is approved, the latter systems would need to make system changes if it is deemed desirable to identify this characteristic in some way for its retrieval potential.
OCLC uses the Leader/06 for duplicate detection (as may other systems); in previous discussions of this issue the difficulty in doing duplicate detection on an optional field (006) was noted. If this proposal is approved, OCLC would probably need to make system changes to no longer use Leader/06 for this purpose.

Other system impacts might be illustrated in Attachment A, which shows how each type of computer file might be coded in Leader/06, field 008 and 006 under each option.

8. Label for Books 008/006

If this proposal is approved, electronic textual material will fall under Leader/06 value a, now called "language material". The USMARC bibliographic format specifies that, for these materials, the 008 used is determined by the value in Leader/07 (Bibliographic level): a monograph uses code "m" and the Books 008, and a serial uses code "s" and the Serials 008. Thus, monographic electronic textual material will require a Books 008. It may be desirable to change the label of the 008 from Books to "Textual (Nonserial)". Systems often use the term specified in the USMARC documentation to show the contents of the field, and a label "Books" could be very confusing to the user if the item is electronic text. Note that this issue also has arisen from concerns by the archival community; see Proposal No. 97-7 (Coding Leader/06 and Leader/08 for Archival Material).

It might also be considered whether the name of Leader/06 code a might be changed from "Language material" to "Textual material", since videorecordings and nonmusical sound recordings might also be considered language material.

9. Questions for further consideration:

1. How will code p be distinguished from code o in Leader/06 if Option 2 or 3 is chosen? Multimedia items previously considered computer files may be intended for instructional purposes.

2. Should additional elements be defined for the Mixed Materials 008 to include characteristics of multimedia items?

10. PROPOSED CHANGES

The following is presented for consideration:

* In the USMARC Bibliographic Format, change the definition of code "m" as follows:

Option 1:

m - Computer file
Code m indicates that the content of the record is for information that is processed by a computer and whose most significant aspect does not fall into any other Leader/06 category, i.e., the computer file characteristics are the most significant aspect of the item. Computer files that fall under this category include numeric data, computer software, a combination of these types, or a mixture of various Leader/06 categories exclusive of categories o (Kit) and p (Mixed materials), none of which predominates. Although a file may be stored on a variety of media, the file itself is independent of the medium on which it is stored. In case of doubt, consider a computer file.

Option 2: Change the definition of code m in Leader/06 to restrict it to executable software and change the definition of code p as follows (note that this definition includes changes also proposed in Proposal No. 97-7):

m - Computer file
Code m indicates that the content of the record is for one or more software programs. This category includes executable software, source code, etc. Any other type of computer-readable file is coded for the most significant form of material observable when processed for display, e.g., files that are primarily textual or numeric are treated as language material. If there are two or more forms and they are judged significant, use code p for mixed materials.

p - Mixed materials
Code p indicates that the content of the record is for a mixture of components from two or more of the other Type of Record categories defined for Leader/06 exclusive of category o (Kit), each of which is judged to be significant. This category includes archival fonds and manuscript collections of mixed forms of material, such as text, photographs, and sound recordings.
It also includes computer-readable material such as multimedia works that include electronic text, images, sound, etc.

Option 3: Option 2 but include numeric data in the definition of code m.

* In the USMARC Bibliographic Format, rename the 008 for Books as "Textual (Nonserial)"

-------------------------------------------------------------------

ATTACHMENT A

Types of computer files

This chart shows each type of computer file for which there is a value in Computer files 008/26 (Type of computer file) and how each would be coded in Leader/06, 008 and 006 under the two options detailed in this proposal. In some cases, coding for 006 will give no further information than is in the 008, since the only character position in the CF008 that is different from the other material's 008 is 008/26. If this proposal is approved, the type would already be coded in Leader/06 if coding the record for its content rather than its carrier. These items for which 006/09 (same as 008/26) information is redundant are identified as such in the 006 column. In the 006 columns "+ specifics" means that additional 006's may be added for other forms of material.

                               Option 1      Option 2     Option 1         Option 2
008/26 type                    Ldr06/008     Ldr06/008    006              006
a  Numeric                     m/CF          a            n.a.             m
b  Computer program            m/CF          m            n.a.             n.a.
c  Representational            k/VM          k            m (Redundant)    n.a.
d  Document                    a/Bk or Ser   a            m (Redundant)    n.a.
e  Bibliographic data          a             a            m                n.a. or m*
f  Font                        m             m            n.a.             n.a.
g  Game                        m             p            m + specifics    m
h  Sound                       i             i            m                n.a. or m*
i  Interactive multimedia      m             p            specifics        m
j  Online system or service    m             p            n.a.             m
m  Combination                 m             p            n.a.             m*

For Option 3, everything is the same as Option 2 except Numeric data would be coded as m.

*For Option 2, if an executable is part of the resource, then m could be used in 006. This would have to be determined.
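
The Leader/06 columns of the chart above can be expressed as a small lookup, which may help system implementers compare the options. The following Python sketch is illustrative only and is not part of the proposal: the names ATTACHMENT_A and leader06_for are hypothetical, only the Leader/06 codes are transcribed (the 008 and 006 columns are omitted), and Option 3 is modeled as Option 2 with Numeric data coded as m, as stated in the note to the chart.

```python
# Illustrative sketch only; names are hypothetical, not USMARC-defined.
# Keys are Computer files 008/26 codes. Each value holds:
#   (type label, Leader/06 under Option 1, Leader/06 under Option 2)
ATTACHMENT_A = {
    "a": ("Numeric", "m", "a"),
    "b": ("Computer program", "m", "m"),
    "c": ("Representational", "k", "k"),
    "d": ("Document", "a", "a"),
    "e": ("Bibliographic data", "a", "a"),
    "f": ("Font", "m", "m"),
    "g": ("Game", "m", "p"),
    "h": ("Sound", "i", "i"),
    "i": ("Interactive multimedia", "m", "p"),
    "j": ("Online system or service", "m", "p"),
    "m": ("Combination", "m", "p"),
}


def leader06_for(type_code: str, option: int) -> str:
    """Return the Leader/06 code charted for an 008/26 type under an option."""
    if option not in (1, 2, 3):
        raise ValueError("option must be 1, 2 or 3")
    _label, option1_code, option2_code = ATTACHMENT_A[type_code]
    if option == 1:
        return option1_code
    # Option 3 differs from Option 2 only for Numeric data (008/26 code a).
    if option == 3 and type_code == "a":
        return "m"
    return option2_code


if __name__ == "__main__":
    # Print the Leader/06 code for every 008/26 type under each option.
    for code, (label, _, _) in ATTACHMENT_A.items():
        codes = [leader06_for(code, opt) for opt in (1, 2, 3)]
        print(code, label, codes)
```

For example, leader06_for("g", 2) returns "p", reflecting that a game would move from Computer file to Mixed materials under Option 2. A real system would also need the 008 type and the 006/007 decisions, which depend on cataloger judgment and are not captured here.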