PROPOSAL NO: 98-16 R

DATE: May 1, 1998
REVISED: December 11, 1998

NAME: Nonfiling characters in all MARC formats

SOURCE: USMARC electronic list

SUMMARY: This proposal presents a more flexible and extensible technique for dealing with nonfiling characters that appear at the beginning of cataloging data in access fields in MARC records, including titles embedded in author/title fields. It proposes rules for identifying the nonfiling zone.

KEYWORDS: Nonfiling indicator

RELATED: DP102 (June 1997); 98-16 (June 1998)

STATUS/COMMENTS:

5/1/98 - Forwarded to the USMARC Advisory Committee for discussion at the June 1998 MARBI meetings.

6/29/98 - Results of USMARC Advisory Committee discussion - There was general consensus that the technique needed to be changed and the pair of control characters is the best solution. Reconciliation of the previous identifications of the nonfiling zones and a more precise new definition are needed. The situations to which the technique could and should be applied need to be specified, for example, only at the beginning of a field or subfield? internal to a subfield? There was a preference for limiting the technique to sorting, i.e., not to include indexing. The proposal should return at the next meeting with these additions. There was also consensus that implementation of this change will need considerable lead time.

7/29/98 - Results of LC/NLC review - Agreed with the MARBI decisions.

12/11/98 - Forwarded to the MARC Advisory Committee for discussion at the January 1999 MARBI meetings.

1/30/99 - Results of MARC Advisory Committee discussion - Approved new control characters (first change proposal). Making the current non-filing indicators obsolete and the rule for nonsorting zone were acceptable, but the limitation to specified subfields was questioned. Views expressed reversed those of the previous meeting and now support more unrestricted use of the non-sorting markers. With unrestricted use, precise rules (if possible) need to be developed that specify the types of words and the characters.

There was agreement that the use of control characters would not cross subfield boundaries. Additional work will focus on: application of the technique in any field; application anywhere in a field; specification of types of data and characters. CC:DA, SAC, and the Rare Book community will be asked to comment.

4/15/99 - Results of LC/NLC review - Agreed with the MARBI decisions.


PROPOSAL NO. 98-16: Nonfiling characters

1 BACKGROUND

MARC records are created with data elements to support the processing of the information in a variety of ways including online applications and various output products (e.g., catalog cards, book catalogs, and COM catalogs). Various products form sorted lists of access points for browsing. Online applications index access fields for titles, persons, corporate bodies, and subject terms to provide left match retrieval. Titles and names sometimes include parts of speech or other character strings at the beginning of the name or title parts of the heading that users and systems tend to ignore for sorting and retrieval, commonly called "nonfiling characters".

Proposal 98-16 was discussed in June 1998 with the results indicated above. Since there was consensus on the use of the special control characters technique, this revised proposal presents only that technique. This revision discusses more fully the nonfiling zone and rules for defining it and proposes a scope of use of the technique.

Terminology used in 98-16R:

1.1 Current Technique

The current MARC technique for identifying non-filing characters retained in records involves the use of an indicator position that carries a digit (0 through 9) representing the number of characters to be ignored. It is defined for Titles and Uniform Titles in the following fields:

Examples:
     240 14  $aThe Pickwick papers
     245 18  $aThe ... annual report of the Governor 
     245 12  $aL'enfant criminal
     245 13  $aal-Sharq as-'Arabi
It is also common for some types of introductory characters (diacritics or punctuation marks) not to be identified as nonfiling characters, with the expectation that they will be ignored by the software, and for others to be omitted by the cataloger.

Examples:
     130 0#  $a"Hsuan lai hsi kan" hsi liah
     245 10  $a[Diary]
     245 10  $a­as others see us
     240 1#  $aRosenkavalier (Opera) 
A nonfiling indicator was defined for Authority format fields X00 (Personal Name), X10 (Corporate Name), X11 (Meeting Name), X50 (Topical Terms), and X51 (Geographic Names) until it was made obsolete in 1993. The change was made because the X00, X10, X11, 650, and 651 fields in the Bibliographic format did not (and could not) have corresponding nonfiling indicators; systems with authority control modules therefore found that the Authority format indicators had no practical use.
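
The indicator technique described in this section can be illustrated with a minimal sketch. It assumes that the subfield content is already available as a Python string and that one string position corresponds to one counted character; the function name filing_form is hypothetical and is not part of any MARC specification.

```python
def filing_form(text: str, nonfiling_count: int) -> str:
    """Return the part of a subfield that is used for filing.

    nonfiling_count stands for the digit carried in the nonfiling
    indicator position (an assumption of this sketch: one string
    position equals one counted character).
    """
    # A single indicator position can only carry the digits 0 through 9.
    if not 0 <= nonfiling_count <= 9:
        raise ValueError("nonfiling indicator must be a digit 0 through 9")
    return text[nonfiling_count:]


print(filing_form("The Pickwick papers", 4))                    # Pickwick papers
print(filing_form("The ... annual report of the Governor", 8))  # annual report of the Governor
print(filing_form("L'enfant criminal", 2))                      # enfant criminal
print(filing_form("al-Sharq as-'Arabi", 3))                     # Sharq as-'Arabi
```

Because the count lives in a single indicator position, the sketch rejects any value above 9, and it has nothing to say about subfields such as $t or $p that have no indicator of their own.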

1.2 Need

A more flexible and extensible technique for identifying nonfiling strings is needed since the indicator technique is not available for all places where initial nonfiling information occurs and cataloging conventions for dropping information in those situations may not be acceptable. This is especially important since the format is used by communities that do not use the same cataloging rules and interpretations as the AACR2 community in the US, and whose cataloging conventions may therefore differ. Within a community, agreement to omit the article or character may be made, but there is a need to have a mechanism that can be used across all communities.

While the handling of nonfiling characters by an indicator value generally works in those fields for which it is defined, this technique cannot be used in the following cases:
(1) When both indicator positions are already defined in a field. For example, titles recorded in field 246, like those in field 245 (Title Statement), sometimes have initial articles but both indicator positions are already defined for other uses. There is a need for a nonfiling technique for name and subject headings and the indicators are already used in some name and subject fields.
(2) Titles in author/title fields where the title part begins in the $t subfield (e.g., Bibliographic format 7XX fields).
(3) Titles of parts in subfield $p (Name of part/section of a work) of access fields.
(4) Title strings whose non-filing zone may be greater than 9 (e.g., Field 490 (Series Statement)).

Examples:
   Anwar al-Sadat
     700 1#  $aSadat, Anwar 
         [article omitted since both indicators used already]
   The Henry (Ship)
     610 1#  $aHenry (Ship) 
         [article omitted since both indicators used already]
   Stower, Caleb.  The printer's manual
     700 1#  $aStower, Caleb.$tPrinter's manual 
         [article omitted since no indicator available for $t subfield]
   N,N-Dimethyltryptamine
     650 12 $aN,N-Dimethyltryptamine
         [filed under N because both indicators used already, but
         scientists would expect it to be filed under D]
   Dissertation abstracts. A, The humanities and social sciences.
     245 00 $aDissertation abstracts.$nA,$pThe humanities and social
         sciences.
         [part title filed under T instead of H]
   1950-1968: Thesis abstract series
     490 1# $a1950-1968: Thesis abstract series
         [no nonfiling indicator defined and nonfiling zone exceeds 9
         characters]
1.3 June 1997 Discussion

This issue was first investigated in Discussion Paper 102. The following summarizes the discussion of the MARC Advisory Committee.

There were different preferences for the technique to be used -- subfields, graphic characters, and control characters. Subfields would be easy to implement; graphic characters would be difficult to identify; and control characters are theoretically desirable but have some system drawbacks. There was a preference for two distinct characters, to be used before and after the nonfiling part. Several participants pointed out that the function under question is nonfiling, not non-indexing. For example, the English word "the" might not be indexed any place in a string but the characters we are identifying are the ones not indexed when they occur at the beginning of a string.

Gary Smith of OCLC proposed three properties that a new technique should have:
(1) It must not introduce any conflicts with existing data and thus must not use any code which has previously been assigned a graphic or control function in MARC;
(2) It must be location independent, i.e., it must be interpretable without knowledge of the identity or nature of the field in which it occurs.
(3) It must not require the conversion of existing records.

Other characteristics (from discussions) that apply if a graphic or control character technique is used:
(1) There should be two different characters for beginning and end.
(2) The characters should exist also in Unicode.

2 DISCUSSION

2.1 Characters considered

There are several types of initial characters that occur in bibliographic data that need to be considered, including initial definite and indefinite articles such as "the" and "a"/"an" in English and their foreign language counterparts

     A place like Alice
     al-Sadat, Anwar
     Der Spiegel
     La philosophie et le Québec
and diacritics; alphanumeric characters used in designated situations such as locants at the beginning of chemical names; special spacing characters, such as punctuation and other symbols; and spaces
     ... and then I said
     ¿Quién es quién en el Perú?
     16,16-Dimethylprostaglandin E2 
     N,N-Dimethyltryptamine
2.2 Consensus Approach: Special Control Characters

The control characters from ISO 6630 for marking the beginning and end of the nonfiling zone, NON-SORTING CHARACTER(S), BEGIN (hex'88') and NON-SORTING CHARACTER(S), END (hex'89'), were the consensus approach from the June 1998 discussion.

Examples: The graphics { (for beginning of nonfiling zone) and } (for end of nonfiling zone) have been used in the following examples to represent the two control characters from ISO 6630.

     245 1#  $a{A }place like Alice
     700 1#  $a{al-}Sadat, Anwar
     245 1#  $a{¿}Quién es quién en el Perú?
     650 #2  $a{N,N-}Dimethyltryptamine
     700 1#  $aStower, Caleb.$t{The }printer's manual
     245 00  $aDissertation abstracts.$nA,$p{The }humanities and
             social sciences.
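
How a system might derive both a display string and a sort string from marked data can be sketched as follows. This is only an illustration under stated assumptions: the two control characters are represented by the Python characters with code values hex 88 and hex 89, the input is the content of a single subfield (so a zone never crosses a subfield boundary), and the names NSB, NSE, display_form and sort_form are hypothetical.

```python
NSB = "\x88"  # stand-in for NON-SORTING CHARACTER(S), BEGIN (hex'88')
NSE = "\x89"  # stand-in for NON-SORTING CHARACTER(S), END (hex'89')


def display_form(subfield: str) -> str:
    """Drop the two control characters but keep the text they enclose."""
    return subfield.replace(NSB, "").replace(NSE, "")


def sort_form(subfield: str) -> str:
    """Drop the control characters and everything inside a marked zone."""
    kept = []
    in_zone = False
    for ch in subfield:
        if ch == NSB:
            in_zone = True
        elif ch == NSE:
            in_zone = False
        elif not in_zone:
            kept.append(ch)
    return "".join(kept)


title = NSB + "A " + NSE + "place like Alice"
part_title = NSB + "The " + NSE + "printer's manual"

print(display_form(title))     # A place like Alice
print(sort_form(title))        # place like Alice
print(sort_form(part_title))   # printer's manual
```

Neither function needs to know the tag, the indicators, or the subfield code, which is the sense in which the technique is location independent.
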
With reference to the properties listed in 1.3 above:

The nonfiling indicators would become obsolete, but that would not mean they needed to be changed in existing records. It is unrealistic to expect that all records will be changed retrospectively. A particular environment might, however, want to convert records to the new technique as they are stored. Particular problems would occur if a system had linked authority and bibliographic records: the heading fields would need to be coordinated across the records, probably bringing pressure to convert retrospectively. Any heading matching routines that did not ignore the indicator would need to be adjusted.
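
For an environment that chose to convert records as they are stored, the mechanical step could look like the following sketch. It assumes that the existing indicator value is correct and counts string positions; the function name indicator_to_markers and its begin and end parameters are hypothetical, and the sketch does not address what value the obsolete indicator position should then carry.

```python
def indicator_to_markers(text: str, nonfiling_count: int,
                         begin: str = "\x88", end: str = "\x89") -> str:
    """Rewrite an indicator-counted nonfiling zone as a marked zone.

    begin and end default to stand-ins for the two control characters
    (hex 88 and hex 89); other values can be passed for display purposes.
    """
    if not 0 <= nonfiling_count <= len(text):
        raise ValueError("nonfiling count does not fit the subfield content")
    if nonfiling_count == 0:
        return text  # nothing to mark
    return begin + text[:nonfiling_count] + end + text[nonfiling_count:]


# The graphics { and } are used here, as in the examples above, to make
# the result readable.
print(indicator_to_markers("The Pickwick papers", 4, "{", "}"))  # {The }Pickwick papers
print(indicator_to_markers("L'enfant criminal", 2, "{", "}"))    # {L'}enfant criminal
print(indicator_to_markers("[Diary]", 0, "{", "}"))              # [Diary]
```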

Some disadvantages were also noted: 1) special characters require system implementation that affects hardware and software, 2) data has extraneous characters that must be ignored in displays and printed output.

2.3 Identification of the Nonfiling Zone

The rules for identifying the sequence of characters counted in the nonfiling indicator value, and thus the nonfiling zone that the new characters would mark, need to be made clearer than they have been in the past. The nonfiling zone is always at the beginning of a subfield, generally subfield $a.

2.3.1 Definite and indefinite articles and other alphanumeric characters

Proposed Rule 1: The nonfiling zone includes all characters preceding the first filing character in title, name, and subject headings, with the exceptions noted in Rules 2-4.

     {A }place like Alice
     {al-}Sadat, Anwar
     {Der }Spiegel
     {La }philosophie et le Québec
     {16,16-}Dimethylprostaglandin E2 
     {N,N-}Dimethyltryptamine
     Dissertation abstracts. A, {The }humanities and social sciences.
2.3.2 Diacritics

Currently in MARC 21, diacritics at the beginning of a field that does not have any nonfiling characters and diacritics associated with the first filing character in fields with a definite or indefinite article are not counted as nonfiling characters, e.g.,

     222 #0  $aÖsterreich in Geschichte und Literatur
     222 #4  $aDer Öffentliche Dienst

This practice is not thought to be consistently followed, however. Failure to follow this rule will cause difficulty in the future with Unicode since in Unicode the diacritics are encoded after the characters they modify rather than before, thus they would clearly not be included in the nonfiling count. In cases where the diacritic has been counted, the indicator count will be erroneous for data converted to Unicode.

Proposed Rule 2: The nonfiling zone does not include any diacritics associated with the first filing character, but does include any diacritics associated with the definite or indefinite article or other alphanumeric characters that precede the first filing character.

     Österreich in Geschichte und Literatur
     {Der }Öffentliche Dienst
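
The Unicode concern described in this section can be made concrete with a small sketch. It is an illustration only: the Unicode combining diaeresis (U+0308) is used as a stand-in for the diacritic in both orderings, whereas the current MARC character set uses a different code value, and the helper name skip is hypothetical.

```python
import unicodedata

DIAERESIS = "\u0308"  # Unicode COMBINING DIAERESIS, used as a stand-in

# Diacritic stored before the letter it modifies (the order described
# for current data).
diacritic_first = "Der " + DIAERESIS + "Offentliche Dienst"
# Diacritic stored after the letter it modifies (Unicode order).
diacritic_last = "Der O" + DIAERESIS + "ffentliche Dienst"

# The Unicode-order string is what canonical decomposition produces.
print(unicodedata.normalize("NFD", "Der \u00d6ffentliche Dienst") == diacritic_last)  # True


def skip(text: str, count: int) -> str:
    """Apply a nonfiling count by dropping that many leading positions."""
    return text[count:]


# Count of 4 (diacritic not counted): both orderings keep the letter
# and its diacritic.
print(skip(diacritic_first, 4) == DIAERESIS + "Offentliche Dienst")  # True
print(skip(diacritic_last, 4) == "O" + DIAERESIS + "ffentliche Dienst")  # True

# Count of 5 (diacritic wrongly counted): the old ordering loses only
# the diacritic, but the Unicode ordering loses the base letter O.
print(skip(diacritic_first, 5) == "Offentliche Dienst")  # True
print(skip(diacritic_last, 5) == DIAERESIS + "ffentliche Dienst")  # True
```

Marking the zone with begin and end characters avoids this dependency on a count, since the zone boundary travels with the data through conversion.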

2.3.3 Special characters, etc.

Currently in MARC 21, special characters, such as "[", "..." and ALIF, are included in the nonfiling zone when they occur in conjunction with an article:

     245 05  $a[The part of Pennsylvania that ... townships].
but are not counted when there is no article, with the expectation that systems will ignore them automatically:
     245 00  $a[Diary].

There are no Unicode implications in this case.

Proposed Rule 3A: The nonfiling zone includes any special characters, etc., preceding the first filing character.

     {... }and then I said
     {¿}Quién es quién en el Perú?
     {16,16-}Dimethylprostaglandin E2 
     {N,N-}Dimethyltryptamine
     {"}Hsuan lai hsi kan" hsi liah
     {[}Diary]
     {­}as others see us
     {The ... }annual report of the Governor 
     {L'}enfant criminal

Possible Rule 3B: The nonfiling zone includes any special characters, etc. preceding the first filing character that occur in conjunction with a definite or indefinite article or other alphanumeric characters preceding the filing character.

     ... and then I said
     ¿Quién es quién en el Perú?
     {16,16-}Dimethylprostaglandin E2 
     {N,N-}Dimethyltryptamine
     "Hsuan lai hsi kan" hsi liah
     [Diary]
     ­as others see us
     {The ... }annual report of the Governor 
     {L'}enfant criminal

2.3.4 Spaces

Currently in MARC 21, spaces before the first filing character are included in the nonfiling count if other counted characters are present.

Proposed Rule 4A: Spaces that occur before the first filing character in conjunction with an article or special character or other alphanumeric characters are included in the nonfiling zone.

     {... }and then I said
     {The ... }annual report of the Governor 

Possible Rule 4B: Spaces that occur before the first filing character in conjunction with an article or other alphanumeric characters are included in the nonfiling zone.

     ... and then I said
     {The ... }annual report of the Governor 

2.3.5 Proposed rules

Rules 1, 2, 3B, and 4B are closest to the current practice in MARC records. However, a simpler set of rules would be Rules 1, 2, 3A, and 4A, as they can be summarized simply as: "A nonfiling zone includes all characters preceding the first filing character in a specified subfield, excluding any diacritics associated with the first filing character." The Library of Congress Cataloging Policy and Support Office recommends this rule, noting that it also might be the most consistently applied.
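
The difference between the two rule sets can be sketched as follows. The sketch assumes that the cataloger supplies the position of the first filing character (already placed so that any diacritics belonging to that character fall outside the zone, per Rule 2), and it uses the graphics { and } in place of the control characters. The function name mark_zone and the labels "A" and "B" are hypothetical shorthand for Rules 1, 2, 3A, 4A and Rules 1, 2, 3B, 4B respectively.

```python
def mark_zone(text: str, first_filing: int, rules: str = "A",
              begin: str = "{", end: str = "}") -> str:
    """Mark the nonfiling zone that precedes the first filing character.

    rules="A": everything before the first filing character is marked.
    rules="B": the zone is marked only if it contains an article or other
               alphanumeric characters; leading special characters and
               spaces on their own are left unmarked.
    """
    if rules not in ("A", "B"):
        raise ValueError("rules must be 'A' or 'B'")
    prefix = text[:first_filing]
    if not prefix:
        return text
    if rules == "B" and not any(ch.isalnum() for ch in prefix):
        return text
    return begin + prefix + end + text[first_filing:]


print(mark_zone("... and then I said", 4, "A"))  # {... }and then I said
print(mark_zone("... and then I said", 4, "B"))  # ... and then I said
print(mark_zone("[Diary]", 1, "A"))              # {[}Diary]
print(mark_zone("[Diary]", 1, "B"))              # [Diary]
print(mark_zone("The ... annual report of the Governor", 8, "B"))
# {The ... }annual report of the Governor
```

Under the "A" rules the function needs no test of what the zone contains, which reflects the observation that the simpler rule set may be the more consistently applied.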

2.4 Use of the Control Characters

This proposal presents an alternative method for indicating nonfiling characters to the method currently used, i.e., it is a direct substitute for the nonfiling indicator technique that has been used in MARC records from the first format draft. In addition to accommodating the nonfiling situations covered by the current technique, it also accommodates four additional situations: fields where the nonfiling indicator could not be defined because the indicators have been used for different purposes (e.g., field 246, field 650), subfield $t (Title) in author/title fields, subfield $p (Name of part/section of a work) in fields with part titles, and field 490 (Series Statement) subfield $a (Series statement). There are other situations where an agency may not want to include other words internal to a string in indexes or in sorting the string, but this technique is not proposed to be used for those needs. There would have to be broad agreement on a list of words and rules for use in those circumstances. The technique could be extensible to other situations in the future, however.

2.5 Impact Considerations

There are a number of points that should be taken into account when evaluating the impact of this change on existing systems. It can be assumed, however, that there would not be a need for any manual remarking of retrospective records.

3 PROPOSED CHANGE

In the MARC 21 Bibliographic, Authority, Classification, and Community Information formats:

- Define for use in USMARC records two new control characters from ISO 6630:

- Make obsolete the indicator denoting number of nonfiling characters in the following fields and formats:
- Adopt the following rule for the nonfiling character zone definition:
- The technique may be used in the following subfields:


Library of Congress
Library of Congress Help Desk (12/7/98)