MARBI Members:
James Crooks RUSA Univ of California, Irvine
Josephine Crawford (recorder) ALCTS Univ of Wisconsin, Madison
Elaine Henjum LITA Florida Ctr for Lib Automation
Diane Hillmann LITA Cornell University
Carol Penka RUSA University of Illinois
Jacqueline Riley (chair) RUSA University of Cincinnati
Frank Sadowski ALCTS University of Rochester
Paul Weiss ALCTS University of New Mexico
Robin Wendler LITA Harvard University
MARBI Interns:
Anne-Marie Erickson RUSA Chicago Library System
Anne Flint ALCTS Ohio Lib and Info Network
Representatives and Liaisons:
Joe Altimus RLG Research Libraries Group
Karen Anspach AVIAC EOS International, Inc.
John Attig OLAC Pennsylvania State Univ
Sherman Clarke VRA New York University
Betsy Cowart WLN WLN, Inc.
Donna Cranmer AVC Siouxland Libraries
Bonnie Dede SAC University of Michigan
Catherine Gerhart CC:DA University of Washington
David Goldberg NAL National Agricultural Libr
Rich Greene OCLC OCLC, Inc.
Rebecca Guenther LC Library of Congress
Michael Johnson MicroLIF Follett Software Company
Maureen Killeen A-G A-G Canada Ltd.
Rhonda Lawrence AALL UCLA
Sally McCallum LC Library of Congress
Karen Little MLA University of Louisville
Susan Moore MAGERT University of Northern Iowa
Elizabeth O’Keefe ARLIS Pierpont Morgan Library
Marti Scheel NLM National Library of Medicine
Louise Sevold CIS Cuyahoga County Public Library
Margaret Stewart NLC National Library of Canada
Rutherford Witthus SAA University of Connecticut
Other attendees:
Jim Agenbroad Library of Congress
Joan Aliprand RLG
Diane Baden NELINET
Mathew Beacon Yale University
Candy Bogar DRA
Jo Calk Blackwells
Winnie Chan University of Illinois, Urbana-Champaign
Tamar N. Clarke National Library of Medicine
Karen Coyle University of California
Larayne Dallas University of Texas at Austin
Carol B. Dundle Trinity University
Joanna Dyla Stanford University
Stuart Ede The British Library
John Espley VTLS
Emily Fayen EOS International
Michael Fox Minnesota Historical Society
G. Fuzon EOS International
Jeff Goodwin EOS International
Anke Gray University of Washington
Greta de Groat Stanford University
Kay Guiles Library of Congress
Stephen Hearn University of Minnesota
Elise Hermann National Library Authority, Denmark
Kimball Hewett Ameritech Library Service
Charles Husbands Harvard University
William Jones New York University
Rhonda Kesselman Princeton University
Kris Kiesling University of Texas at Austin
Curtis Lavery RLG
Ming Lu OCLC
Rita Lunnon Stanford University
Liz McKeen National Library of Canada
Michael Mealling Network Solutions (and member of IETF)
David Miller Curry College
Paula Moehle University of Georgia
Elizabeth Morgan Library of Congress
Chris Mueller University of New Mexico
Cecilia Preston Preston & Lynch
Pat Riva McGill University
Rebecca Routh Northwestern University
Jacque-Lynne Schulman National Library of Medicine
Ann Sitkin Harvard University
Gary Smith OCLC, Inc.
Gary Strawn Northwestern University
Julie Tammian Best-Seller
Gary Thompson UCLA
Frank Williams Ingram Library Services
Mathew Wise New York University
Larry Woods University of Iowa
Ruth Wuest Endeavor Information Systems
Art Zemon DRA
Joe Zeeman CGI
******************** Saturday, June 28, 1997 ********************
MARBI chairperson, Jacqueline Riley, opened the first MARBI meeting
of the San Francisco conference with a round of introductions.
She reviewed adjustments to the agenda, and then launched directly
into the work of the conference.
PROPOSAL 97-10: "Use of the Universal Coded Character Set in the
MARC records"
Larry Woods, chair of the MARBI Character Set Subcommittee,
introduced 97-10. It is another part of an earlier proposal
(96-10) regarding the Universal Character Set (ISO 10646); after
approval of the bulk of the recommendations, it was felt that the
issue of “ASCII clones” deserved more discussion and analysis.
Larry reviewed the original charge and the four working principles:
Principle #1: Provide support for round-trip mapping;
Principle #2: Maintain the same transliteration schemes as far as
possible;
Principle #3: Allow for handling of modified letters similar to
current practice;
Principle #4: Avoid private use space if possible.
In the course of its work, the Subcommittee found that some issues
required placing more importance on one principle than on another.
ASCII clones are characters such as numbers, punctuation marks, and
special symbols that are currently shared by more than one USMARC
character set. These characters are defined in the Latin,
Arabic, Hebrew, and/or Cyrillic character sets and they appear in
USMARC records. Round-trip mapping (from USMARC to the Universal
Character Set (UCS) and then from UCS back to USMARC) is
problematic because of the many-to-one relationship. The
Subcommittee looked carefully at three options, and discarded one
as not feasible. The remaining two are:
Option #1: Map USMARC ASCII clones to a unified repertoire in the
universal set.
Option #2: Precede each ASCII clone by a script flag character
defined in private use space.
Larry reported that, due to the programming costs associated with
utilizing the UCS private use space option, the Subcommittee is
not in favor of Option #2. The fourth working principle of the
Subcommittee supports this recommendation. However, Option #1 has
one important disadvantage: pure round trip mapping may not be
possible.
Joan Aliprand explained exactly what would be lost in round trip
mapping. For instance, a Hebrew comma (i.e., the comma of the
USMARC Hebrew character set) originating in a USMARC record would
translate to the comma character in a UCS record. If the same
record had to be translated back into USMARC, the UCS comma
character would be converted to the Latin (ASCII) comma, which is
the agreed-upon mapping for the UCS comma. After conversion to
UCS, it would be impossible to know that the original USMARC
character had been from the Hebrew character set. Joan noted that
the 8-bit USMARC hex value would not change, but the Escape
sequence associated with it originally would be lost. Joan has
been thinking about RLG's display algorithm for right-to-left
scripts in relation to this issue. Because there is no guarantee
that what comes to RLG will be exactly correct for the current
algorithm, RLG will need to consider changes to it, to position
the ASCII equivalents correctly. Joan emphasized that, in theory,
this seems feasible but it will have to be tested out in a real
implementation.
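The loss Joan described can be sketched in a few lines. This is an
illustration only: the script labels and the use of hex 2C for the
comma stand in for the real USMARC escape-sequence mechanism.

```python
# Sketch of the round-trip problem under Option #1. Several USMARC
# script sets share "clone" characters that all map to the single UCS
# code point U+002C (COMMA).
to_ucs = {
    ("latin", 0x2C): "\u002C",
    ("hebrew", 0x2C): "\u002C",   # Hebrew set's comma -- same UCS target
    ("arabic", 0x2C): "\u002C",
}

# Reverse mapping: the agreed-upon target for the UCS comma is the
# Latin (ASCII) set, so that is the only entry going back.
from_ucs = {"\u002C": ("latin", 0x2C)}

original = ("hebrew", 0x2C)           # comma typed in a Hebrew field
round_tripped = from_ucs[to_ucs[original]]

print(round_tripped)                  # ('latin', 44): script origin is lost
assert round_tripped != original
```

The 8-bit value (2C) survives the round trip; only the association
with the Hebrew set, carried by the original escape sequence, is lost.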
Jacqueline Riley asked for a motion about ASCII clones. Paul Weiss
moved in favor of Option 1. There were eight votes in favor and no
votes against.
Larry Woods moved on to the other recommendations in 97-10,
involving specific rules and/or techniques for implementing the UCS
with a mind to simplifying a mixed environment lasting some years.
The Subcommittee favored not mixing USMARC 8-bit and UCS character
values in the same USMARC records. UCS records should utilize
16-bit characters throughout the record. The binary zeros in the
first eight bits of a UCS record could be a flag that the record
contains UCS instead of 8-bit characters. Defining $d and $e in
field 066 to show which UCS character set and repertoire (or
subset) is in use would be useful.
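The "binary zeros" flag idea can be sketched as follows. In a UCS
record with 16-bit characters throughout, Basic Latin text carries a
zero high byte, which never occurs in legal 8-bit USMARC data. The
function name and heuristic are illustrative, not part of the proposal.

```python
def looks_like_ucs(record_bytes: bytes) -> bool:
    """Guess whether a record uses 16-bit UCS characters throughout."""
    # A binary-zero byte in the first character's two bytes flags a
    # 16-bit record; 8-bit USMARC data contains no zero bytes.
    return len(record_bytes) >= 2 and 0x00 in record_bytes[:2]

assert looks_like_ucs("MARC".encode("utf-16-be"))   # 00 4D 00 41 ...
assert not looks_like_ucs(b"MARC")                  # plain 8-bit record
```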
The proposal also discussed the UCS method for handling diacritical
characters. John Attig asked for clarification. Why is it
necessary for the order of the values to change? He also asked how
double diacritics would be handled. Joan Aliprand explained. In
USMARC, the diacritic (often a non-spacing character) comes before
the base letter. This approach was based on the printing model,
where the print position is not incremented for non-spacing
characters but only when the following base letter is printed. The
UCS/Unicode model, where the base character comes first, reflects
today's computer processing.
There was also a question about how non-spacing marks are handled
in USMARC in right-to-left scripts (Hebrew and Arabic). Right-to-
left scripts are stored in logical order, first to last, and the
non-spacing mark precedes the base letter, i.e., the same as for
left-to-right scripts.
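The ordering difference Joan described can be sketched in code. This
is a simplified illustration assuming Unicode combining marks; a real
conversion would also remap the code points themselves.

```python
import unicodedata

def usmarc_to_ucs_order(chars: str) -> str:
    """Move each run of non-spacing marks (which precede the base
    letter in USMARC) to after that base letter (UCS/Unicode order)."""
    out, pending_marks = [], []
    for ch in chars:
        if unicodedata.combining(ch):      # non-spacing mark: hold it
            pending_marks.append(ch)
        else:                              # base letter: emit it, then marks
            out.append(ch)
            out.extend(pending_marks)
            pending_marks.clear()
    out.extend(pending_marks)              # trailing marks, if any
    return "".join(out)

# USMARC order: COMBINING ACUTE (U+0301) precedes the base letter "e"
assert usmarc_to_ucs_order("\u0301e") == "e\u0301"
# Double diacritics keep their relative order after the base letter
assert usmarc_to_ucs_order("\u0301\u0308o") == "o\u0301\u0308"
```

Because the scan is over logical order, the same function applies to
right-to-left scripts, which are also stored first to last.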
Others raised the issue of Vietnamese, which has multiple
diacritics. How are these diacritics handled now in USMARC?
Apparently, OCLC and RLG do it one way, and LC does it another way.
Sally McCallum reported that the USMARC principle is:
Left to Right and Top to Bottom for left to right scripts; Right to
Left and Top to Bottom for right to left scripts.
In terms of how UCS handles double diacritics, a member of the
audience suggested that the accent mark closest to the base letter
would come first. Joan Aliprand offered to look this up. [Will ask
her if she found the answer.]
Joe Altimus suggested that the recommendations made in this part of
the proposal are very, very technical and require more
investigation. Joan agreed, and pointed out that additional
issues are raised in DP #100 on authority records. She feels that
people well versed in machine processing should be asked to work on
these issues so that the programming and economic ramifications are
understood before a decision is made. She suggested recruiting for
the Task Group from LITA and AVIAC. Charles Husbands agreed with
Joan that a technical committee should look at these
design/implementation issues. John Attig pointed out that there has
been a longstanding request to define a few additional Greek
characters for the Latin USMARC character set. Diane Hillmann
reminded all of the need for the section symbol. Joe Zeeman
suggested that this technical task force be convened and produce
recommendations fairly quickly.
Paul Weiss agreed. He moved that we take the other three
recommendations of Proposal 97-10, and turn them over to a
technical working group with recommendations due in 1998.
Membership should include representation from LITA and AVIAC.
Robin Wendler provided a second to Paul's motion. There were eight
votes in favor and no votes in opposition. Jacquie asked for
volunteers to see her during the conference or to send her an email
message afterwards. Karen Anspach will take the issue to the
Monday AVIAC meeting and try to round up some volunteers.
Larry Woods closed the discussion by reporting that he had been
talking with OCLC and RLG representatives. It may be some time
before databases are translated to UCS, although vendors think
some clients may be translating back and forth before the
utilities switch.
ACTIONS TAKEN:
-- Option 1 passed to deal with the ASCII clone issue.
-- Technical working group will be established to deal with the
other issues.
PROPOSAL 97-14: "Addition of new characters to existing USMARC
sets"
Sally McCallum introduced 97-14 as a fairly straightforward
proposal. The Character Set Subcommittee, while working on the
UCS mappings, uncovered some discrepancies between the published
USMARC Arabic set and some implementations of that set -- notably
RLG's implementation which has the largest database of Arabic
vernacular records.
RLIN supports three Arabic characters that are not present in the
USMARC published document. They are:
1) Arabic Thousands Separator
2) Right-Pointing Double Angle Quotation Mark
3) Left-Pointing Double Angle Quotation Mark
Adding these characters to the published USMARC documentation
should not cause problems in existing databases. Mappings to UCS
have already been done. Paul Weiss moved that MARBI approve the
proposal as written. Another member provided a second. There
were eight votes in favor and no votes against.
The discussion moved on to some related issues. Larry Woods
reported that there are several other Arabic characters present in
USMARC that are not present in UCS. Therefore, RLG submitted a
proposal to the Unicode Technical Committee (UTC) to add them.
Joan Aliprand reported that the proposal had been accepted by the
UTC for the Unicode Standard, and forwarded to ISO/IEC JTC1/SC2/WG2
as a proposed addition to ISO 10646 (UCS). There are several
review levels in the ISO process.
Joan also discussed the Cyrillic underscore. It is an ISO
character and is also found in the British Library character set.
It was not included in the proposal as it was another ASCII clone.
Now that the clone issue is settled, it could be added to the
Cyrillic set.
A question was asked about other characters for which there is no
UCS equivalent. Two examples are the Short E (Urdu) and the Short
U (Uighur), which are both vowels. Bonnie Dede thought that more
investigation is needed. It is difficult to find examples since
languages which use Arabic script are usually written without
vowels. Even if vowel marks appear on the item being cataloged,
current cataloging practice is not to transcribe the vowels. If
this is true, this may be a non-issue. Joan thinks that the Short
U may be available as a pre-composed letter in UCS. She will
check into it and report back.
Larry Woods concluded this discussion by saying that the Character
Set Subcommittee will be dissolved after a joint meeting with the
East Asian Character Subcommittee, now forming under the
chairmanship of John Espley.
ACTION: Proposal 97-14 passed with the Cyrillic underscore also
added at position 5F in the extended Cyrillic set.
DISCUSSION PAPER #100: "Recording Additional Characteristics in
USMARC Authority Records"
Sally McCallum introduced the discussion paper by saying that it is
hoped to make the authority format more international, and that
she expected this to be the first of several discussion papers.
Sally reported that Barbara Tillett is very supportive of this
direction, and has been working with an IFLA group on the
international exchange of authority records. Barbara will be
talking about this at the ALCTS/LITA authority group program on
Monday. Sally wrote the discussion paper to look at the
fundamental issues that would have to be solved to support the
exchange of authority records in much larger volume than now. The
paper first gives a few definitions and then it describes two
logical models as follows:
Model A: One record contains the heading and all related and
variant reference tracings. Sally believes this model is more
common in the US.
Model B: Separate records are made for parallel headings in the
different language catalogs with the records linked via the 7xx
fields. This model is more common in Canada.
Language of Catalog:
There was some discussion of the need to support the Language of
the Catalog (e.g. a public library in California might want to let
a user select either the English or Spanish version of its online
catalog) at the Midwinter meeting as part of the CanMARC alignment
proposal. Option 2 was passed allowing 008/08 in the authority
record to contain the five CanMARC codes, and the 038 field to be
used for languages where the CanMARC codes are not appropriate.
However, later discussion at LC suggested that field 040 be used,
so examples in DP#100 reflect that.
Language of Heading:
It is easy to confuse a code representing the language of an
individual heading with the language of the
catalog. The first is field specific and the second is record
specific. It is not now possible to code the language of the
heading, because of past analysis by LC staff showing the
difficulties that would arise with ambiguous cases. Often, these
ambiguous cases are headings of mixed languages. An example is:
Siege d'Orleans (Mystery Play)
DP#100 suggests a *repeating* $7 to get around this problem.
John Attig asked what people would do with this coding. Paul Weiss
responded that public libraries could ignore headings in languages
that were of no interest in a local OPAC. Rich Greene said that
it was useful to discuss name/title headings, where he believes
mixed language headings are very frequent. John Espley reported that
VTLS has several European libraries desiring language of heading
code; he named the Swiss National Library and the European
Parliamentary Library. Robin Wendler reported that the Harvard
library at the Villa I Tatti in Florence would like this as well.
In answer to Sally's question as to what is done now, Robin
reported that parallel records are used (Model B) but the library
is prepared to change to Model A. Donna Cranmer pointed out that
many public libraries, particularly small ones, have immigrants or
minorities for whom a non-English catalog would be useful.
Someone pointed out that an indexing program could use the language
of heading code to recognize which words should be treated as
stopwords in one language but not in another. However, how would
such a system handle headings in more than one language? Paul
suggested that the codes could show when the heading is mixed,
versus the straightforward single language heading code. Josephine
Crawford suggested that the indexing program could follow a default
rule in the mixed cases. This might lead the way to improvement over
what happens now to mixed headings in many systems (e.g. looking
for the French journal "The" is difficult when "the" is treated as
a stopword). Some research may be needed to prove that this is a
common problem and that a solution is feasible. Joan Aliprand is inclined
to agree with the LC analysis of the past. Even though the heading
might contain a mixture of languages, it fits within the catalog
where it is used; each catalog has its own particular language,
which is the language of the community that uses the catalog.
Someone in the audience distinguished between USMARC as a
communications format, versus the applications that use USMARC as
a record structure. Application-specific issues are easier or
harder to solve depending upon USMARC, but it may not always be
appropriate for USMARC to try to solve each application issue.
We all display records with multiple languages. Does it really
make sense to worry about it? Diane Hillmann disagreed. She
stated that there are libraries wanting to design catalogs with the
needs of specific user groups in mind. If codes representing the
language of the heading help to do this, this is a good thing.
Rhonda Lawrence said that the purpose of this paper is to improve
the international exchange of cataloging records. Therefore, is
this coding useful to a cataloger? Yes, she believes this to be
the case because cognates are confusing. She gave an example of a
legal jurisdiction in another country. She does, however, think
that it will be difficult to create rules for coding the language
of the heading.
Attig brought the group back to thinking of these codes appearing
not just at the field level, but also at the subfield level, due to
the presence of mixed headings. Cataloging rules are not now
based upon the language of the user. Paul Weiss said that he will
use Spanish headings in his catalog no matter what the cataloging
rules say, if a user prefers a Spanish language catalog. Joan
Aliprand sympathized with Paul's dilemma. But, she thought there
could be some problems displaying Spanish cross references if you
do not know under what rules the references were constructed. She
asked if there was a Spanish equivalent to AACR2. Paul has a real
use, and has to do something within limited resources. Joe Altimus
suggested taking a look at the UniMARC Authorities format. Sally
reported that it has the language of catalog at the field level.
There are some who do not understand the difference, however, and
interpret the subfield as language of the heading. The language of
catalog subfield might be changed to language of heading definition
in the future.
Robin spoke in favor of the idea of a universal record where the
system or application plucked out what was needed for the given
situation. Sally agreed, calling this a classic mudball record, and
stating that this is what people seem to want for exchange
purposes, although there will be a large number of codes at the
field and record level. Can the computer processing handle all
this? Rich reported that OCLC staff are very interested in
international authority control because they can't really handle
the needs of the French Canadians now. Sally summed up by
describing a resource authority record with appropriate computer
processing at the application level.
Stuart Ede from the British Library reported on the European
"AUTHOR" project that had looked at the possibility of a merged
authority file but decided it was impossible. They have instead
created the ability to download so that catalogers can exchange
records and build on them for their own needs. Paul said that it
might be interesting to know how that is working. Diane agreed
stating that the current situation is a horror because there is no
good control. She would like to see a catalog able to establish
some machine rules for processing a resource authority record, and
then presenting just the right view to users. John Attig spoke in
favor of coding both the language of heading and the cataloging
rules used to construct the heading at the field level. Rhonda
suggested looking at the Getty approach for handling various forms
of artist names in many languages; the Getty file is considered to
be a resource file as opposed to an authority file. She suggested
thinking of the problem in a new way entirely. Others in the
audience thought that this was a different concept, so Rhonda
explained that the cataloger does not select the right heading but
instead lists the possibilities.
Using the mudball record approach, Paul wondered how bib records
would be exchanged. If there is a mudball resource record, it
would be easier to swap in an English heading in place of another
language in the 1xx in the bib record. The bib 1xx should store
the concept rather than a heading string. Paul said that NLM tried
to push the advantages of this approach several years ago.
Script:
Sally gave some background on the use of the Escape (ESC) sequence
in the 880 field. It is not user-friendly for those who use the
records. In addition, if only one script for an individual 880 field can
be indicated, it is not user-friendly when the field contains multiple
non-Roman scripts. The script in the 1xx field should change depending
upon the language of the country. That is, the U.S. puts the romanized
version of the heading in the 1xx and the non-roman version in the 880.
This is currently reversed in those countries whose language is
written in the script of our 880. Sally asked if recording the script
consistently at the field level in the authority record would be
helpful, presumably as a replacement for current practice. John
asked whether Sally was suggesting that the 880 no longer be used
in the bibliographic record. Sally replied that she is opening up the
discussion in terms of both the authority and bibliographic
records.
Sally asked, what do catalogers and systems now do? Joan reported
that the RLIN system supplies the 066 and $6 automatically based
upon the scripts in the record. A cataloger has to type in the
value "$6" only for an unpaired vernacular field. Sally asked what
does the receiving system do? Joan said that the 066 tells the
receiving system the scripts present in the record, for a computer
processing decision as to whether or not the non-Roman data can be
displayed. She also thinks that the RLIN system might use the $6
for indexing. The ESC sequence in the $6 (used to identify the
first script in the field) may be a historical leftover from the
early days of RLIN CJK. Gary Smith reported that catalogers do not
see the script coding on OCLC; others said that the same is true
with the VTLS, EOS, and RLIN systems. Maureen Killeen reported
that the Canadian CATSS system doesn't use the 880.
Joe Altimus said that more time was needed to analyze and give
feedback on this issue. Rich Greene agreed. Joan suggested that
the Technical Working Group be asked to deal with this issue.
Sally would like analysis from both a systems and a cataloger
point of view. She requested that discussion continue on the
USMARC list.
Transliteration:
Paul found the Moscow example on page 12 confusing. Is it the
cataloger's or the author/publisher's transliteration? Joan thinks it
is useful to record the transliteration scheme at the field level,
when it is possible. But, often the cataloger doesn't know which
text the transliteration scheme applies to. She also gave an
example of a mixed Hebrew/Yiddish heading. Rich said that the
issue raised concerns about how to synchronize bibliographic and
authority records, as they are not now created at the same time in
many cases. At least, Wade-Giles transliteration is now
consistently used.
Country or Nationality:
Paul reported that the University of New Mexico Fine Arts Library
would find it useful to identify the nationality of an artist. He
gave an example of a name heading of a Latin American composer;
handled manually now by adding a form heading in the 655 field,
e.g. Brazilian composer. Robin agreed that nationality coding
could be useful, but it should be optional. Sherman Clarke
reported that the Visual Resources Association has a core set of
fields which includes a nationality field in the bibliographic
fields; not yet in the authority set. Others in the audience
affirmed that nationality coding would be useful. Paul also said
that gender coding could be helpful.
Concluding Remarks:
Does it seem reasonable to change the 880 approach? If a local
system does not use the 880 but retains it for import/export
reasons, what would be the impact of a change? Sherman Clarke
wondered if this
coding could move from authority to bibliographic records and vice
versa based upon cataloger macros. Sally asked who loads the
vernacular now? Robin said that Harvard plans to do this, and
John Espley reported that some VTLS libraries do it now.
What should happen next? Sally asked for responses, particularly
from OCLC and RLG, on the USMARC list. Sally would then like to
create a second discussion paper for the Midwinter meeting. Robin
suggested that the paper discuss the different purposes for each
coding change. Diane Hillmann urged that subject examples be
added (she was then asked to find some).
PROPOSAL NO. 97-12: "Definition of separate subfields in field 536
(Funding Information Note) for program element, project, task, and
work unit numbers"
Rebecca Guenther introduced this proposal by saying that the joint
cataloging committee from the Departments of Commerce, Energy,
NASA, Defense, and Interior (called CENDI) has mapped the database
structure now in use to the USMARC bibliographic format. In most
cases, the mapping has been one-to-one. This proposal covers a
group of important CENDI data elements, related to funding and
contract monitoring, for which no USMARC equivalent exists. At
the current time, these data elements are mapped to the 536 field,
but coding specificity is lost. The proposal suggests the
following:
-- Add 536 $e (Program Element)
-- Add 536 $f (Project Number)
-- Add 536 $g (Task Number)
-- Add 536 $h (Work Unit Numbers)
-- Change the name of 536 $d to "Undifferentiated Number" or make
536 Subfield $d obsolete.
Paul and others wondered why CENDI needs this specificity. Rebecca
said that it has important meaning in this user community. Are
the numbers hierarchical? It depends. If there is a hierarchy
involved, Diane likes the idea of this represented in the format.
Perhaps it could then be generalized to another situation.
Rebecca said that there is first a project, then a task, etc. She
thinks that practice is more like a timeline than a hierarchy
because not all elements are always used. John Attig was nervous
about second guessing this user community. But, Robin said, it is
good to know as much as we can, so that the format coding is not
just meaningful to people in a particular area of expertise. Paul
was uncomfortable passing the proposal until more is known about
the use of these numbers.
Rebecca and Sally said that the field was first established back in
1979. Rich Greene said that there are about 25,000 records now in
OCLC with the 536 field. To grandfather in these records, OCLC
prefers renaming the 536 $d rather than making it obsolete. In
this way, it is simply a matter of changing the documentation, not
the system. Robin Wendler made a motion to pass the proposal,
adding the new subfields and changing the name of 536 $d. Frank
Sadowski provided a second. There were six votes in favor, one
vote against, and one abstention.
ACTION: Proposal passed. Field 536 $d will be renamed to
"Undifferentiated Number" and subfields $e, $f, $g, and $h will be
added.
PROPOSAL NO. 97-13: "Changes to field 355 (Security Classification
Control) for downgrading and declassification date"
The CENDI group also requests changes to field 355 to improve
security control over documents that are classified or
declassified. As currently defined, 355 does not account for
changes in security classification in a sophisticated way. Rebecca
briefly reviewed the proposal:
-- Rename $d to "Downgrading or declassification event" and limit
it to a description of an event that must take place prior to
downgrading or declassification.
-- Define a new $g for "Downgrading Date"
-- Define a new $h for "Declassification Date"
-- Define a new $j for "Authorization" to show by whose authority
a document can be or has been downgraded.
The dates would be in yyyymmdd form. Subfields $g, $h, and $j
would not be repeatable. Paul asked why this is the case; can
something be downgraded more than once? Yes, said Rebecca, but
then the field is repeated. Paul asked if the $j is only input
when there are changes in how a document is classified. Rebecca
thought that it was generally not needed when the field is first
input. Someone else asked if a document ever gets upgraded, in
which case perhaps other subfields are needed to track these
events. Rebecca said that this is unknown at this time.
Paul Weiss moved that the proposal be passed as written. Robin
seconded the motion. There were eight votes in favor and no votes
against.
ACTION: Proposal 97-13 passed.
BUSINESS MEETING:
Sally reported on documentation efforts by her staff:
-- Bib and Authority format updates were drafted last March and are
expected from the printer in August or September. Will include
the changes passed as part of the work to harmonize USMARC and
CanMARC. [After the meeting the Bib was pulled back and changes
from the June meeting added.]
-- New edition of the Relator code list expected in July.
-- Not able to do the Holdings update last spring, so are targeting
September or may hold it until next year.
-- The Concise is expected to be sent this fall for printing. Will
be available online first, however.
-- LC staff are in discussion with the National Library of Canada
about the documentation issues relating to the harmonization. The
next edition may be a single edition, with LC doing the English
edition and NLC responsible for the French edition.
-- Sally noted that some mappings are beginning to appear on the
USMARC Web page.
-- Classification records are now available as a MARC distribution
service from CDS.
Book vendor information from Harrassowitz and Casalini is now being
loaded at LC in USMARC format. OCLC and RLG are also doing more
of this.
South Africa has decided to adopt USMARC, switching from UNIMARC.
Brazil has been a multi-format country in the past. Now that
Brazilian libraries are joining OCLC, there is a movement towards
USMARC.
John Espley has been appointed the chair of the new EACC Task
Force. The charge is to establish a mapping between the East Asian
Character Sets and the Universal Character Set (UCS). Randall
Barry, Beatrice Ohta, Bonnie Dede, and Candy Bogar have volunteered
to serve as well. The Committee will follow the same principles
as the Woods committee. John Attig asked if the Committee has a
timeline. Jacquie said that the Committee can determine this.
Jacquie reviewed a couple of conference programs of interest to the
MARBI group. The joint meeting with CC:DA to discuss metadata
will take place on Monday morning. Jacquie asked if everyone had
received the list of questions. She will bring extra copies to the
meeting.
Jacquie reported that MARBI has been asked to co-sponsor (in name
only) a program with the Committee on Cataloging Asian materials.
Claire Dunkle was present to answer questions. The program will
discuss how to handle the vernacular in authority records, fitting
in with the ALA theme of international connections. Claire said
that they are seeking speakers to present the topic clearly to
generalists. It was requested that the meeting not be held when
MARBI is meeting, so that members could attend. Claire promised to
try her very best to satisfy this request, but she could not
guarantee it. Paul made a motion that MARBI co-sponsor this program,
no matter when held. Elaine Henjum provided the second. There were
eight votes in favor of the motion and no votes against it.
Jacquie has been working with the three divisions so that MARBI
membership and intern appointments are regularized and selecting
the chair is handled more easily. There has been one new
appointment. Christine Mueller will be a LITA intern in 97/98.
John Attig asked how the UKMARC harmonization is going.
Unfortunately, Stuart Ede had left at this point. Sally responded
that the British Library is expanding UKMARC each year, and
thereby moving to harmonization gradually. She thinks about 30
fields were added last year. One issue involves the core fields
(like 245 and 300) which have many more subfields because subfield
codes are used to replace punctuation rather than the USMARC
practice of using them to delineate access points.
Marti Scheel reported that the National Library of Medicine is
going to implement the 6xx $v when the next MeSH update comes out
in the fall. More information will appear on the USMARC list.
********************* Sunday, June 29, 1997 **********************
The meeting opened with some news about the joint meeting planned
for Monday, June 30. John Attig reported that the ALCTS Task
Force on Metadata recommends that MARBI and CC:DA work together on
the issues involving metadata, cataloging, and the USMARC format.
Jacquie asked for people to think about this recommendation before
the meeting on Monday. She also passed out the CC:DA questions and
a "crosswalk" document.
PROPOSAL NO. 97-9: "Renaming of Subfield 865 $u to accommodate
URNs"
Many people attending this meeting were able to go to the morning's
program "URIs, Metadata, and the Dublin Core" at the Sheraton
Palace, providing a useful foundation for this proposal. Another
source of information is the Preston/Lynch/Daniel paper on using
existing bibliographic identifiers (like ISBN and ISSN) in the URN
framework and syntax (available by FTP at
ds.internic.net/internet-drafts/draft-ietf-urn-biblio-00.txt).
In introducing this proposal, Rebecca Guenther said that there is
an immediate need at LC to code a "handle" in an 856 field so that
it can then be resolved by a handle server to a URL. Some
participants of the National Digital Library Project would also
like to record handles. Referring to the program at the Sheraton
Palace, Rebecca said that some of what Cliff Lynch discussed makes
her think now that the recommendations in this proposal are on the
right track. It is necessary to accept that different naming
authorities will name objects in different ways; USMARC format
planning must proceed from this point of view. One question to
discuss is how to record a handle. The proposal recommends
changing the name of the 856 $u from "Uniform Resource Locator" to
"Uniform Resource Identifier" so that either a URL or a URN
(including a handle) can be recorded. The $u can be repeated in
the case where multiple identifiers are desired. Given the rapidly
changing situation, where experimentation is useful and important,
multiple identifiers are expected in some records. Alternately, a
new subfield can be considered for URIs, since some concerns were
raised on the list with mixing address types in the $u.
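The two options under discussion can be made concrete: even if URLs and URNs share a repeated $u, a system can tell them apart by scheme prefix. A minimal Python sketch of that distinction (the identifier values below are made up for illustration, not taken from the proposal):

```python
def classify_identifier(value: str) -> str:
    """Classify an 856 $u value as a URN or a URL by its scheme prefix.

    A URN (including a handle expressed in URN syntax) begins with
    "urn:"; anything else with a scheme is treated as a locator (URL).
    """
    scheme = value.split(":", 1)[0].lower()
    return "URN" if scheme == "urn" else "URL"

# Repeated $u values, as the proposal would allow (hypothetical data):
subfield_u = [
    "http://lcweb.loc.gov/",        # a URL
    "urn:hdl:loc.music/musdi.248",  # a handle in URN syntax
]
print([classify_identifier(u) for u in subfield_u])
```

This is the heart of the "different syntax in the $u" argument Karen Coyle raises later: the prefix alone is enough for software that understands both schemes, but not for software that assumes every $u is clickable.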
John Attig asked if the functionality of URLs and URNs is the same,
even though the operational concept is clearly different, one
being a server address and the other being a logical address.
John asked if the rest of the 856 field would be handled in the
same way, if a URN is input; what other data elements would be
input at the same time? Joe Altimus replied that he considers the
URN and URL to be similar in function. Although RLG staff has no
consensus as of yet on the issue, it does seem an annoyance to have
different address types in the same subfield.
Robin Wendler considers the URN to be like a call number. It is
not an exact shelf location, but a surrogate for the location of
the item. Even though the 856 seems appropriate at this time, she
introduced the thought that a call number field might be better in
the future. Rebecca responded that there is no guarantee of a
one-to-one correspondence between a bibliographic record and a
URN; therefore, the 856 seems better. Rich Greene said it was OK
with OCLC staff to place the URN in the 856 field, but there are
strong feelings not to use the $u. The current software assumes
a URL, and a user can click and be directed to the electronic
resource. This is not yet possible with URNs given OCLC's current
software. Michael Mealing stated that Web browsers generally deal
with unknown protocols, and asked if the OCLC system can do this.
Rich thought not. Michael said that URNs and URLs act the same
way programmatically, so it is possible to put both in the same
subfield since future browsers will handle the two. If a decision
is made today to separate the two into two subfields, then it means
future computer processing will have to look in both places. He
recommended against this course.
Diane Hillmann registered an objection to the $u name change on
conceptual grounds. The 856 is a holdings tag but is kept in the
bib format for convenience, since most systems don't support the
holdings record yet. There is both a verification role as well
as a retrieval role. Rebecca returned to Robin's call number
analogy, reminding everyone that the 852 field contains the call
number in the holdings format. But, it is necessary to keep in
mind that different naming authorities will decide upon the rules
(i.e. syntax, definitions, resolution methods, etc.) and not the
library community. Therefore, it won't be possible to fit this
into our current call number structure in a reliable way.
Karen Coyle said that she hadn't yet worked with handles and wanted
to know more about them. She is familiar with PURLs which look
like URLs. Josephine Crawford wondered if the difference between
a handle and a PURL has to do with the syntax and the resolution
server software. Cecilia Preston said that there could be 7-8
URNs describing the same object. The situation is very fluid
right now, and it is not possible to find a clean and easy
solution. She suggested thinking of these different URN types
similar to having both an ISBN and a call number for the same
book. Both refer to the same entity, but have different syntax
and different purposes. John Attig wondered if this then means
that we record the ISBN twice, once in the 022 and once in the
856. Cecilia suggested taking a look at her paper.
Mike Mealing discussed the URN structural goal: to be
context-free. Unless you have the authority to understand the
string, don't derive a URN from an ISBN. It should be an opaque
string that acts by itself. Karen Coyle wondered if different
syntax in the $u (i.e. URL: versus URN:) would be enough to
distinguish the two. Rich said not with the current OCLC
software. Diane pointed out that the 856 field is now repeatable.
The current approach works well for URLs. Why monkey around by
using the same subfield for something different? Don't we have
years of experience teaching us to code separately for different
data elements? Mike said that we are simply naming and then
resolving; he believes that there will be a standard resolving
procedure to program in our systems. Cecilia said that the
situation has to shake out, and the 856 $u is ok until the dust
settles. Diane replied that, when we put something in one place,
it stays there historically. Rebecca said that a new naming
authority can create new numbers at will; do we want to create a
new holdings record then? People said no! Karen Anspach also
recommended a separate subfield code for the URN. She felt that
wise OPAC management requires caution; at this time, systems should
avoid presenting catalog users with a non-usable URN alongside a
usable URL.
Rebecca said that the indicator could be used to control this.
Diane and others asked what is the disadvantage of defining a
separate subfield code? Rebecca said that there aren't many
subfields left, but perhaps $g could be redefined. Jacquie
wondered when the software would be developed to process both the
URNs and URLs with a single click by the user. Gary Smith said
that it is no harder to check for two subfields than to work
through differences in the same subfield. If the URN is coded in a
separate subfield, will it be repeatable? Yes.
Frank Sadowski moved to pass the proposal as written. There was a
second by Carol Penka. There were four votes in favor, and five
against. The motion did not pass.
Robin Wendler moved to pass the proposal with a different subfield
code. LC staff will need to determine the best code, after an
investigation into what is now used at LC. Paul provided a
second. The motion included defining the # indicator code for the
856 field meaning "No information provided". (John Attig asked
for some direction in the documentation on how to code the #
indicator if there are both URL and URN recorded.) There were nine
votes in favor of Robin's motion, and no votes against.
ACTION: Proposal 97-9 passed. A new 856 subfield will be defined
to hold the URN.
PROPOSAL NO. 97-8: "Redefinition of subfield $q (File transfer
mode) in field 856 of the USMARC formats"
Rebecca introduced the proposal by saying that, if the subfield $q
is redefined, it would take care of a current problem with mapping
from the Dublin Core, the GILS format, and others to the USMARC
format. These other formats carry a position for electronic
format type (also known as MIME type or Internet Media Type). The
Internet is designed to support the integration of multimedia
resources, often by triggering software based upon the file
format. Often a file extension is used to show a video or audio
file, but it is still useful to include the file format in the
description of the resource. The USMARC format does not have a
clearcut place for the file format. Currently both the 516 (Type
of File) and 538 (System Detail Notes) fields are used, and the
538 carries other information as well. Rebecca has surveyed other
systems and determined that the 856 $q has been used only to show
binary/ascii format. It seems reasonable to expand the 856 $q to
satisfy this need. Rebecca advised against creating a new 856
subfield, as so few are left.
Paul Weiss stated that it is most useful if standard codes are
used. One possibility is the IANA (Internet Assigned Numbers
Authority), which is a central registry for specific values.
Rebecca agreed, but said we can only encourage the use of standard
codes at this time, as the Dublin Core does not require codes.
Therefore, free-text format data will also be mapped to the $q.
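Because $q may carry either a registered IANA media type or free text mapped from the Dublin Core, a receiving system can at least detect which kind of value it has. A rough sketch (the pattern is an illustrative assumption, not part of the proposal, and is not a lookup against the IANA registry itself):

```python
import re

# type/subtype shape of an Internet Media Type, e.g. "text/html".
MEDIA_TYPE = re.compile(r"^[a-z][a-z0-9.+-]*/[a-z0-9][a-z0-9.+-]*$",
                        re.IGNORECASE)

def looks_like_media_type(q_value: str) -> bool:
    """Return True if an 856 $q value has the type/subtype shape."""
    return bool(MEDIA_TYPE.match(q_value.strip()))

print(looks_like_media_type("image/jpeg"))        # structured code
print(looks_like_media_type("WordPerfect file"))  # free text
```

A system could use such a check to decide whether to trigger software by media type or merely display the value as a note.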
Paul moved to accept the proposal as written, taking into account
the error on page 5 which states that the $q should be repeatable.
As explained in the body of the proposal, the $q should not be
repeatable. If there is more than one format, there should be
different 856 fields. Diane Hillmann made the second. There were
eight votes in favor, and no votes against.
ACTION: Proposal 97-8 passed as written, with correction to make
$q not repeatable.
PROPOSAL NO. 97-3R: "Redefinition of code "m" (Computer file) in
Leader/06 in the USMARC Bibliographic Format"
Rebecca introduced the proposal to the audience by explaining that
this is the fourth time that this issue has come before MARBI. It
is a complex issue, and everyone's thinking has grown with each
discussion. Rebecca looked carefully at the concerns expressed by
OCLC, and OCLC provided many examples which were very helpful.
Therefore, the revised proposal has a more explicit definition for
Leader/06 "m" code and instructions have been added. “In case of
doubt or if the most significant aspect cannot be determined, the
cataloger should consider the item a computer file.” Rebecca also
explored using the 008 field for identifying computer files where
the significant aspect requires something other than Leader/06 "m"
coding. Her thought was to use 008/23 (Form of item) in
Books/Serials/Music/Mixed Materials, and to create a new 008
position in Maps and Visual Materials. She then saw how
"scattered" this information would be, adding programming overhead.
So, she settled on using the 007 field across the board. This also
means that a bibliographic record describing a work can have an
856 field (and matching 007) for the electronic version, and other
holdings fields describing the paper version.
John Attig wondered about the phrase "alphanumeric" in the "m"
definition. He gave the example in the book world where a table is
created frequently containing alphanumeric data. Paul Weiss
agreed, stating that numeric data is the same whether paper or
electronic. Betsy Mangan reported cartographic data is text and
therefore treated like language material. Robin Wendler brought
up the example of statistical data files--the way they are used
changes the nature of the data. She would like the primary aspect
to be the numeric data, but could accept falling back on secondary
aspect coding. Rebecca said that she didn't want to remove
"numeric data" from the definition because it was part of the
original definition. Margaret Stewart asked if computer-oriented
multimedia would be the same as or similar to interactive
multimedia. Yes, several answered.
Paul suggested trying to keep the definition simpler to make life
easier for catalogers. He said that if the definition requires
the cataloger to make a decision about the significant aspect of
the item, there will be no consistency as different catalogers
will make different decisions on the same item. Betsy agreed that
this problem will exist, but said that there is no way to avoid
it. Knowing the significant aspect does have its uses. Diane
Hillmann agreed that consistency is not possible. Some libraries
will establish conventions and will prefer one over the other. She
did not see this issue as a major impediment.
Robin discussed Harvard's current methods for automatic duplicate
detection. She wondered how to detect programmatically that the
video Hamlet does not equal the paper Hamlet. John Attig said that
AACR2 is not explicit enough. Paul disagreed, stating that the
problem is more the format. Robin spoke up for cataloging
guidelines, to help catalogers decide on the significant aspect.
Frank Sadowski said that he thought all this was clear when he
first read the proposal. But, given this discussion, he wondered
how the following situations would be handled. Let's say that the
cataloger has three diskettes:
1) An Annual report in a Wordperfect file (consider this to be
text)
2) A statistical file in Excel (consider this to be a computer
file)
3) A JPEG graphic file (consider this to be a graphic)
Only the second one would probably be coded Leader/06 "m".
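Frank's three cases amount to a lookup from the item's most significant aspect to a Leader/06 code. A sketch under the redefinition being discussed (the aspect labels and the fallback rule are illustrative; "a" and "k" are the existing Leader/06 codes for language material and two-dimensional graphics):

```python
# Most significant aspect -> Leader/06 code, per the redefined "m":
# textual content stays language material, a graphic stays visual
# materials, and only the computer-oriented case takes "m".
LEADER_06 = {
    "text": "a",           # e.g. an annual report in a WordPerfect file
    "computer file": "m",  # e.g. a statistical file in Excel
    "graphic": "k",        # e.g. a JPEG graphic file
}

def leader_06(significant_aspect: str) -> str:
    # In case of doubt, the revised definition says to treat the
    # item as a computer file.
    return LEADER_06.get(significant_aspect, "m")

print([leader_06(x) for x in ("text", "computer file", "graphic")])
```

The consistency worry raised in the discussion lives entirely in choosing the key, not in the lookup: two catalogers may disagree on the significant aspect of the same diskette.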
Jacquie Riley asked if there was any more discussion about the
definition. She asked for a straw poll, including everyone in the
room, not just voting members. She asked how many people think that
the phrase "alphanumeric data" should remain in the definition?
Only 7-8 people voted in favor of this. There was discussion
saying that this is confusing and that the real intent is to limit
to statistical data, meaning numerical data that may have some
alphas included but is non-narrative. Jacquie asked for a straw
vote in favor of changing "alphanumeric data" to "numeric data".
There were 18 people in favor, and 15 people against. If numeric
were excluded from the definition, how would catalogers handle it,
and handle it consistently? A decision about significant aspect
still has to be made. Sally suggested that numeric would fall into
the "etc" part of the definition. Paul spoke up against this,
saying that he wants to ensure consistency and thinks that "etc"
should be removed. Jacquie asked for a straw vote to remove "etc"
from the definition. There were about 25 in favor, and only 3
against this proposal.
The discussion moved on to the 007 field. Joe Altimus reported
that the examples in 97-3R are very helpful, but he would also
have appreciated a multimedia example. John Attig asked why
prefer the 007 over the 006? Robin said that, even though there
are drawbacks to the 007, she likes it better than the 006 because
she can legally carry information down to the holdings record when
dealing with a multiple-versions record. John responded that the
006 has a clear relationship to the Leader. Rebecca said that the
007 seems appropriate since we are talking about the physical
characteristic of an item. If a cataloger is describing the
original, and the agency doesn't create holdings records, the
cataloger can put the 856 in the bib record; but then, the 007
cannot be mandatory. Paul said that if requiring the secondary
aspect to be coded is a good idea, then he prefers the 006 over
the 007.
Rich Greene said that OCLC staff are struggling with the existing
ambiguity, and would like the physical carrier to be very
explicit. Rebecca suggested reviewing the 008 idea again. Rich
said that OCLC staff liked the 008 idea, even though there was a
scattering between two character positions. Marti Scheel reported
that NLM staff are concerned about the complexity of the 008. Rich
explained why he thinks the 008 is a cleaner approach, using the
analogy of microforms.
Paul wondered about two 007 fields in the same record, one for a
remote file and one for a CD-ROM. How would your system know
which 007 goes with which 856? Not a problem if the fields are
contained in a holdings record, but this is a problem if contained
in a bib record. Rich said that there is a need to maintain four
aspects: content, carrier, type of control, and publisher control.
The problem is stuffing four aspects into three data elements.
Paul said that it comes down to what should be mandatory in the
record. Diane came back to the problem about the holdings format
not yet implemented in many systems. OPAC users will want to limit
searches, and pairing up an 007 with an 856 is important when more
than one variation of a work is recorded in a single bib record.
Limiting by location might be possible, but this is not as clean
as having a holdings record. Jean Hirons reported that CONSER now
has to send records to the ISSN Center. How should she show the
computer format? Current practice is a mixed bag. Jean believes
that the 007 would work
the best, supporting the need for the holdings format. The current
CONSER practice is to not use an 007 when adding an e-journal 856
field to a print serial bib record.
John Attig said that there is so much record exchange at this time,
that he considers it very important to put mandatory coding in the
format. Paul disagreed, saying why should everyone have to assume
this workload. Robin predicted that the next-generation system
will do more with the 007 and the holdings format. She therefore
felt it appropriate for the USMARC format to use the 007 in this
way. There are no cost/benefit studies that deal with the cost of
coding and the later benefit that can be derived from the
processing of that coding. A member in the audience supported
always knowing that something is in the electronic format, as the
secondary aspect. He went on to say that it is unfortunate that
there is no nice, neat way to identify this in USMARC, but that the
combination of Leader/06, 007, and 856 is workable. Paul said
that the computer would only need to look at the 007 and 856,
since the Leader/06 would describe the primary aspect.
Jacquie asked if there is more discussion. Paul asked about
accompanying materials. If an 007 for computer files is to be
mandatory in all cases, what about accompanying materials? The
problem comes back to not having a holdings record in many
systems. Paul suggested making this be the one acceptable
exception. John Attig asked to consider the 008 alternative again.
Jacquie asked for a straw poll in favor of the 007 over the 008.
There were 22 votes in favor, and 8 votes against.
Diane Hillmann moved to accept the proposal as follows:
-- In the definition of "m" code, change "alphanumeric data" to
"numeric data".
-- In the definition of "m" code, remove "etc." from the end of the
first sentence.
-- Make 007 field mandatory for electronic resources except for
accompanying materials.
Jacquie provided a second to the motion. There were seven votes in
favor, and one vote against.
Paul asked a question about Attachment A. If the Leader/06 is not
"m" then some of the 008/26 values could be made obsolete because
they're redundant. He thought that this would be a good idea for
something like "k" (representational). Rebecca promised to think
about this for the next meeting.
ACTION: Proposal 97-3R passed with three changes as stated above.
BRITISH LIBRARY UPDATE:
At yesterday's Business Meeting, a question was asked about
progress with the UKMARC/USMARC harmonization project. Stuart Ede
from the British Library was not present during the Business
Meeting, so Sally asked him now to say a few words. Stuart said
that the project is being phased over several years. The BL put
out a consulting paper earlier in the year describing what is
unique in USMARC and not present in UKMARC. Comments received in
April showed general acceptance of adding these unique data
elements to UKMARC, since this would be helpful. One comment
included a suggestion that might be considered for USMARC; this
might signal a trend of general yet critical acceptance of USMARC.
The two major areas of difference between UKMARC and USMARC are
ISBD punctuation and multi-volume works. Currently comments are
being gathered on these outstanding areas of concern.
Stuart closed by saying that UK librarians see great benefit in the
harmonization project. He said that the general approach is not to
diverge from USMARC from this point forward, and to try to
harmonize to the extent possible.
PROPOSAL NO. 97-11: "Definition of Subfields in Field 043
(Geographic Area Code) and 044 (Country of Publishing/Producing
Entity Code) to accommodate indication of Subentities in the
USMARC Bibliographic, Community Information (043 only) and
Authority (043 only) Formats"
Sally McCallum referred back to DP#98, discussed at the Washington,
D.C. meeting, which explored the issue of subentity geographic
codes. The conclusion at that time was to add a $b and $c to the
044 field, and to add a $b and $2 to the 043 field. Paul wondered
why the proposal doesn't also include adding a $2 to the 044
field, since it seems the normal practice to show the source of a
code. Would this represent too much added work for LC staff?
Sally said that this would not be a problem, and she is comfortable
with adding a $2.
Field 044 $b, proposed for the Bibliographic format, would contain
a local country's code, whereas 044 $c would contain the code from
ISO 3166-2. David Goldberg explained that the ISO coding scheme
includes both the entity and the subentity. He supplied the
example of BR-MT (Brazil-Mato Grosso). Sally confirmed that this
was so. Sherman Clarke asked if it was necessary to indicate in
008/15-17 that there is a subentity code in 044. Diane questioned
if this was necessary.
Field 043 $b and $2 are proposed for the Bibliographic, Community
Information, and Authority formats. Subfield $b would contain the
local Geographic Area Code (GAC). Sally reminded the group that
the GAC code can be no longer than seven characters.
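An ISO 3166-2 code already bundles the entity and subentity, so a system can derive both parts from the proposed 044 $c. A small sketch assuming the standard's COUNTRY-SUBDIVISION shape (the helper name is mine):

```python
def split_iso3166_2(code: str) -> tuple[str, str]:
    """Split an ISO 3166-2 code like "BR-MT" into (country, subdivision)."""
    country, _, subdivision = code.partition("-")
    if not country or not subdivision:
        raise ValueError(f"not an ISO 3166-2 subdivision code: {code!r}")
    return country, subdivision

print(split_iso3166_2("BR-MT"))  # Brazil, Mato Grosso
```

This is one reason a separate 008/15-17 flag may be unnecessary: the country portion is recoverable from the subentity code itself.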
Paul Weiss moved to pass the proposal as follows:
-- Define 044 $b, calling it simply "Local Code".
-- Define 044 $2, calling it "Source of Local Code".
-- Define 043 $b as proposed in 97-11.
-- Define 043 $2 as proposed in 97-11.
Elaine Henjum provided a second. There were eight votes in favor,
and no votes against the motion.
ACTION: Proposal passed with some wording changes and an
additional $2 in the 044 field.
DISCUSSION PAPER #101: Notes in the USMARC Holdings Format
Rebecca introduced the paper, by reviewing the questions at the end
of the paper. John Attig opened the discussion by saying that he
feared the subject is very complicated. The 853/863 pairs are
linked to one another, and copy-specific notes should be linked to
the 853/863. He believes that fields mentioned in the paper (533,
540, and 583) are copy-specific; the only reason they appear in
bibliographic records is because holdings records did not exist in
systems when they were defined. He suggested that it is time to
revisit the issue of embedding holdings data in bib records.
Paul Weiss supported the idea of moving these fields to the
holdings record. Robin, who works with this type of data a lot at
Harvard, gave some wonderful examples. Paul expressed some
concern about cataloger training, if there was this change. Diane
reported that Cornell Rare Books dept prefers for local
copy-specific notes to appear on the bib screen, not on the
holdings screen. Perhaps they should appear in both places, but be
ordered in some logical way on the bib record. Rebecca suggested
that there needs to be a $8 in the 852 as a way to order multiple
holdings notes. None of this resolves the issue that there is not
enough vendor support for the holdings format. Adding these
copy-specific notes to the holdings record will increase the
pressure on vendors. Robin reported that Harvard is talking with
vendors about this issue. She sees the lack of indexing of the
holdings record as one problem with storing these fields solely in
the holdings. Paul suggested that the implementation be done in
stages; make the fields valid in holdings now, but don't make them
obsolete in the bib record until the support for the holdings
record improves.
Mike Johnson supported the idea of item-specific information in the
852 in the bibliographic format. He said that the 5xx is not used
a lot, but the 852 is used! Joe Altimus reported that the RLG
experts on the Z39.50 standard said that the standard doesn't
handle separate holdings records very well. Then, is this really
a good idea? Robin gave an example. You have a multiple version
record with the microform described in an 007 and 533 field.
Harvard is able to export records in two ways:
(1) separate bib record with one holdings embedded;
(2) separate bib record and separate holdings record.
When exporting the record under the current USMARC holdings format,
the 541, 561, and 562 fields are lost. Robin doesn't believe that
a linking subfield $8 would be of help in this situation. She
described how important it is to track the different provenances
for the different copies; right now this is far too messy in the
bibliographic record.
Donna Cranmer emphasized that her library shares the same problem.
John Attig asked if the archivists have considered this overall
issue yet. Apparently not, but it is an important issue for them.
John suggested a proposal at the next meeting that would propose
that fields 541, 561, and 562 be added to the holdings format but
not (yet) made obsolete in the bib format. Another paper should
also discuss the overall issues involved with the bib/holdings
formats. There was a general consensus in the room to do this.
Jacquie also summarized the pros/cons relating to using the same
field in both the bib and holding formats. The 533 and 540 in the
bib format map to the 843 and 845 fields in the holdings format,
whereas the 583 is the same in both formats. Given the current
mixed situation, what is best from a cataloger and system point of
view?
ACTION: A proposal will be brought to the 1998 Midwinter meeting.
****************** Monday, June 30, 1997 ***********************
DISCUSSION PAPER NO. 102: "Non-filing characters"
This paper presents the problems and possible solutions for dealing
with non-filing characters associated with variable field data.
Sally McCallum reported that the problems have been around for
quite a while. The paper was drafted by Randall Barry, with
pros/cons presented for the various techniques he identified.
Unfortunately, the solutions could be very expensive to implement
in systems.
Paul Weiss said that he felt Randall did a good job on the issue.
It would be excellent to come to some closure on this longstanding
problem, even though it will involve quite a bit of work in the
short term. Paul suggested that the future is longer than the
present, and that the pain would be worth the gain. The control
character or subfield solutions appeal to Paul at this point.
Marti wondered if $0 [zero] could be used instead of $1 [one] as
the subfield delimiter that would indicate non-filing characters.
John Attig reported that his system handles articles through
automatic recognition, so catalogers don't have to bother with
this. Robin asked how ambiguous articles were handled (i.e. the
German article "die" versus the English word)? John said that
the system would assume the English language, so it would not treat
"die" like the German article. It was pointed out that this system
would not handle the English title of the play "A My Name is
Alice". Even with these problems, Michael Johnson tended to prefer
the machine approach.
Robin wondered if the subfield solution was only supposed to deal
with non-filing characters at the beginning of a field, or if the
same subfield could be embedded in the field. The discussion
paper assumes that the subfield approach could be used for
embedding, but points out that this would be awkward for computer
processing. David Goldberg said that he feared spacing problems
might occur on display of records.
Karen Coyle preferred the graphical-character-approach solution.
In this way, the non-filing data is stored in the same subfield as
the data itself. However, what character(s) could be used?
Perhaps the right/left arrows? Gary Smith asked if this would be
input, and how the USMARC format would store these characters?
Paul reported that many systems show the EOF character via a
graphical character, and that this is helpful. The input question
should not drive the issue, nor should processing or display
questions. Michael Johnson suggested the bracketed approach; the
linear, on/off approach would ease the computer processing issue.
Karen Coyle agreed that a beginning/ending character is best.
Sally asked if all this applies to initial articles only? No. John
Attig asked if the method could be applied to punctuation as well?
Right now punctuation is usually normalized out, but the rules of
normalization differ from system to system and the catalogers have
no control over this. The cataloging rules should give guidance
on what to ignore, if an improved method was implemented. There
was a consensus that the USMARC format should provide a tool or
method, but that the instructions on when to ignore something
should come from the cataloging rules.
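The bracketed, on/off approach the group favored can be sketched directly: wrap the characters to ignore in a begin/end pair, and strip the wrapped spans when building a sort or index key. The delimiter characters used here (U+0098/U+009C) are placeholders chosen for illustration only, not a committee decision:

```python
import re

NSB, NSE = "\u0098", "\u009c"  # placeholder begin/end non-sort delimiters
NON_FILING = re.compile(f"{NSB}.*?{NSE}")

def sort_key(field_value: str) -> str:
    """Drop the bracketed non-filing spans; keep everything else."""
    return NON_FILING.sub("", field_value).lstrip()

# Works for a leading article and for an embedded one alike:
print(sort_key(f"{NSB}The {NSE}Pirates of Penzance"))
print(sort_key(f"Reise in {NSB}die {NSE}Schweiz"))
```

Because the markers travel inside the data, this handles the embedded-article case that field indicators cannot, which is the distinction Michael Johnson draws below.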
Karen Anspach said EOS has European customers who would very much
like to ignore articles embedded in fields. Marti Scheel asked
about the ANSI or NISO standards that deal with sorting/filing and
with indexing. Does this standard have any relationship to this
issue? Sally has some memory of this standard, but will have to
check on it. Even though filing and indexing are related, these
activities would be handled differently.
Jacquie noted a growing consensus to do something; the question is
what is the best method. One option is a pair of unused
characters (graphical or control?) to come before and after the
data to be ignored. Graphic characters are on the keyboard and
can be seen. Control characters are not usually visable. Perhaps
a combination to make the trigger really unique? How should
something be selected? Perhaps look at the control/graphical
characters available in UCS, except for the fact that people want
to move this forward more quickly. Karen spoke up in favor of a
combination of characters that would not normally be combined, and
that are in opposite direction of one another as this would help
the user visually recognize the non-filing characters. It was
agreed that a proposal should be moved along and that some serious
research is needed to find the preferred control characters. Big
databases should be scanned to see if certain combinations uncover
any records.
Rich asked if existing records would have to be modified. OCLC
would not want to do this. He suggested that field indicators
should continue to be used, but that the new approach should be
used for *embedded* non-filing characters. Michael Johnson
pointed out that there are two different situations: non-filing
characters that lead a field, and non-filing (or usually non-
indexing) characters that are embedded. Diane Hillmann said that it
was realistic to expect that the present methods (indicators to
ignore an article (or) do not input the article) will exist
side-by-side with the new method of surrounding the non-filing
characters with some unique combination of characters.
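Diane's side-by-side scenario could look like this in practice: a
sort key built from a field that may carry a traditional non-filing
indicator count and may also contain embedded marker pairs. Again a
minimal sketch, assuming hypothetical placeholder marker characters
and invented example titles.

```python
# Sketch of the coexistence scenario: indicator counts plus embedded
# markers. NFB/NFE are hypothetical placeholders, not USMARC choices.
NFB, NFE = "\u0098", "\u009c"

def filing_key(field: str, indicator_count: int = 0) -> str:
    """Build a sort key honoring both conventions."""
    text = field[indicator_count:]  # skip leading article per indicator
    out, depth = [], 0
    for ch in text:
        if ch == NFB:
            depth += 1
        elif ch == NFE:
            depth = max(0, depth - 1)
        elif depth == 0:
            out.append(ch)
    return "".join(out)

# Leading article handled the old way, via an indicator value of 4:
print(filing_key("The annotated Alice", 4))        # annotated Alice
# Embedded article handled the new way, via the bracketing characters:
print(filing_key(f"Histoire de {NFB}la {NFE}musique"))  # Histoire de musique
```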
Robin wondered how this issue relates to the UCS approach to diacritics,
where the diacritic is expressed after the letter rather than
before. Isn't mapping from USMARC a problem? Won't this undercut
the linear solution being suggested here? Gary Smith said that the
UCS implementors will have to deal with the moving diacritic issue,
and make sure that reverse mapping is maintained. Karen Coyle said
that she believes this issue supports the bracketing solution.
ACTION: A proposal will be developed by LC staff. It would be
very helpful to send Sally many examples.
DISCUSSION PAPER NO. 103: "Current uses of the 028 (Publisher
Number) and the 037 (Source of Acquisition) in the Bibliographic
Format"
Karen Little, representing the Music Library Association,
introduced the paper. She gave an example of the publisher of a
libretto that used the same number for both the stock number and
the publisher number. Where should this number be recorded? In
both 028 and 037 fields? Another situation involves video numbers;
they are now recorded in the 028, but might more accurately belong
in the 037, except that indexing is desired. Music librarians
began to look at the overall use of these two fields. John Attig
asked if it is generally true that the 028 is indexed, but that the
037 is not. Karen replied that this is true generally, but there
are important exceptions like VTLS, Melvyl, and RLIN. Melvyl and
RLIN index in separate indexes. There was general agreement that
the distinction between the two fields should be based upon the
inherent nature of the data, and not the indexing capability of the
home system, and that the general difference should be the
bibliographic significance of the number. Paul asked if the first
indicator is used. Yes, for display purposes.
Karen reviewed the two options in the discussion paper and said
that MLA prefers Option 1. Option 1 has two suboptions, and MLA
has not yet reached a consensus about them. There are concerns
depending upon the usage of the current indicators. John reported
that OLAC prefers Option 1 also. Paul asked why Option 2 is not
favored. Karen said that there are still some true stock numbers
around that are useful to acquisitions staff but have no
bibliographic significance. John said that, if field 037 were made
obsolete, it would be necessary to move a lot of numbers to the
028.
Frank Williams asked if the question is just about the indexing
problem in the 037. John said no, the problem is larger than
that. He suggested expanding the 028 definition but leaving the
037 field in place. Paul was concerned that John's suggestion
would make life more complicated for catalogers. Diane Hillmann
said that the instructions can tell catalogers what to do when in
doubt. Karen Coyle said that the 028 is a music field, and yet
video numbers are now appearing in it because music publishers are
branching out. Jacquie said that we are not discussing a general
move to put any publisher number in the 028. Robin wasn't so
sure, saying that the publisher numbers are heavily used in the
music world, and indexing is important. Melvyl indexes both 028
$a and $b together. She spoke against combining the 037 and 028
fields, since doing so would add other numbers to the index and
make it less useful to the music community. She and Diane agreed that there
are some limited exceptions where it is useful to put a book number
in the 028 field; the example given was a bibliography of a
composer by a music publisher. Rebecca suggested a name change to
recognize music/score/video, but if in doubt a cataloger should not
use the 028 but should use a 500 note instead. David pointed out
that there is a special field for a certain subject area. Diane
didn't see this as a problem. The consensus was to extend field 028
narrowly, to allow for music- and videorecording-related material.
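As a rough illustration of the indexing practice mentioned above
(Melvyl indexing 028 $a and $b together), a system might build a
single index key from both subfields. The field contents and the
dictionary layout here are invented for the sketch; real MARC
record parsing is considerably more involved.

```python
# Hypothetical sketch: one index key combining the publisher/issue
# number ($a) and its source ($b) from an 028 field.
def index_key_028(subfields: dict) -> str:
    """Join $b (source) and $a (number) into a single index key."""
    return f'{subfields.get("b", "")} {subfields.get("a", "")}'.strip()

field_028 = {"a": "438 953-2", "b": "Philips"}  # invented example
print(index_key_028(field_028))                 # Philips 438 953-2
```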
ACTION: There will be a proposal at Midwinter.
Minutes prepared by Josephine Crawford
December 1997