October 19-21, 1998
Madrid, Spain
1. Appointment of Scribe - Denise Troll
2. Introductions and Status Reports - to be posted separately
3. Utility Attribute Set (Denenberg)
4. Attribute Architecture - Open Issues (Denenberg)
Discussion:
Stovel: the distinction is there at least partly for historical reasons. We spent some time discussing the history. Denenberg: do we agree that we can technically eliminate the distinction if we have a method or model that describes the two types? Percival: you have a search attribute of a type that corresponds to an element in your schema, but they play the same role. If it is an element in the schema, then there are rules about how you structure the element list. If the element is not in the schema, then it's OK.
Hinnebusch summarizes: there is a general grouping of attributes about which it makes sense to talk about subordination (nesting), and another group of attributes about which it does not make sense. Is there a set of attributes that inherently cannot be nested? Stovel is reluctant to make the distinction based on whether the elements can be nested. She thinks it should hinge on the schema (e.g., you could have a flat schema and still want to use the Fieldnames). We agree that the protocol does not need the distinction, but that the distinction helps us think about what we're doing.
Denenberg asks Wibberly: is this a necessary distinction within the architecture? Wibberly: no; but we started with flat schema and now we're exposing more of the schema to the client. We have a lack of symmetry between retrieval and search. Search fields can be nested if the client has enough knowledge about the schema of the database being searched.
LeVan: the problem is that people want to do structure-based queries and semantic-based queries. How do we distinguish them? An abstract attribute is semantic; a fieldname attribute is structure-based. Denenberg: if we get rid of this distinction, do we get rid of the notion that you can't nest abstract attributes? LeVan: if they don't have different meanings, don't make the distinction. D Lynch: the schema is the reason for the distinction. The distinction between attribute types is important: nesting, correspondence to retrievable fields, etc. You need to know the schema anyway. Wibberly: for a given database, in some cases the search schema is totally different from the retrieval schema; just because you know one doesn't mean that you know the other. An attribute set is not the same as the schema. A subset of attributes is available for searching, and it helps define the schema.
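LeVan's contrast is easier to see side by side. The following is a minimal sketch, not from the meeting, with invented Python names: an Abstract attribute names a concept and leaves structure to the server, while a Fieldname attribute names a path into the schema and is therefore the kind of thing that can be nested.

```python
# Illustrative only: these class names are not part of Z39.50.
from dataclasses import dataclass

@dataclass
class AbstractAttribute:
    """Semantic access point: 'search titles', wherever titles live."""
    concept: str              # e.g. "title"

@dataclass
class FieldnameAttribute:
    """Structure-based access point: a path into the database schema."""
    path: list[str]           # e.g. ["chapter", "title"]; nesting is a path

# Flat, semantic query: the server decides which indexes count as "title".
semantic_query = (AbstractAttribute("title"), "madrid")

# Nested, structural query: only sensible if the client knows the schema
# well enough to know that chapters contain titles.
structural_query = (FieldnameAttribute(["chapter", "title"]), "madrid")
```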
Denenberg: we agree that the distinction is not necessary, but we do not agree whether the distinction is intellectually useful. We also agree that some rules apply to one, but not the other. If we get rid of the distinction, we'd probably call this "access points." Wibberly: semantic qualifiers are a way to structure the attributes; they are orthogonal to this and have nothing to do with nesting -- though they are part of a search schema. LeVan: you will find that there are behavioral differences between the two that will lead to unexpected complexity when you venture off. Working on Dublin Core, he kept these as abstract concepts, not specific fieldnames.
D Lynch has a nagging suspicion that over time we will become aware of other important distinctions that we haven't articulated yet. What happens when they come forward? This is a good argument for attribute sets as a name space with different ways to conceive them.
C Lynch: the underlying intellectual reason for Abstract vs Fieldname was to reflect two views of the world. In one view, people have come up with a vocabulary of access points for different communities and then mapped these access points to individual implementations. The mappings and interchange are very complex and possibly serendipitous. The other view of the world is closer to traditional database applications where you have fields in the database and you ask for them by name. Parts of the ZIG are moving to more traditional approaches rather than intellectual community approaches.
Denenberg: if we get rid of the distinction between Abstract and Fieldname attribute types, we should remove the associated prose in the architecture document. Percival: but the architecture document includes a discussion of what you do when you have fieldnames and you want to offer them as access point attributes. Denenberg: it's real awkward to keep this discussion in the architecture document when we remove the distinction between the two.
D Lynch: we're missing some philosophical things here. The notions of anchoring, occurrence, and nesting may not make sense. The problem is that we don't have a place to answer questions like this. With retrieval, we've separated tagsets and schemas, so you can say: here are the labels and here is the structure within which these labels can appear. So if I'm designing a retrieval model and I have a thing called "title," it will be tricky to decide what kind of title this is. Is it an abstract title or a field title? We can say concrete things about what you can and can't do. The underlying things are significant distinctions. What we're talking about here is whether it makes sense to divide stuff into column A and B. If we do away with the columns, then we have to have a place to make the distinctions.
Denenberg: but what are the current outstanding distinctions? We decided that we can nest Abstract attributes, but some don't make sense (e.g., Publisher within Date). You'd really rather know which questions it makes sense to ask about records in this database than have the attribute designer decide what goes in column A or B. Zeeman: one of the issues with the Abstract attributes is that it lets you finesse questions. D Lynch: we need some distinctions... but not necessarily the distinction between Abstract and Fieldname attributes. Hinnebusch: maybe we're not asking the right question.
Wibberly: what if we introduced the search schema as a formal concept and don't decide whether a given attribute falls into column A or B (Abstract or Fieldname), but instead have the schema indicate whether something is abstract or not? The database schema may be flat and will determine what can be nested. The schema in effect becomes a set of rules for what can be done with the database.
Denenberg thinks we should merge Abstract and Fieldname attributes into one attribute called Access Point, and that the attribute architecture doc should say little if anything about the distinction. Hinnebusch: it sounds like we're heading to a new search schema mechanism that the architecture document will have to address eventually. C Lynch: be careful about this and don't overlook the interoperability issues. The more you move to formal schemas, the more you will be forced to rely on human interpretation. Without some kind of knowledge representation, humans will have to interpret. His concern is not with the bibliographic knowledge base, but that, if we start with this schema business, in new places where we want shared semantics we'll end up with multiple individual or private scenarios.
D Lynch: the notion of schema is in our heads, not in the protocol; the info can go into Explain. Hinnebusch: in the absence of Explain, the human is always in the loop now. C Lynch disagrees; in the bibliographic community there is shared semantics. Stovel: don't remove things from the architecture document until we have a place to remove them to; just bracket them for the time being. C Lynch: we will keep the philosophical distinction and rework this.
Discussion:
Digression and quibbling about whether or not we need a new query to solve the "two publisher" problem. If this can't be solved with existing proximity, then we need to consider a new query type (and be careful that it doesn't go the route of type 102). Caution about using a "within" operator to join things that are not at the same level.
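The "two publisher" problem is concrete enough to sketch. Assuming an invented record layout, a flat AND can match a record in which no single publisher satisfies both conditions; a "within"-style operator would scope the test to one publisher group at a time:

```python
# Hypothetical record layout; not from the minutes.
record = {
    "publishers": [
        {"name": "Aldus", "place": "Venice"},
        {"name": "Plantin", "place": "Antwerp"},
    ]
}

# Flat AND over the whole record: matches, misleadingly.
flat_match = (
    any(p["name"] == "Aldus" for p in record["publishers"])
    and any(p["place"] == "Antwerp" for p in record["publishers"])
)

# A "within"-style test scoped to one publisher group: correctly fails.
scoped_match = any(
    p["name"] == "Aldus" and p["place"] == "Antwerp"
    for p in record["publishers"]
)

assert flat_match and not scoped_match
```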
Selway: even if we take a tree approach to nesting fieldnames, it's not clear that that would stop us from putting a string of attributes on a term right there. LeVan: lists of attributes and semantic qualifiers (which refine semantics -- "a kind of"-- in a logical space) are difficult to interpret. How does this relate to the current discussion of nesting? Why do we need semantic qualifiers? Why can't we do it with nesting? Hinnebusch: think of the intellectual space as a semantic or concept tree. What does nesting buy us that semantic qualifiers don't? Wibberly: semantic qualifiers provide precision within the current attribute architecture. Denenberg: with semantic qualifiers, you give a list of choices and the server picks the best one; nesting isn't the same thing.
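Denenberg's distinction can be sketched as follows (invented names; nothing here is defined by the architecture): a semantic qualifier hands the server an ordered list of candidate meanings and the server picks the best one it supports, whereas nesting asserts a fixed structural relationship and offers no such fallback.

```python
# Meanings this hypothetical server supports.
SUPPORTED = {"author-personal", "title"}

def resolve_semantic_qualifier(choices):
    """Return the first candidate meaning the server supports, if any."""
    for choice in choices:
        if choice in SUPPORTED:
            return choice
    return None

resolve_semantic_qualifier(["author-corporate", "author-personal"])
# -> 'author-personal': the server's best available refinement
```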
Discussion postponed.
Discussion:
Hinnebusch: for normalized weights to make sense, at the point of query you have to specify what you're measuring against (i.e., the attribute set must specify the normalized weight attribute).
Agreed: the ZIG will keep a weight attribute and discuss the number when we discuss the Utility set.
Discussion:
Wibberly: we don't want to lose the ability to specify how the server behaves. Denenberg: suppose there was no Bib-1 and no word list. Hinnebusch: the problem with word list is that we never associated things with it and people never knew what we did. We can use word list, but we need to specify the rules for this. We need "set" membership or Operation attributes if you're not going to use Relation attributes to do it. We have a new agenda item for the attribute architecture group.
Waldstein: the server will parse language strings into tokens. We want to do a lot of things with this list of tokens. Wibberly: the client may want to control what the server does with this stuff (e.g., stopwords, phonetics, stemming). What is our model? What do we expect the server to do? Once you talk to the server about how it's going to parse the string, you have lots of questions. Hinnebusch: didn't we say that we were going to use otherInfo when the client wants to provide some guidance to the server for how to parse the query (e.g., a book recently published with non-alphanumeric characters in the title.)
Stovel: it would be useful to separate normalization and stopwords. Stopwords are on pre-existing lists; the punctuation marks in this example title are not stopwords. Dekkers: stopwords are on pre-existing lists and are language dependent. Stovel: anything you say about stopwords gets you back to word lists. Neither she nor Waldstein want word lists. Denenberg: the problem is when there's more than one potential stopword within a term and the user wants to suppress only one of them, but doesn't know what the stopwords are.
What's a "phrase"? Denenberg: language string and character string are data types:
Denenberg: would it be reasonable to weaken the character string definition to handle case and punctuation? We could apply attributes to a character string to explicitly request certain processing, which would differ from language string because language string leaves processing decisions up to the server to do what it wants. Diacritic normalization is another important area to consider; it should go into the Utility attribute set.
We're tentatively going to merge language string and character string into character string. Don't do any processing other than what is explicitly specified. We'll add attributes to the Utility set to do this, and add word-by-word truncation. All of the problems with word list go away with this approach, but it will create a new set of problems. We need some way to keep or specify "ordered" or we lose the concept of phrase.
Lots of discussion here about the verbiage to describe this in the attribute architecture and on the wire. If you have a sequence of terms and apply all of the attributes to each term in the sequence, the logical way to do word-by-word truncation is for the client to break up the terms in the sequence and apply the truncation to each term. The thing is still one operand.
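A rough sketch of that client-side step, with invented names: the phrase operand is broken into a sequence of terms, each carrying the same attributes (here just a right-truncation flag), and the sequence as a whole remains one operand.

```python
def word_by_word_truncation(phrase):
    """Split one phrase operand into an ordered sequence of terms,
    each marked for right truncation."""
    return [{"term": word, "truncation": "right"} for word in phrase.split()]

operand = word_by_word_truncation("librar catalog")
# -> [{'term': 'librar', 'truncation': 'right'},
#     {'term': 'catalog', 'truncation': 'right'}]
# Order must be preserved or the concept of "phrase" is lost.
```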
LeVan: we have lost phrase for browsing purposes (though not for searching purposes). Disagreement. Denenberg: let's call the Format/Structure attribute that we're working on "set". You do all operations on the "set" which is sent as an external sequence of terms. The set addresses the case where -- if we didn't have Bib-1 and the mess called "word list" -- we'd call it word list. But it isn't always a list of words! Anyway, it's a set of character strings, with various set operations that could be applied.
We need to be able to specify a specific list, e.g., of musical instruments. We need to use something like word list that is precisely semantically defined. Hinnebusch suggests using existing terms with good semantic definitions, e.g., "set." Dovey will work on this problem.
D Lynch: did we get carried away with this "set" thing? Isn't it just a sequence of terms? Hinnebusch: no, but set is orthogonal to character string.
C Lynch: keep in mind that, as things are extended into new applications (e.g., spatial applications that need new data types), one open question is whether we need to hardwire sets as a new data type, or whether we just need to recognize that different applications use different types of sets. We haven't needed sets so far. There are some legitimate questions about the Utility set.
Hinnebusch: is the concept of language string vs character string that fundamental? C Lynch: you propose to merge the two and then sneak the distinction back in through attributes. The reason for the original distinction was to avoid or reduce the formulation of bizarre and ambiguous queries. There's a lot to be said for restricting lexical operators to character strings as opposed to language strings. Hinnebusch: in both cases the group introduced things into the architecture to try to protect against irrational queries. Should the architecture be designed to do that? C Lynch: the combining of attribute sets was foremost in their minds. The more distinctions you can make in the architecture, the more you have a prayer that you can rationally combine attribute sets for extensibility, etc.
C Lynch: it is practical to merge language and character strings. Just highlight that it is going to be possible to produce a larger set of nonsensical combinations that the server will throw away rather than attempt to interpret. Hinnebusch: because there are no defaults, it will be required to have one of these qualifying attributes, so the distinction cannot be blurred. Denenberg agrees with Cliff that it's a good thing to articulate these things in the attribute architecture if it's going to help with the overall objective of avoiding nonsensical searches. The problem that led to this was that we were unable to adequately articulate what we meant by language string. Hinnebusch: no, the only distinction we could make was how the client expected this thing to be processed. C Lynch: that's exactly the distinction: how do you tell the client. Wibberly: the attribute architecture document has been revised several times. We have that document, documents for particular attribute sets, and the guidance document for developing attribute sets. Is it the intent that the description of the core types in the architecture document will remain? Yes.
Discussion:
Denenberg: we determined that numbers and dates have an implicit order, and decided that dates can use the same comparisons as numbers. As long as you're comparing comparable units, you can use the same comparisons, like less than, greater than, etc. Can we extend that to character strings? No. Then quibbling. Lexically, less than and greater than depend on the language, case sensitivity, etc. Is the written language (e.g., four) the same as the numeral (i.e., 4)? It is not lexically equal, but it is mathematically equal. Taylor: if we must put coercion on either the comparison operator or on the data type of the term, his preference is on the term. Selway thinks it should be on the comparison operator.
We don't need a separate kind of comparison operator, but how do we solve the problem of collation of characters? Negotiating a language or character set (e.g., Unicode) says nothing about collation. D Lynch: the server is just going to do what it does because the terms are in the index in a certain order. The ZIG agreed to get rid of comparison (relation) operators for character strings. We can just use the mathematical relation operators. We can always add an attribute that says treat lexically.
C Lynch: we proposed lexical comparisons to specify what we can do and when. For example, we have an index of integers or an index of strings; if the client passes the wrong data type, the server has to try to convert or fail the comparison. He suggested the lexical comparisons to essentially remove ambiguities around handling numbers that are passed as strings or numbers. It's not the case of "10" vs "ten," but cases where there are minus signs, leading 0s, etc. that compare differently if treated as strings or numbers. You can accomplish much the same thing by relying on the server to do type conversions when necessary, but it won't always be clear to the client. D Lynch: if you mean it's a character string, then say so; if you say it's a character string and pass a number, you're lazy. The rule is honor the Format/Structure attribute type (not the ASN). LeVan: why? If the client is smart enough to know that the value of that attribute was supposed to be numeric, then it was smart enough to encode it as an integer.
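C Lynch's point about leading zeros and minus signs is easy to demonstrate; the comparisons below use ordinary Python string and integer semantics, not anything defined by Z39.50.

```python
# Leading zeros: lexically distinct terms, numerically the same value.
assert "007" != "7" and int("007") == int("7")

# Different digit counts: lexical order inverts the numeric order.
assert "10" < "2" and int("10") > int("2")

# Minus signs: lexical order of negatives runs backwards.
assert "-1" < "-2" and int("-1") > int("-2")
```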
Denenberg: our earlier example put the work on the server side. We need to decide whether the conversion is done by the client or server. The suggestion is that if the term is a number, the type is an integer or numeric data type. We don't send strings when we mean numbers. Agreed.
Taylor: what's a "call number"? Though the ASN for a call number is international string, we still want to compare them. Dates and names are better examples. Denenberg: if you send a date as a Z39.50 date, you're OK. What is the ordering that we're comparing with names? C Lynch: for example, you want to do an alphabetized name thing. There it really makes a difference because you normally do this on last names. We've left it as an open issue whether we want a normalized name. Having an attribute saying that this is a normalized name would make a difference.
Hinnebusch: we're off track here; we're talking about comparisons. Taylor: we may have multiple high level data types and we want to make distinctions even if they are passed the same way on the wire. Hinnebusch: if there are cases where Format/Structure attributes will be necessary to guide the server, then it would be better to tell attribute set designers to always use Format/Structure attributes rather than ASN. Taylor: yes, because this gives the server a rule to follow.
Hinnebusch: looking at the Utility set, the number of Format/Structure attributes is decreasing quickly. Is a personal name a Format/Structure attribute? C Lynch: it doesn't matter that much as long as we're consistent. We want to do data typing. For some data types, we get this from the ASN.1 structures. In other cases, we encode things in a structured way in strings and we need an attribute to indicate that this is a particular form of encoded string (with Format/Structure attributes, e.g., this is a normalized name). One issue is what guidance we give. A second issue is the current hybrid situation where sometimes we look at the ASN.1, other times we look at an attribute, etc. The working group went round and round on this and decided that it looked application specific. Denenberg: the issue is that you put a Format/Structure attribute in, regardless of whether you have ASN or not. LeVan: this is redundant and possibly conflicting. Denenberg: do we need any Format/Structure attributes now? C Lynch: yes, but do they go in the Utility set? Bib-2 will definitely have Format/Structure attributes. These tell you what you have, but not what to do with it.
Taylor: we must do strong data typing so that we can compare terms in the query. We can do this with ASN.1 types entirely and throw away structure attributes. Or we can say that ASN.1 is an artifact of the implementation and look instead to the structure attributes. Or we can do some combination of the two. Hinnebusch: personal name, not normalized name, is the issue. Zeeman, LeVan, Denenberg, Wibberly, Hinnebusch, C Lynch, D Lynch, Taylor argue. ISSN and patent numbers are interesting examples. How are these numbers structured? You need a structure attribute. Stovel disagrees with the ISSN example. C Lynch: if ISSN is the access point, you're going to try to interpret that string, probably do normalization (insert or remove hyphens), but if you had an omnibus identifier field of assorted identifiers, you need some way to indicate that this is an ISSN. Taylor will fight anyone who disagrees: we must say what the thing is we want to search for and where we want to search.
Wibberly: the confusion about Use attributes representing too many dimensions led to the new attribute architecture. Do we need to keep Format/Structure attributes? Wibberly: yes, because there are different formats of patent numbers (which may be stored differently in different database indexes). Stovel: if you have enough information to restructure a patent number, then you have enough information to create an ASN.1 data type. Let the bib-2 people struggle with this.
Hinnebusch thinks we can get rid of Format/Structure attributes by moving them into Expansion/Interpretation attributes. C Lynch insists that we need Format/Structure attributes. Hinnebusch: either there is a well-defined rule and an authority for the rule, or the thing has been pre-processed, or the thing needs to be processed. D Lynch is almost willing to agree with Hinnebusch. We could indicate that the thing is an identifier with an associated authority (e.g., patent number, ISSN, ISBN, LC call number, Dewey call number), or we could indicate that this thing is a certain kind of identifier (e.g., LC call number or Dewey call number).
Dovey: what's the difference between Content Authority and using a semantic qualifier? LeVan: Content Authority attributes put restrictions on the value of something. Semantic qualifiers restrict the meaning of something.
C Lynch agrees that we can drop lexical comparison operators if we agree to indicate and send the right data type. Agreed that there are three or four things that inform the server on data type that must be clarified in the architecture document:
LeVan: however it gets crafted, it should all come from the new Utility set. Agreed.
Mike: we need to add a Local Control Number attribute to the Utility set and retrieve on that value. He provided the following semantics: Local Control Number is a string that uniquely identifies the record in the database; it's the most persistent identifier that the server can provide.
Will they be distinguished by object type? Wibberly doesn't want different object types to require different attribute sets. Will there be a cross object-type attribute set? Taylor: based on "the principle of least astonishment," let's call it "cross domain."
Dekkers' reality check: is anyone going to be quickly implementing the new architecture? LeVan: the Dublin Core set is sitting there, based on the new architecture. Waldstein isn't even sure what types of attributes came out of yesterday's discussion. The types are those in the doc, though yesterday's discussion got messy because we digressed into a discussion of specific Utility attributes. Agreed. The types of attributes seem to be stable, but not the attributes per se. We still have big conflicts to resolve, for example, in the Utility attribute set. Taylor: the CIMI group is murmuring about recasting CIMI attributes in the new architecture. Hinnebusch: there were many changes as a result of the discussion of the attribute architecture issues; Denenberg has to revise the doc and post another draft.
Do we want to leave the 15 Dublin Core (DC) attributes in Bib-1 (leaving it up to the server to do what it wants) or take out the redundant ones? Hinnebusch: whatever we decide, this does not preclude the DC folks from doing pure Dublin Core work or using those in a mixed attribute set query with Bib-1. Wibberly: if the redundant DC attributes are removed from Bib-1, what happens to the numbers? They will be deprecated.
Discussion.
Hinnebusch: is anyone going to be really really unhappy if we leave the attributes in? (Probably Hammer, who isn't here.)
D Lynch proposes that we leave it the way it is and don't discuss this anymore. Agreed that leaving the DC set in Bib-1 is the least objectionable. Semantics/mappings: whatever a server decides.
6. Attribute Set Developers Guide (Percival)
The doc will have four major sections:
Wibberly suggested an additional section on lessons learned. Percival has included this as part of the background and analysis section, e.g., lessons learned from Bib-1, Dublin Core, CIMI, STAS, GILS/GEO/CIP. Is this a developer guide (for people creating attribute sets) or an implementor guide? Section 4, how to use attribute sets, is implementation.
The historical component of the architecture document may be moved to or repeated in the Attribute Set Developers Guide. Neither doc is a standard and redundancy may be good. Stovel can provide some text about Bib-1 for the lessons learned section. Moen will help with the CIMI part of this section, LeVan with the Dublin Core lessons learned ("emergent ontology" ha! ha!). Wibberly will help with STAS. Percival will work on the GILS/GEO/CIP component, along with text from Elliot and Doug (?). Taylor: add several other lessons learned: ZDSR (significant intellectual effort that LeVan is implementing), Collections, Explain, ZBIG, the Bob (Waldstein) "rogue" attribute set, etc. Denenberg will help with ZDSR, ES, and Explain.
What should be included in the analysis section? In addition to what Percival proposed, include a discussion of Explain. The semantic analysis of attribute set relationships is the heart of this section (not a discussion of syntax). How should he work semantic qualifiers into this section? If we're going to compare attribute sets, we need to structure them or prepare meta-attributes for comparing them. C Preston was working on this a year or so ago; Percival will contact her. Dovey will try to talk to her at the ASIS meeting next week.
Percival will include examples in the discussion of the attribute set meta-model (e.g., inheritance, derived attributes, higher level attribute sets, etc.). We need rules for deriving attributes from other attributes. Some discussion of what belongs in a profile and what belongs in an attribute set. Denenberg: this guide has to deal with profiles because profiles cover things that are not in attribute sets. For example, profiles provide details about combinations of attributes or attribute sets that facilitate interoperability. Denenberg will provide some text for this part of the doc. We now prohibit creating attributes that have the same semantics as other attributes -- instead, mix attribute sets in the query. The profile clarifies this.
Section 2 (background and analysis) will end with recommendations for structuring attribute sets. The Z39.50 Maintenance Agency (not the ZIG) will maintain a list of attribute sets; the Maintenance Agency assigns OIDs for attribute sets.
What do we mean by inheritance? Denenberg: if you find that a Use attribute you need has already been defined elsewhere, you use the attribute from that other set. LeVan: this is mentioned in the profile. Yes, but the attribute set definition can mention where this attribute comes from (just for clarity for the reader). Wibberly suggests that attribute sets be available and searchable. Good idea but how? (The question was considered out of scope here.)
Section 4 (how to use attribute sets) is to help developers determine whether existing sets have what they need. Maybe a new profile or maybe a new attribute set is needed.
Percival plans to have a draft of the doc by the next ZIG meeting. He'll put the doc on the web and Denenberg will put a pointer to it on the Z39.50 Maintenance Agency web page. Stovel is concerned that the doc will make the task of developing an attribute set (look) too difficult. Percival agrees. All agree that the doc should read like a cookbook with additional information provided (perhaps at the end).
7. OPAC Schema (St-Gelais)
St-Gelais: there are substantial changes to simplify the schema. There are still several levels you can define. She reduced the eight levels to six:
St-Gelais: GeneralHoldings was renamed BibUnit. ExtentOfHoldings was split into Physical units and Bibliographic parts. There was some discussion of the Circulation Data value "waiting to be re-shelved." Dovey: how is this different from "Pending transfer" and "In transit"? Gatenby: what's the difference between "Weeded" and "Withdrawn"? Clarifications were provided. The values (page 22 of the schema) are an attempt to handle any circumstance or procedure. St-Gelais: one holdings statement is provided per physical site location.
There is an additional appendix (C, page 37) for using the Z39.50 rURL. The element set is at the same level as the holdings request. The record syntax is specified in the profile.
How do we data type an OID in a URL? Hinnebusch: we define the record syntax as an OID. Denenberg: yes, but it doesn't say how to represent the OID. Agreed to do it as St-Gelais has done it in the example on page 38. The server builds the URL and returns it to the client so that the client knows how to formulate the query (extract the docID).
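The agreed representation is whatever the page 38 example shows, which is not reproduced in these minutes. As a stand-in only: record syntax OIDs are conventionally written in dotted-decimal form, and a URL could carry one as a query parameter. Everything below (scheme, host, parameter names) is hypothetical.

```python
from urllib.parse import urlencode

SUTRS_OID = "1.2.840.10003.5.101"   # a well-known Z39.50 record syntax OID

params = urlencode({"docid": "rec0042", "recordSyntax": SUTRS_OID})
rurl = f"z39.50r://opac.example.org:210/Default?{params}"
# -> z39.50r://opac.example.org:210/Default?docid=rec0042&recordSyntax=1.2.840.10003.5.101
```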
Abstract Record Structure doc, page 4: Hinnebusch suggested corrections to Enumerator (level one, alternative level one?) to handle different numbering schemes for the same journal issues and volumes. Wibberly: adding support to transmit electronic holdings looks easy to do. Stovel objects to the proposal to remove "Physical," claiming that "Computer File" as physical type is adequate. Wibberly disagrees. He also wants the holdings record to include a URL for the electronic file (something like the 856 field). Stovel: logically that's the same as the call number for the physical item, so put it in the same place in the schema. Hinnebusch: this is like an 856, but not exactly; like a call number, but not exactly. See page 5 of schema doc, Figure 1: following this model, the electronic file is a physical unit, not a copy call number. Or is it? Discussion ensued. Agreed to put it at the level of the Physical unit.
(?): question about what information to put at mandatory levels where he does not have the information (i.e., a union catalog). He does not provide information about distinct copies, only information about each site or library that has the item(s) -- a single summary holdings statement per library, not per copy. (See schema doc, page 7.) He requested adding text to the schema to indicate alternative interpretations of summary holdings. The ZIG doesn't want to do that. His users would then query the specific library for copy holdings information. Gatenby: we need a level 1A for a union catalog; if we try to squeeze this into level three, it destroys the meaning of level three.
The ZIG agreed to the following resolution of OPAC holdings for union catalogs:
The profile will be revised to reflect the changes discussed here (or yet to come). We still need to discuss how to search or present these things. Of what use is the schema without this? The schema provides what you will get back. Hinnebusch: we need at minimum primitive ASN. In the meantime, we can continue providing holdings based on bibliographic queries (return GRS records based on the schema).
Lief Andresen paper (Proposal for addition to OPAC/Holdings Schema -- Level for Availability on Bibliographic Level) - The proposal is to add new data elements at the bibliographic level of schema. Andresen: we need a national union catalog solution using Z39.50. What we need to know as a response from the library is not all of the information about the copies, but information on a normal level for the site. We don't care about the detailed information about all of the copies. What we need is an answer to the question "can I get a copy?" We need information on a global level. He and St-Gelais believe they can find a solution.
Gatenby suggested expanding the list of values for lending policy (?). Agreed.
8. Proposed Diagnostics (Gatenby)
Wibberly suggested that the sort criterion preferredDatabase be changed to a sequence of implicit international strings (see page 12). He also suggested that Other (as an external) be added to DuplicateDetectionCriterion, RetentionCriterion, and SortCriterion for extensibility. Agreed that we need extensibility, but decided to reserve a set of numbers for this (6-10).
Hinnebusch asked about the need for Input Result Set ID and Output Result Set Name parameters. Answer: for symmetry.
Agreed: This will become a new Z39.50 service, the Duplicate Detection service. Denenberg will add an option bit.
10. Commentaries (Denenberg)
11. Clarifications (Denenberg)
12. Amendments (Denenberg)
The ZIG accepted both amendments. Denenberg will put a new OID on eSpec, but will not change its number. He will add an option bit for resultCount parameter in Sort.
13. Future ZIG Meetings (Denenberg)
14. Representation of Time Units / Periodic Query
15. Registration of Record Syntaxes for XML Mime Types
16. Registration of Character Sets and Character Set Negotiation
17. Search Based on OPAC Holdings Elements (Van Lierop)
Discussion:
The paper proposes four options, with option 4 being preferable. The secondary search on holdings (option 2) is theoretically impossible because the OPAC record is one record. So we should provide an option of one bib record with X item records, which changes the OPAC schema completely. The only possible solution is in the element specification. We should be able to manipulate the data in the record. This solution was proposed at the last ZIG, but people rejected it. People want item or volume location information. This is a search within a record, but that's not possible in Z39.50, so we use the eSpec mechanism to filter or manipulate the data within a record.
Gatenby thinks option 3 is valid, not just option 4. The proposal does not preclude being able to limit a search by location. Disagreement about whether option 2 solves the problem; apparently it does, but we don't want to do it that way. St-Gelais: the problem with the holdings record is that you don't know what you're going to get back (what the institution did). Denenberg: we discussed doing this with eSpec, but nobody wanted to do that. The point is to model it in a way that's consistent with Z39.50, which means you don't stick a type 1 query within eSpec.
D Lynch: let's use eSpec to filter the record. For example, you get a record and you give it a tagpath with values for the data (implies revising or expanding eSpec). Bottom line: you can cast Van Lierop's proposal in a way that avoids a query, e.g., say you want tag 100. Zeeman: you can see variants as a filter on the record. D Lynch: it's more like occurrences, e.g., I want the case that has this value. Hinnebusch: I have a record, and the client wants to specify which piece of the record it wants back, but we don't know the structure of the record so we want to specify values. Zeeman would not be happy if we said you could only filter on location if you included location as part of the initial query.
Denenberg: is the plan to filter on the institution ID? Yes. That's fundamentally different from saying you want to filter on occurrence (D Lynch example) because these things are not explicitly enumerated. Van Lierop: it's not a search. D Lynch: the fundamental question is whether there's a difference between requesting an enumerated section or requesting a section based on some value. The model is retrieving a subtree. Hinnebusch: eSpec represents the record as a tree, and using a tagpath you retrieve one subtree at a time. Will that suit here? Zeeman: don't we want to retrieve only those that will lend to me? That's a harder and different problem. Van Lierop wants a third element in the filtering, the breakpoint. For example, I want all the books that are for lending.
Hinnebusch: this issue has nothing to do with OPAC holdings, but is a more general problem. With eSpec you can retrieve a subtree rather than the whole tree. Using the "wild thing" or "wild path" you can retrieve everything at a certain level and below. The current problem is that you want a piece of a particular subtree (the breakpoint) at a certain level based on a value. If the value of A is X and B is Y, then do something. This is a Boolean query and eSpec doesn't do that right now.
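A rough model of the gap, with an invented tree layout: eSpec-style tagpath selection picks subtrees by position, while the holdings use case needs to pick subtrees by the values inside them.

```python
holdings = {
    "bib": {"title": "Some Journal"},
    "sites": [
        {"location": "Main", "policy": "lending", "copies": 3},
        {"location": "Rare Books", "policy": "non-lending", "copies": 1},
    ],
}

def by_tagpath(record, path):
    """eSpec-like selection: walk a fixed path, return what is there."""
    node = record
    for tag in path:
        node = node[tag]
    return node

def by_value(record, breakpoint, predicate):
    """What the proposal needs: keep only subtrees at the breakpoint
    whose content satisfies a Boolean predicate over values."""
    return [sub for sub in record[breakpoint] if predicate(sub)]

by_tagpath(holdings, ["sites"])   # every site subtree; no value filtering
by_value(holdings, "sites", lambda s: s["policy"] == "lending")
```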
Perhaps what we're talking about is an operation that decomposes one or more records into a database of multiple records (subtrees); the operation tells you the database name and how to build a query, and then builds a result set out of it (C Lynch, Hinnebusch). We discussed this previously as a Filtering Extended Service (Wibberly). Denenberg: what's the step or operation that goes from result set to database? Hinnebusch: that's what we have to define. Zeeman: what records do you get back? Hinnebusch: that depends on the breakpoint. We're talking about a new service. Waldstein: you're defining a whole database on the fly. D Lynch: the simple view is that the record structure is the same as the OPAC record, except that instead of one record with subtrees, you have a record for each subtree. Liv: is this two-step searching? Kind of.
Hinnebusch: this is a general solution to a general problem; not specific to OPAC. If you have structured records and you want to be able to query within the record to find subrecords, this is one way to do it. Denenberg: if we're discussing a new service, we better know what problem we're trying to solve. CIMI proposed an elaborate model that would exemplify this but no one would implement it, so we didn't pursue it. Hinnebusch: the problem with CIMI was that generating the necessary data was expensive. Van Lierop proposed the same complexity (two-steps). Hinnebusch: even if you query the bibliographic database up front, you get these very complex records when all you really want is a subset of information in the records. D Lynch: when you have a huge record and you want only sub-components of it, the point is that search and retrieval is the proper model for this.
Waldstein prefers filtering to this notion of a new on-the-fly database and query, which means we have to turn this into attributes. Hinnebusch: but you have to build into eSpec the same thing, which currently retrieves the whole tree or a node of a subtree. In the current problem scenario, what we want to retrieve is not necessarily in the same subtree. Denenberg: is there a way to revisualize the tree so that what you want is in the same subtree? Dovey: if each node of the tree was location, and under each node we have copies, won't eSpec solve our problem? No, because eSpec defines subtrees based on tagpaths, not content at the node.
Van Lierop: how can you search only for holdings that can be lent? Gatenby: you dynamically ask the server to restructure the tree. Taylor is against this approach. How do you address the nodes? We want a dynamic selection at each level of the tree. D Lynch: with eSpec you pick a place in the tree and retrieve everything under it. That's not what we want to do here. Jacob likes the idea of turning the holdings record into a database, querying it and returning a result set. LeVan: you're back at the problem of records that don't lend themselves to mechanical decomposition; you need selection criteria to do the decomposition. D Lynch: parliamentary point of order: we have a couple proposals that we're trying to engineer here; groups this big can't engineer.
Gatenby and Selway propose a way to do all of this in the original search. The server then knows which bits of the holdings record you want. This would require the "within" operator and a structured query. Jacob: depending on the contents of the query, your records will look different. D Lynch: no, either you get the full tree or you get the tree pruned; you're not changing the record structure per se.
Hinnebusch: what about the case when the server can return only pointers to where the holdings data is (e.g., union catalog). Wibberly likes the idea of using eSpec to return the skeleton structure of the holdings record, then work through that to construct a subsequent query, decompose, retrieve subtrees, etc. Hinnebusch: how do I query the content, not the metadata of the holdings record? Stovel: if we wanted the client to do this, we wouldn't be having this discussion. More discussion.
Hinnebusch: let's table this for future discussion. Denenberg: we've covered the exact same territory in the past two ZIGs and gotten no farther.
C Lynch summarizes: you either
The different solutions hide complexity in different places. He's not sure which solution is technically more trouble -- creating a new database or a new query apparatus. A clairvoyant server is an interesting concept, but how do you ask or know if the server is clairvoyant; this is also a problem with distributed databases.
Denenberg: let's agree on how we're going to move this forward after the ZIG. The best short term solution may be to bring the whole record back. Zeeman: but that's problematic with union catalogs. Hinnebusch: the one thing that this record structure offers is the ability for the server to provide parts of a record as atomic units. You could profile this so that for union catalogs the client always knows what it will get. If you can accept taking one location or institution at a time, we've temporarily sort of solved the problem. More discussion.
Zeeman: why can't you model a bibliographic database with holdings and a separate holdings database? Taylor: you can do that; what you can't do is limit a separate query of the holdings database to a particular location. How do you convey the name of the holdings database? Denenberg: what about adding the database name to the OPAC schema? Hinnebusch: how do I discover it? A group should go away and work on this.
C Lynch: before you assign a group to work on this, focus the idea. Is the goal a minimum work solution for the holdings thing, or do we want to look at comparative engineering for general solutions? Denenberg: do we want a quick and dirty solution for OPAC or a real structured query? Hinnebusch will assign a group before the end of this ZIG meeting.
C Lynch: there are at least three quick and dirty solutions on the table now. Picking one will depend on the work level for the currently existing idiosyncratic issues and databases. Van Lierop: maybe that's the best way to proceed. C Lynch: it's cheaper to move a lot of data than to ask for specific data. Wibberly wants a filter operation on one or more records in a result set that creates a new result set based on one or more criteria; the filter criteria are applied to a particular schema. That's another proposal in the mix.
18. Music Bib-1 Extensions (Dovey)
The request is for Use attributes for performers, instrumentation, arrangements, musical keys, dates of composition and recording, etc. There was some discussion of the proposed Structure attribute called MIME Type for binary data. The OID will identify the binary object classified as a MIME type. This will be affiliated with the Use attribute Musical Theme (1206). The ZIG agreed to delete the Structure attribute MIME Type and revise the semantics of Use attribute Musical Theme. Zeeman requested that Use attribute Duration (1204) be normalized as seconds rather than minutes. Agreed to normalize on seconds and to get rid of the Structure attribute called Duration-normalized (110). There were two extra Use attributes to be added (in addition to those mentioned in the handout): number of instruments and number of distinct instruments. Agreed to add these to Bib-1.
19. Negotiation of ES Record Size (Gatenby)
When the client has limits that it needs to convey to the target, we need to devise specifications to negotiate the maximum size for ES messages, maximum size for task packages, and maximum size for task package records (included in Update ES requests). Hinnebusch: do we have the same problem with scan? Wibberly: yes. Denenberg: the difference is that we need this with ES right now. Hinnebusch: should we generalize this now? He's worried about doing this piecemeal. Denenberg and Stovel: these negotiations must be specific to (dependent on) the service; they can't be generalized. Wibberly: when we were defining scan, STAS requested a way to negotiate the record structure for scan, but the ZIG refused so they did it with ES. Hinnebusch: what are other people doing who ran into this? LeVan: the maximum record size or PDU size is a non-issue when we're retrieving gigabytes of information. Denenberg: we don't need to generalize the solution; just solve the current problem.
Waldstein: the whole concept and set of things that we're doing with negotiation adds to the complexity of the standard and hurts its comprehensibility. Are we reinventing OSI? If we get too complex, nobody will bother. He has misgivings about this, even though he sees the need. Denenberg: OSI never negotiated anything meaningful; we do. Hinnebusch: the understanding of the interactions gets more complex; you have no way to negotiate what you're negotiating.
Denenberg doesn't think this format places any burden on anyone. It does not change how ES is done, but it enables people who have a need to negotiate things like this to do it. If a server can't create a task package greater than a certain size, then if we don't invent this stuff, what does this server do? Hinnebusch: but we're doing this in the z-association, when my server hasn't even indicated that it does ES. Why not do this in the otherInformation field? Many: because that's too late! OK. The init is the right place to do this. Denenberg: but we should address the concern people have about becoming bewildered. We should tell people that they don't have to do this. D Lynch: that's why this is clearly ES negotiations and not generalized.
LeVan: it might be nice to have this discussion in the doc closely tied to the diagnostics, so that people don't have to think about this up front when they're designing their system. Wibberly: hiding the complexity in the diagnostics is not going to help people design their system.
Denenberg: does anyone take issue with the diagnostics? Stovel suggested rewording diagnostic 1051: Scan: attribute set ID required. OK. Denenberg: all of this will be assigned an OID.
20. Negotiation / Information Exchange During Initialization
Denenberg: most of our previous discussion has been fruitless because we have no concrete model for negotiation and information exchange during initialization. He believes that negotiation and information exchange are two different things:
Denenberg is willing to cut the discussion of information exchange if people no longer think it's important. D Lynch: there are occasions when it might make sense to exchange information. Wibberly: why not treat information exchange as a subset of negotiation? D Lynch: because information exchange is distinctly not negotiation. Wibberly thinks it's a real thin line. D Lynch: subsequent behavior (negotiation) happens because the target responded (hand shaking), which it need not do with information exchange. Percival: if you're not going to acknowledge it, then why send it? Denenberg: because over the years people wanted to send information that could be discarded. Hinnebusch: but nobody said that this information could not be responded to, in which case it becomes a subset of negotiation. Stovel suggested discussing negotiation first, since it's clearly the more important of the two.
Denenberg: back to page 2 of the handout and the structure for this. It's a sequence of a mix of externals (OIDs with parameters) and object identifiers (just OIDs). Given that there are two types of information (negotiation information and just information) and that they may be of type external or OID, there are four types of initInfo units to consider:
If the client sends a negotiation record, then the target either recognizes the OID or it doesn't. If the target doesn't recognize it, it ignores it, and no response or negotiation takes place.
A negotiation OID refers to a static definition. (See example in Denenberg handout page 3.) How do we know if a negotiation adheres to this model? Zeeman: we can't rely on defining an object identifier to do this. Waldstein, Wibberly: use an Option bit. Dekkers: how does this affect existing negotiations (e.g., character set negotiations)? If you put the new Option bit in, it provides clarity that you are following this new model (sections 4.1 and 4.2 in the handout, i.e., "I promise to negotiate in good faith"). But even without the new Option bit you can go ahead with the negotiation.
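A minimal sketch of the rule just stated, with an invented unit shape: a recognized negotiation OID gets a response, and an unrecognized one is silently ignored.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class InitInfoUnit:
    oid: str                      # object identifier
    params: Optional[Any] = None  # present -> external; absent -> bare OID
    negotiation: bool = True      # negotiation record vs. plain information

# Negotiation definitions this hypothetical server knows, keyed by OID.
HANDLERS: dict[str, Callable[[Any], Any]] = {
    "1.2.840.10003.15.3": lambda params: {"accepted": params},
}

def process_init(units):
    responses = []
    for unit in units:
        handler = HANDLERS.get(unit.oid)
        if unit.negotiation and handler is not None:
            responses.append(handler(unit.params))
        # Unrecognized OID, or plain information: ignore, no response.
    return responses
```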
Section 7.2, page 4 of the Denenberg handout: negotiating a profile is not a meaningful concept. Instead, you write a profile, but you negotiate behavior. Negotiating a global profile is not accommodated by the proposed model. Profiles may have an OID that designates behavior. The ZIG wants this section presented as an editorial comment or rewritten. Denenberg will rewrite and tone it down.
Hinnebusch: where are we on this topic? In the past we talked about negotiating record syntaxes or attributes. We can't do that because attribute sets are specific to databases; nothing in the attribute set ID says that it should be used for a particular purpose. Big discussion here about whether record syntaxes are database-specific. (Attribute sets are definitely database-specific.)
Hinnebusch: this replicates the function of compSpec. Zeeman: yes, but it lets you determine this earlier (in the init). Explain can do this as well as negotiation. Do we create a mechanism for doing this where the client informs the server, or tell the server to provide this information in Explain? D Lynch: what's important is the record syntaxes supported by the client. Denenberg: this is way off the subject of the negotiation model. Hinnebusch: what's the point of section 7.3 if not to take us down this path? Perhaps this section should just say that if you want to use one or more existing OIDs, they should be encapsulated in an external. LeVan suggested removing raw OIDs from this entire model; the ZIG should produce a negotiation profile record (a context for the OID to be meaningful). Wibberly: that makes sense; you have a general model and a specific record syntax to carry the OIDs in context. D Lynch agrees. Hinnebusch: if we do away with OIDs and we want to continue with the concept of information exchange that doesn't control behavior, then it should become a negotiation record.
Denenberg proposed several types of negotiation records at the last meeting and the discussion there resulted in the model. Now we're back to a discussion of different types of negotiation records. Hinnebusch: yes, but don't throw away the model. Chuckles. Denenberg: the problem is that the concept of sticking profile OIDs in a record is not very meaningful. He hopes that the concept that we're left with enables us to really negotiate behavior. Stovel: this is a chicken and egg problem; we're trying to make a place for something that doesn't exist and retro-fit what we have.
Hinnebusch summarized the discussion from three ZIG meetings: defining OIDs for profiles and using them that way is not helpful. We need negotiation records that contain contextual information beyond just the profile name and enable negotiating subsets of behavior with a tree of OIDs. LeVan: we're not really negotiating anything; we really want to use different profiles for different databases. Denenberg: it's up to the client to figure out what's valid to do (without conflicts).
Denenberg envisions assigning negotiation OIDs to whoever wants them. Wibberly: will there be a negotiation record for each thing you want to negotiate? Yes. It will always be an externally defined structure. Right. If this is the general approach, are folks expected to submit an external record structure for each thing they want to negotiate, and each of these will be assigned an OID? Denenberg: that's what we need to figure out. If the model is OK, then the next step is populating the model. There's always a tension: lots of OIDs buy flexibility but decrease interoperability. LeVan: ZIG OIDs are assigned to ZIG-approved things; otherwise the OIDs are private IDs. The ZIG agreed that this is the model, but (Hinnebusch) profiles are a little different. Denenberg: yes.
Waldstein: this sounds like a negotiation of profiles, which will be a negotiation of contracts. He will ignore these negotiation records because he's not trying to sell anything. But he does want to interoperate. He has trouble with negotiating profiles because the profiles have become too massive. This is going to create a lot of work. Denenberg: you can negotiate profiles globally or negotiate specific sections. In the case of CIMI, for example, we'll give them an OID for the purposes of the current discussion, they'll sub-allocate that OID or direct it to the section of the profile that specifically addresses behavior. You should be able to read only the sections that address behavior. Moen: this approach would allow the client and server to understand what level of conformance was in place (without having to read the entire profile).
Hinnebusch: you have a rule in there that specifies what happens when a client designates two levels. Denenberg: they should certainly explain how this works. Waldstein: do people think this will be negotiated in the profile or negotiated in the contract? Percival provided an example of an intersection of two profiles as a third level (kind?) of conformance. Denenberg: do we need a profile architecture committee? Percival: we need to clarify the difference between a profile and an attribute set. We can say that when you use an attribute in a given profile, you essentially inherit it from somewhere else. Yes. Selway: what about profiles that change the meaning of an attribute that is defined differently in an attribute set? Wibberly: there's a difference between a profile refining an attribute and a profile changing the meaning. Hinnebusch: yes, but what if two profiles refine an attribute differently?
Denenberg will post an updated model. People should disclose what they want to negotiate (post to the ZIG list), then we'll create negotiation records.
Dekkers is the technical supervisor of the MALVINE project, which has many participants. The project started this year and aims to open new and enhanced access to disparate modern manuscript holdings in European libraries, archives, etc. Many cultural institutions and publishers across Europe are cooperating on the project. See http://www.crxnet.com/malvine.html for details.