SRU (Search/Retrieval Using URL)

Implementor Agreement: CQL Index-naming Convention

Implementor agreements are developed and proposed by the SRU Editorial Board and submitted to the SRU Implementor Group for review and comment. If the Editorial Board then determines that there is consensus, it approves the proposal. Approval of an implementor agreement is not in itself binding. Implementor agreements serve two purposes: (1) they give general guidance to developers and implementors; (2) they may be cited by profiles, and a profile may declare that conformance to a particular implementor agreement is mandatory in order to claim conformance to that profile.

Approved: January 1, 2005

Intended Audience: This Implementor Agreement is intended for Context Set and Interoperability Profile developers. SRU database implementors might be interested in this document, but should not feel bound by the rules described herein. SRU database implementors with interoperability requirements should be guided by appropriate Interoperability Profiles.

Background 

This is an implementor agreement for a naming convention for CQL indexes.

Schema designers tend to invent data element names haphazardly without documenting semantics, and this is becoming widely recognized as one of the fundamental barriers to broad interoperability. ISO 11179 addresses part of this problem, by prescribing rules for registering data elements and their semantics to enable re-use and coherence.

We introduce the ideas of referent type, concept, and representation class. Consider for example a database where records are metadata about maps. Consider an index of map creator names. In describing that index, one might consider it to have three logical components: what the database is about (maps), what information is sought (creator), and how that information is represented (name). We refer to these as the referent type, concept, and representation class.

The example index might thus be named mapCreatorName. On the other hand it may be sufficient (and perhaps more sensible) to simply name it creator. The name mapCreatorName would be useful when there is more than one possible referent (map, atlas, etc.) and more than one representation (name, code, identifier, etc.). When there is only a single referent and only a single representation then the index is adequately described by the single concept, e.g., creator. (When there is only a single referent but multiple representations then the index might be best named creatorName; when there are multiple referents but only a single representation then the index might be best named mapCreator.)
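
The naming choice described above can be sketched in Python. This is only an illustration, not part of the agreement: the function name choose_index_name and its two boolean parameters are hypothetical, and the lower camel case joining is an assumption taken from the examples in this document (the agreement itself only says that components are concatenated).

```python
def choose_index_name(referent, concept, representation,
                      multiple_referents, multiple_representations):
    """Build an index name from only the components needed to disambiguate.

    Hypothetical helper: the referent is included only when more than one
    referent is possible, and the representation only when more than one
    representation is possible. The concept is always included.
    """
    parts = []
    if multiple_referents:
        parts.append(referent)
    parts.append(concept)
    if multiple_representations:
        parts.append(representation)

    # Assumed casing convention: first component starts lower case,
    # each following component starts upper case.
    first, rest = parts[0], parts[1:]
    return first[:1].lower() + first[1:] + "".join(
        p[:1].upper() + p[1:] for p in rest
    )


print(choose_index_name("map", "creator", "name", True, True))    # mapCreatorName
print(choose_index_name("map", "creator", "name", False, True))   # creatorName
print(choose_index_name("map", "creator", "name", True, False))   # mapCreator
print(choose_index_name("map", "creator", "name", False, False))  # creator
```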

In any case, the concept is the critical component (in the example above, whichever name is chosen, the concept is 'creator'). In order to encourage the re-use of concepts across different contexts, it is useful to deconstruct the index into these three components so that the representation and referent are disentangled from the underlying concept.


Rules
Note: These rules have no protocol significance. For instance, they define component parts of an index name; however, there is no expectation that servers will parse index names into component parts.

  1. Basic Rules

    1. An Index name has one to three components:
      • Referent type. Optional. If omitted it should be inferable from the index or context set definition (and if it is not, it is unspecified).
      • Concept. Required.
      • Representation class. Optional. If omitted it should be inferable from the index or context set definition (and if it is not, it is unspecified).
    2. The referent type and representation class are single words, English nouns. The concept is one or more words, one (and only one) of which is the base concept, which is an English noun.
    3. When contemplating the creation of an index name, the creator should attempt to determine if a suitable index name already exists (in the same or another context set), and if so, use it rather than creating a new index name.
    4. A context set definition whose index names conform to this agreement should state so.
  2. Definitions
    Every referent type, concept, and representation class should be assigned a single associated definition. Where practical the definition should try to follow ISO 11179 part 4:
    1. be stated in the singular;
    2. state what the concept is, not only what it is not;
    3. be stated as a descriptive phrase or sentence(s);
    4. contain only commonly understood abbreviations;
    5. be expressed without embedding definitions of other data or underlying concepts.
  3. Syntax
    1. The referent type, if present, is the leftmost part. The concept follows, and the representation class, if present, is the rightmost part. If there is more than one component they are concatenated.
    2. If the concept has more than one word, they are concatenated, with the base concept as the rightmost word.
    3. "en-US" spelling is to be used.
    4. The character repertoire is letters (A-Z, a-z) and digits (0-9).
      Note: Since CQL Index names are case insensitive, two index names should be considered equivalent if they are the same when normalized to all lower case. Two such (equivalent) index names should not be defined within the same context set. Similarly, index component parts (referent, concept, representation) are case insensitive and two equivalent component parts of the same type should not be defined.
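
As an illustration of the character repertoire rule and the case-insensitivity note, a context set author might check a list of proposed index names along these lines. This is a minimal sketch under stated assumptions: the function names has_valid_repertoire and find_equivalent_names are hypothetical, and the names passed in are assumed to be bare index names without a context set prefix such as "dc.".

```python
import re
from collections import defaultdict

# Syntax rule 4: only ASCII letters and digits are permitted.
# The pattern is applied to the bare index name; a context set prefix
# such as "dc." is assumed to have been removed already.
_REPERTOIRE = re.compile(r"[A-Za-z0-9]+")


def has_valid_repertoire(name):
    """Return True if the name is non-empty and uses only letters and digits."""
    return _REPERTOIRE.fullmatch(name) is not None


def find_equivalent_names(names):
    """Group names that become identical when normalized to lower case.

    Each returned group holds two or more names that should not be
    defined together within the same context set.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(has_valid_repertoire("mapCreatorName"))  # True
print(has_valid_repertoire("map-creator"))     # False
print(find_equivalent_names(["creatorName", "title", "CreatorName"]))
# [['creatorName', 'CreatorName']]
```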

For index names that do not conform to this agreement, it is strongly recommended that the context set definition (for the context set to which the index belongs) state that its index names do not conform to this agreement.


Examples

  1. Consider a database where records are metadata about books.
    Consider the (hypothetical) index name: abstractLanguageCode
    It has the following components:
    Referent type: 'abstract'
    Concept: 'language'
    Representation class: 'code'
    "abstractLanguageCode=de" would find books with an abstract written in German.
  2. Suppose instead you want to search on the language of the text rather than that of the abstract. In that case the index name might be textLanguageCode.
  3. Suppose that only the text is searchable on language, and only by code. In that case a suitable index would be dc.language.
  4. Suppose instead you want to search on the language of the record, rather than that of (part of) the resource.
    Theoretically, an appropriate index name would be: recordLanguageCode.
    However (by Basic Rule 3 above) such an index name should not be constructed; instead, the index name rec.languageCode should be used, where 'rec' is the Draft Record Metadata Context Set version 1.1, associated with the URI info:srw/cql-context-set/2/rec-1.1/. For this index name the referent is not included and is implicitly 'record'.
  5. Suppose you want to search on abstract languages by code (assuming some code list for abstract languages). The index name might be abstractLanguageCode where 'abstractLanguage' is the concept and 'code' is the representation class (no referent type). Note that this index name is identical to the name in example 1. These two index names must not be defined within the same context set. 'abstract' is the referent type in the first and is part of the concept in the second. This example is intended to illustrate that there is no reliable way to parse index names into their component parts; the index name definition should indicate what the component parts are.
  6. Consider the index dc.title. The concept is 'title'; there is neither an explicit referent type nor an explicit representation class.
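
The point made in examples 1 and 5, that the same index name can arise from two different decompositions, can be shown with a small Python sketch. The IndexDefinition type and its index_name method are hypothetical and serve only to illustrate why the component parts need to be stated in the index definition rather than parsed from the name.

```python
from typing import NamedTuple, Optional


class IndexDefinition(NamedTuple):
    """Hypothetical record of the documented components of an index name."""
    concept: str
    referent: Optional[str] = None
    representation: Optional[str] = None

    def index_name(self):
        """Concatenate the components (referent, concept, representation)
        and normalize to lower case, since index names are case insensitive."""
        parts = [self.referent, self.concept, self.representation]
        return "".join(p for p in parts if p).lower()


# Example 1: 'abstract' is the referent type.
example_1 = IndexDefinition(concept="language", referent="abstract",
                            representation="code")
# Example 5: 'abstract' is part of the concept; there is no referent type.
example_5 = IndexDefinition(concept="abstractLanguage", representation="code")

print(example_1.index_name() == example_5.index_name())  # True: same name
print(example_1 == example_5)                            # False: different components
```

Because both definitions yield the same name, the name alone cannot tell a reader which decomposition was intended.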