Implementor Agreement: CQL Index-naming Convention
Implementor agreements are developed and proposed by the SRU Editorial Board
and submitted to the SRU Implementor Group for review and comment. If the
Editorial Board then determines that there is consensus, it approves the proposal. The
approval of an implementor agreement is not in itself binding. Implementor
agreements serve two purposes: (1) they give general guidance to developers and implementors;
(2) they may be cited by profiles, and a profile may declare that conformance to a
particular implementor agreement is mandatory in order to claim conformance to that profile.
Approved: January 1, 2005
Intended Audience: This Implementor Agreement
is intended for Context Set and Interoperability Profile developers.
SRU database implementors might be interested in this document, but
should not feel bound by the rules described herein. SRU database implementors
with interoperability requirements should be guided by appropriate Interoperability
Profiles.
Background
This is an implementor agreement for a naming convention for CQL indexes.
Schema designers tend to invent data element names haphazardly
without documenting semantics, and this is becoming widely recognized as
one of the fundamental barriers to broad interoperability. ISO 11179 addresses
part of this problem, by prescribing rules for registering data elements
and their semantics to enable re-use and coherence.
We introduce the ideas of referent type, concept,
and representation class. Consider for example a database
where records are metadata about maps. Consider an index of map creator
names. In describing that index, one might consider it to have three
logical components: what the database is about (maps), what information
is sought (creator), and how that information is represented (name).
We refer to these as the referent type, concept, and representation
class.
The example index might thus be named mapCreatorName. On the other
hand it may be sufficient (and perhaps more sensible) to simply name it creator. The
name mapCreatorName would be useful when there is more than one possible
referent (map, atlas, etc.) and more than one representation (name, code, identifier,
etc.). When there is only a single referent and only a single representation
then the index is adequately described by the single concept, e.g., creator.
(When there is only a single referent but multiple representations then
the index might be best named creatorName; when there are multiple
referents but only a single representation then the index might be best named mapCreator.)
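The naming choices above can be illustrated with a minimal sketch. The function name compose_index_name and its parameter names are hypothetical, and the lower camel case joining is an assumption inferred from the example names in this document (mapCreatorName, creatorName); the agreement itself only says that components are concatenated.

```python
def compose_index_name(concept, referent_type=None, representation_class=None):
    """Concatenate the components in the order referent type, concept,
    representation class. Only the concept is required."""
    if not concept:
        raise ValueError("the concept component is required")
    present = [p for p in (referent_type, concept, representation_class) if p]
    first, rest = present[0], present[1:]
    # Assumption: lower camel case, as in the document's example names.
    # Only the initial letter of each component is changed, so a multi-word
    # concept such as "abstractLanguage" keeps its internal capitals.
    return first[0].lower() + first[1:] + "".join(p[0].upper() + p[1:] for p in rest)


print(compose_index_name("creator"))
print(compose_index_name("creator", referent_type="map"))
print(compose_index_name("creator", representation_class="name"))
print(compose_index_name("creator", referent_type="map", representation_class="name"))
```

The four calls correspond to the four cases discussed above: a single referent and representation, multiple referents, multiple representations, and both.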
In any case, the concept is the critical component (in the example
above, whichever name is chosen, the concept is 'creator'). In order to encourage
the re-use of concepts across different contexts, it is useful to deconstruct
the index into these three components so that the representation and referent
are disentangled from the underlying concept.
Rules
Note: These rules have no protocol significance. For instance,
although they define component parts of an index name, there is no expectation that
servers will parse index names into their component parts.
- Basic Rules
- An index name has one to three components:
- Referent type. Optional. If omitted it should
be inferable from the index or context set definition (and if
it is not, it is unspecified).
- Concept. Required.
- Representation class. Optional. If omitted
it should be inferable from the index or context set definition
(and if it is not, it is unspecified).
- The referent type and representation class are single words, English
nouns. The concept is one or more words, one (and only one) of which
is the base concept, which is an English noun.
- When contemplating the creation of an index name, the creator should
attempt to determine if a suitable index name already exists (in the
same or another context set), and if so, use it rather than creating
a new index name.
- A context set definition whose index names conform to this agreement
should state so.
- Definitions
Every referent type, concept, and representation class should be assigned
a single associated definition. Where practical the definition should
try to follow ISO 11179 part 4:
- be stated in the singular;
- state what the concept is, not only what it is not;
- be stated as a descriptive phrase or sentence(s);
- contain only commonly understood abbreviations;
- be expressed without embedding definitions of other data or underlying
concepts.
- Syntax
- The referent type, if present, is the leftmost part. The concept
follows, and the representation class, if present, is the rightmost
part. If there is more than one component they are concatenated.
- If the concept has more than one word, they are concatenated with
the base concept as the rightmost word.
- "en-US" spelling is to be used.
- The character repertoire is letters (A-Z, a-z) and digits (0-9).
Note: Since CQL index names are case insensitive, two index names
should be considered equivalent if they are the same when normalized
to all lower case. Two such (equivalent) index names should not
be defined within the same context set. Similarly, index component
parts (referent, concept, representation) are case insensitive and
two equivalent component parts of the same type should not be defined.
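As an illustration of the character repertoire and case-insensitivity rules, a context set author might check a list of proposed names along these lines. This is a sketch only: the function names are hypothetical, and it assumes the check is applied to the unqualified index name, without a context set prefix such as "dc.".

```python
import re

# Letters A-Z, a-z and digits 0-9 only, per the character repertoire rule.
_ALLOWED = re.compile(r"[A-Za-z0-9]+")


def is_valid_repertoire(name):
    """Return True if the name is non-empty and uses only permitted characters."""
    return _ALLOWED.fullmatch(name) is not None


def find_equivalent_names(names):
    """Group names that are identical once normalized to lower case and
    return every group that has more than one member."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


proposed = ["creatorName", "CreatorName", "title", "language-code"]
print([n for n in proposed if not is_valid_repertoire(n)])
print(find_equivalent_names(proposed))
```

Any group returned by find_equivalent_names contains names that should not be defined together in the same context set.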
For index names that do not conform to this agreement, it is strongly recommended
that the context set definition (for the context set to which the index belongs)
state that its index names do not conform to this agreement.
Examples
- Consider a database where records are metadata about books.
Consider the (hypothetical) index name: abstractLanguageCode
It has the following components:
Referent type: 'abstract'
Concept: 'language'
Representation class: 'code'
"abstractLanguageCode=de" would find books with an abstract written
in German.
- Suppose instead you want to search on the language of the text rather
than that of the abstract. In that case the index name might be textLanguageCode.
- Suppose that only the text is searchable on language, and only by code.
In that case a suitable index would be dc.language.
- Suppose instead you want to search on the language of the record, rather
than that of (part of) the resource.
Theoretically, an appropriate index name would be: recordLanguageCode.
However (by rule 4 above) such an index name should not be constructed.
Instead, the index name rec.languageCode should be used, where
'rec' is the Draft Record Metadata Context Set version 1.1, associated
with the URI info:srw/cql-context-set/2/rec-1.1/.
For this index name the referent is not included and is implicitly 'record'.
- Suppose you want to search on abstract languages by code (assuming some
code list for abstract languages). The index name might be abstractLanguageCode where
'abstractLanguage' is the concept and 'code' is the representation class
(no referent type). Note that this index name is identical to the name in
example 1. These two index names must not be defined within the same context
set. 'abstract' is the referent type in the first and is part of the concept
in the second. This example is intended to illustrate that there is no reliable
way to parse index names into their component parts; the index name definition
should indicate what the component parts are.
- Consider the index dc.title. The concept is 'title'; there is no explicit
referent type nor representation class.
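The ambiguity shown by abstractLanguageCode in the examples above can be made concrete with a short sketch that enumerates every decomposition consistent with the rules (optional single-word referent type, required concept of one or more words, optional single-word representation class). The function name is hypothetical, and splitting words at capital letters is an assumption based on the camel case used in the example names.

```python
import re


def candidate_decompositions(name):
    """List every (referent type, concept words, representation class)
    reading of a camel case index name that the rules allow."""
    # Assumption: each word starts with a letter and ends before the next capital.
    words = [w.lower() for w in re.findall(r"[A-Za-z][a-z0-9]*", name)]
    results = []
    for has_referent in (False, True):
        for has_representation in (False, True):
            start = 1 if has_referent else 0
            end = len(words) - 1 if has_representation else len(words)
            if start >= end:
                continue  # the concept is required, so at least one word must remain
            results.append((
                words[0] if has_referent else None,
                tuple(words[start:end]),
                words[-1] if has_representation else None,
            ))
    return results


for reading in candidate_decompositions("abstractLanguageCode"):
    print(reading)
```

For abstractLanguageCode the sketch yields four candidate readings, including both the one with referent type 'abstract' and the one with concept 'abstractLanguage'. Nothing in the name selects among them, which is why the index definition should state the component parts.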