The CQL Context Set (version 1.1)
see also version 1.2
The CQL context set defines a set of indexes, relations and relation
modifiers. The indexes supplied are 'utility' indexes which do not directly
reference any data. These utility indexes are for instances when CQL is
required to express a concept not directly related to the records.
Historical note: In CQL version 1.0, this was the 'srw' index
set. Implementers may wish to accept 'srw' as a reserved name for
the identifier 'http://www.loc.gov/zing/cql/srw-indexes/v1.0/' with the
same semantics as below. srw.resultSetName has been renamed to cql.resultSetId
for consistency.
- The reserved name for this context set is: cql
- The identifier for this context set is: info:srw/cql-context-set/1/cql-v1.1
Sections: Indexes | Relations | Relation Modifiers | Relation Qualifiers | Boolean Modifiers
Indexes
- resultSetId
A search clause may be a result set id. This is a special case, where
the index and relation are expressed as "cql.resultSetId =" and the
term is the result set id returned by the server in the 'resultSetId'
parameter of the response. It may be used by itself in a query to refer
to an existing result set from which records are desired. It may also
be used in conjunction with other resultSetId clauses or other indexes,
combined by boolean operators. The semantics of resultSetId with relations
other than "=" are undefined.
- serverChoice
This is the default when the index and relation are omitted from a search
clause. 'cql.serverChoice' means that the server will choose an index
for the given term. The relation used is 'scr', hence 'cql.serverChoice
scr "term"' is an equivalent search clause to '"term"'.
- anywhere
This means "search all indexes from all context sets you know". (By
contrast, cql.serverChoice means essentially "search any index -- your
choice -- from any context set you know".)
- allRecords
A special index which matches every record available.
Every record is matched no matter what values are provided for the relation
and term, but the recommended syntax is: cql.allRecords = 1.
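The index forms described above can be sketched as a few string-building helpers. This is only an illustration: the function names are hypothetical, and the sketch assumes terms contain no double quotes or backslashes so that no escaping is needed.

```python
def search_clause(term: str, index: str = "cql.serverChoice", relation: str = "scr") -> str:
    """Build a search clause; with no index or relation, use the documented defaults."""
    # Assumption of this sketch: terms needing escapes are rejected rather than escaped.
    if '"' in term or "\\" in term:
        raise ValueError("this sketch does not handle quotes or backslashes in terms")
    return f'{index} {relation} "{term}"'


def result_set_clause(result_set_id: str) -> str:
    """Refer to an existing result set; only the '=' relation has defined semantics."""
    return search_clause(result_set_id, index="cql.resultSetId", relation="=")


# Recommended syntax for matching every record.
ALL_RECORDS = "cql.allRecords = 1"

print(search_clause("fish"))
print(result_set_clause("rs42"))
```

The first call produces the explicit equivalent of the bare clause "fish", as described under serverChoice.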
Relations
Implicit Relations
These relations are defined as such in the grammar of CQL. The cql context
set only defines their meaning, rather than their existence.
- <, >, <= and >= retain their regular meanings as relations
pertaining to ordered terms.
- = is used:
- For word adjacency, when the term is a list of words. That is
to say that the words appear in that order with no others intervening.
- Otherwise, for exact equality of value.
- <> is 'not equal to'.
Default Relations
These relations are defined as being widely useful as part of a default
context set.
- scr is used to mean "server choice relation". It
is used when the client wishes the server to choose the most appropriate
relation for the index or term. It is assumed when relation is omitted.
- exact is used for exact string matching, when the
term is a character string. =/cql.string is synonymous.
- all and any may be used when the
term contains multiple items to indicate "all of these items" or "any
of these items". These queries could be expressed using boolean AND
and OR respectively. These relations have an implicit relation modifier
of 'cql.word'.
- within may be used with a search term that has multiple
dimensions. It matches if the database's term falls completely within
the range, area or volume described by the search term. For example:
dc.date within "2002 2003"
- encloses may be used when the index's data has multiple
dimensions. It matches if the database's term fully encloses the search
term. For example: xxx.dateRange encloses 2002
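The word-oriented relations above ('all', 'any', and '=' used for word adjacency) can be illustrated with a small matcher. This is a sketch under stated assumptions: words are split on whitespace (the server's own definition of a word may differ), comparison is case-sensitive, and the function name is hypothetical.

```python
def word_relation_matches(relation: str, term: str, field_value: str) -> bool:
    """Evaluate 'all', 'any' or adjacency '=' for a term against one field value."""
    term_words = term.split()          # assumption: whitespace-delimited words
    field_words = field_value.split()
    if relation == "all":
        # Equivalent to combining the words with boolean AND.
        return all(w in field_words for w in term_words)
    if relation == "any":
        # Equivalent to combining the words with boolean OR.
        return any(w in field_words for w in term_words)
    if relation == "=":
        # Adjacency: the words appear in order with no others intervening.
        n = len(term_words)
        return any(
            field_words[i:i + n] == term_words
            for i in range(len(field_words) - n + 1)
        )
    raise ValueError(f"unsupported relation: {relation}")
```

For instance, the term "fish food" satisfies 'all' against the value "food for fish", but not '=', because the words are not adjacent in that order.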
Relation Modifiers
Term Functions
These relation modifiers request that the server perform some algorithm
on each item within the term before processing. If named algorithms are
required, then further context sets should define relation modifiers for
these.
- stem
The server should apply a stemming algorithm to the words within the
term so that, for example, 'computing' and 'computer' both match the stem
'compute'.
- relevant
The server should use a relevancy algorithm for determining matches
and the order of the result set.
- phonetic
The server should use a phonetic algorithm for determining words which
sound like the term.
- fuzzy
The server should be liberal in what it counts as a match. The exact
details of this are left up to the server, but might include permutations
of character order, off-by-one for numerical terms and so forth.
Relation Qualifiers
These modifiers qualify the relation to more precisely determine its
semantics.
- partial
When used with within or encloses, some section may extend outside the
term. This permits the database term to be partially enclosed, or to
fall partially within the search term.
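One way to picture within, encloses and the partial qualifier is with one-dimensional inclusive ranges, such as years. The sketch below is an interpretation, not a definition: it assumes each term is a (low, high) pair and treats 'partial' as "the two ranges overlap"; all names are hypothetical.

```python
from typing import Tuple

Range = Tuple[int, int]  # inclusive (low, high); a single value v is (v, v)


def range_matches(relation: str, db: Range, search: Range, partial: bool = False) -> bool:
    """Compare a database term with a search term, both given as inclusive ranges."""
    if relation not in ("within", "encloses"):
        raise ValueError(f"unsupported relation: {relation}")
    if partial:
        # Assumed reading of 'partial': any overlap between the ranges is enough.
        return db[0] <= search[1] and search[0] <= db[1]
    if relation == "within":
        # The database term falls completely inside the search term.
        return search[0] <= db[0] and db[1] <= search[1]
    # The database term fully encloses the search term.
    return db[0] <= search[0] and search[1] <= db[1]
```

With this reading, a record dated 2002 satisfies dc.date within "2002 2003", while a record covering 2001 to 2002 does so only when the partial qualifier is given.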
Term Format
These relation modifiers describe the format or structure of the term
in some fashion.
- word
The term should be broken into words, according to the server's definition
of a 'word'.
- string
The term is a single item, and should not be broken up.
- isoDate
Each item within the term conforms to the ISO 8601 specification for
expressing dates.
- number
Each item within the term is a number.
- uri
Each item within the term is a URI.
- masked (default modifier)
The following masking rules and special characters apply for search
terms, unless overridden in a profile via a relation modifier. To explicitly
request this functionality, add 'cql.masked' as a relation modifier.
- A single asterisk (*) is used to mask zero or more characters.
- A single question mark (?) is used to mask a single character,
thus N consecutive question-marks means mask N characters.
- Caret/hat (^) is used as an anchor character for terms that are
word lists, that is, where the relation is 'all' or 'any', or '='
when used for word adjacency. It may not be used to anchor a string,
that is, when the relation is 'exact' (string matches are, by default,
anchored). It may occur at the beginning or end of a word (with
no intervening space) to mean right or left anchored. "^" has no
special meaning when it occurs within a word (not at the beginning
or end) or string, but must be escaped nevertheless.
- Backslash (\) is used to escape '*', '?', quote (") and '^',
as well as itself. Backslash not followed immediately by one of
these characters is an error.
See masking examples below.
- unmasked
Do not apply masking rules.
- oid
The term is an ISO object identifier, in dot-separated format. Example:
'zeerex.set exact/cql.oid "1.2.840.10003.3.1"'
Masking examples:
- dc.title = c*t (matches cat and coast etc.)
- dc.title = "*fish food*" (matches unanchored 'fish food')
- dc.title = c?t (matches cat and cot, not coast or ct)
- " ?" (matches any single character)
- dc.title = "^cat in the hat" (matches 'cat in the hat' where it is
at the beginning of the field)
- dc.title any "^cat ^dog eats rat" (matches 'cat eats rat', 'dog eats
cat', 'cat loves bat', but not 'bat loves cat')
- dc.title = "\"Of Course\" she said"
- dc.identifier exact "\\\"\^\*\?andSomeMoreCharacters"
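The masking rules can be sketched as a translation from a single masked word to a regular expression. This is a minimal illustration, not a complete implementation: it handles one word at a time, it only reports whether a leading or trailing anchor was present (checking the word's position within the field is left to the caller), and the names are hypothetical.

```python
import re
from typing import NamedTuple

ESCAPABLE = '*?"^\\'  # characters that a backslash may precede


class MaskedWord(NamedTuple):
    pattern: re.Pattern      # use pattern.fullmatch(candidate) to test a word
    leading_anchor: bool     # an unescaped '^' was the first character
    trailing_anchor: bool    # an unescaped '^' was the last character


def compile_masked_word(word: str) -> MaskedWord:
    parts = []
    leading = trailing = False
    i = 0
    while i < len(word):
        ch = word[i]
        if ch == "\\":
            # A backslash not followed by an escapable character is an error.
            if i + 1 >= len(word) or word[i + 1] not in ESCAPABLE:
                raise ValueError("backslash must precede one of * ? \" ^ \\")
            parts.append(re.escape(word[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "^":
            if i == 0:
                leading = True
            elif i == len(word) - 1:
                trailing = True
            else:
                raise ValueError("'^' inside a word must be escaped")
        else:
            parts.append(re.escape(ch))
        i += 1
    return MaskedWord(re.compile("".join(parts)), leading, trailing)
```

Using fullmatch keeps the comparison anchored to the whole word, so c?t accepts cat and cot but rejects coast and ct, as in the examples above.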
Boolean Modifiers
The CQL context set defines four boolean modifiers, which are only used
with the prox boolean operator.
- distance
The distance that the two terms should be separated by.
Takes the form:
distance [relation] [value]
where relation is one of: "<", ">", "<=", ">=", "=", "<>";
default "<="
and value is a non-negative integer.
e.g. "distance<2"
default: 1 for word, zero otherwise
- unit
The type of unit for the distance.
Takes the form:
unit=[value]
where value is one of: 'paragraph', 'sentence', 'word' and 'element',
or a value from another context set.
e.g. "unit=sentence"
default "word"
- ordered
The order of the two terms must be as per the query.
- unordered
The order of the two terms is unimportant. This is the default.
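The four modifiers and their defaults can be sketched as a small parser. This is an illustration under assumptions: the modifiers arrive as a list of already-separated strings such as "distance<2", a unit from another context set is recognised by containing a dot, and the names ProxSettings and parse_prox_modifiers are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable

UNITS = {"paragraph", "sentence", "word", "element"}
_DISTANCE = re.compile(r"distance(<=|>=|<>|<|>|=)(\d+)")


@dataclass
class ProxSettings:
    relation: str
    distance: int
    unit: str
    ordered: bool


def parse_prox_modifiers(modifiers: Iterable[str]) -> ProxSettings:
    relation, distance, unit, ordered = "<=", None, "word", False
    for mod in modifiers:
        match = _DISTANCE.fullmatch(mod)
        if match:
            relation, distance = match.group(1), int(match.group(2))
        elif mod.startswith("unit="):
            value = mod[len("unit="):]
            # Assumption: values from other context sets look like 'prefix.name'.
            if value not in UNITS and "." not in value:
                raise ValueError(f"unknown unit: {value}")
            unit = value
        elif mod == "ordered":
            ordered = True
        elif mod == "unordered":
            ordered = False
        else:
            raise ValueError(f"unknown prox modifier: {mod}")
    if distance is None:
        # Default distance: 1 for word, zero otherwise.
        distance = 1 if unit == "word" else 0
    return ProxSettings(relation, distance, unit, ordered)
```

Resolving the default distance after the loop lets it depend on the unit regardless of the order in which the modifiers were given.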