Use of nested Schemas for Cross-domain searching and Semantic interoperability


Question from: CIMI Interoperability Testbed


Question:
We are designing an abstract record structure for CIMI to provide different levels of semantic interoperability, to accommodate various levels of client awareness, and to support cross-domain searching.

To this end we are investigating the use of the tagSet-M element schemaIdentifier (1,1) in a retrieval record to indicate the schema in effect.

For purposes of cross-domain searching, the notion of nested schemas may be useful for providing levels of semantic interoperability, including the notion of no governing schema at the most generic level, at which point generic metadata elements (for example, Dublin Core) may be inserted.

The proposed CIMI retrieval record provides three levels of semantic interoperability. It specifies information at the beginning that a client may understand even if the client does not recognize any specific schema, followed by information that a client may understand if it recognizes the Collections schema but not necessarily the CIMI schema, followed by information understandable to a client that recognizes the CIMI schema.

The CIMI abstract record structure specifies:

  1. At the beginning of a CIMI retrieval record there may occur generic metadata (in the form of one or more tagSet-G elements).
  2. Following that, the Collections schema is assumed, and there is information recognizable to a Collections-aware client, including additional metadata elements.
  3. Following this information, the CIMI schema is assumed.

At the generic level of semantic interoperability, a generic Z39.50 client might search databases that include object and collection descriptive records as well as other types of records. Because metadata is included at the top level of a CIMI retrieval record, a generic client may be able to process these records partially, if not fully.

At the next level, a Collections-aware (but not CIMI-aware) client that performs a distributed search over multiple collections involving multiple disciplines may locate a CIMI record and discover that there is a potential object of interest, even though the client is not able to fully process the record.

At the highest level of semantic interoperability, a CIMI-aware client may be able to fully process a CIMI retrieval record.

In general, when the schemaIdentifier occurs as (and only as) the very first element, it is clear that it applies to the entire retrieval record. But consider the following questions:

  1. Suppose you want to change schemas in the middle of a retrieval record. How may the schemaIdentifier be used for this purpose? May a schema identifier be inserted within (not at the beginning of) a retrieval record?
  2. Assuming yes, suppose the first occurrence of a schemaIdentifier within a retrieval record is not the first element. Is this legal? If so, what schema is presumed in effect before the schema identifier is encountered? What portion of the retrieval record is affected by the embedded schema identifier?
  3. If a schema identifier does occur, and not as the first element of a record (assuming this is legal), then would those elements preceding the first occurrence of a schema identifier be interpreted outside the context of a specific schema? Does this mean that only elements that are not associated with any specific schema may occur prior to the first occurrence of a schema identifier; i.e. tagSet-G and tagSet-M elements?

Response:
A schema identifier may legally occur anywhere within a retrieval record as long as it is not preceded by siblings (leaf or non-leaf). That is, it must be either the first occurring element in the record or the first occurring element subordinate to its parent. The identified schema governs all elements at the same level as, or subordinate to, the schema identifier, unless superseded by a subordinate schema identifier.

For example, consider a retrieval record in which:

  Schema A (identified by element 1) governs elements 2, 3, 4, 5, 16, and 17
  Schema B (identified by element 6) governs elements 7, 8, 9, 10, 14, and 15
  Schema C (identified by element 11) governs elements 12 and 13
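
The scoping rule in this example can be sketched in code. The following Python is an illustration only, not part of Z39.50 or of any CIMI specification. The Element class and the governing_schemas function are hypothetical names, and the tree shape is an assumption: element 5 is taken to be the parent of elements 6 through 15, and element 10 the parent of elements 11 through 13, which is one arrangement consistent with the three statements above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for one element of a retrieval record."""
    number: int
    schema: Optional[str] = None  # set only on schema-identifier elements
    children: List["Element"] = field(default_factory=list)


def governing_schemas(siblings: List[Element],
                      inherited: Optional[str] = None,
                      result: Optional[Dict[int, Optional[str]]] = None
                      ) -> Dict[int, Optional[str]]:
    """Map each non-identifier element number to the schema that governs it."""
    if result is None:
        result = {}
    # A schema identifier must not be preceded by siblings.
    for position, element in enumerate(siblings):
        if element.schema is not None and position != 0:
            raise ValueError(
                f"element {element.number}: schema identifier is not the first sibling")
    # A leading identifier supersedes the schema inherited from the parent level.
    in_effect = inherited
    if siblings and siblings[0].schema is not None:
        in_effect = siblings[0].schema
    for element in siblings:
        if element.schema is None:
            result[element.number] = in_effect
        governing_schemas(element.children, in_effect, result)
    return result


# Assumed tree shape (see the lead-in); the source gives only the governance lists.
record = [
    Element(1, schema="A"),
    Element(2), Element(3), Element(4),
    Element(5, children=[
        Element(6, schema="B"),
        Element(7), Element(8), Element(9),
        Element(10, children=[
            Element(11, schema="C"),
            Element(12), Element(13),
        ]),
        Element(14), Element(15),
    ]),
    Element(16), Element(17),
]

result = governing_schemas(record)
for schema in ("A", "B", "C"):
    print(schema, sorted(n for n, s in result.items() if s == schema))
# A [2, 3, 4, 5, 16, 17]
# B [7, 8, 9, 10, 14, 15]
# C [12, 13]
```

Note that element 5 is governed by Schema A even though its children are governed by Schema B, because element 5 sits at the same level as element 1, while the schema identifier in element 6 applies only from element 6's own level downward.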

If there is no schema identifier at the top of a retrieval record, then two cases should be considered:

  1. a schema identifier occurs later within the record;
  2. there is no schema identifier at all within the record.

In the first case, the elements that occur prior to the first schema identifier should be assumed to occur outside the context of any specific schema, so these should include tagSet-G and tagSet-M elements only. Thus, as a rule of thumb, whenever a retrieval record includes a schema identifier not as the first element, it should also include a schema identifier as the first element, unless it intends that no schema be in effect prior to encountering the schema identifier.
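
A server or a testbed tool might check this constraint mechanically. The sketch below is one possible approach under stated assumptions: the Element class, the schema_free_violations function, the "local" tag-set label, and the "collectionsData" wrapper element are all hypothetical and are not taken from the CIMI profile. It reports any element that lies outside every schema yet uses a tag set other than tagSet-G or tagSet-M.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Tag sets whose elements are meaningful without any governing schema.
SCHEMA_INDEPENDENT_TAG_SETS = {"tagSet-G", "tagSet-M"}


@dataclass
class Element:
    """Hypothetical stand-in for one element of a retrieval record."""
    name: str
    tag_set: str                      # "tagSet-G", "tagSet-M", or "local" (schema-defined)
    schema_id: Optional[str] = None   # non-None marks a schemaIdentifier element
    children: List["Element"] = field(default_factory=list)


def schema_free_violations(siblings: List[Element],
                           inherited: Optional[str] = None) -> List[str]:
    """Names of elements outside any schema that use a schema-defined tag set."""
    in_effect = inherited
    if siblings and siblings[0].schema_id is not None:
        in_effect = siblings[0].schema_id
    violations = []
    for element in siblings:
        if in_effect is None and element.tag_set not in SCHEMA_INDEPENDENT_TAG_SETS:
            violations.append(element.name)
        violations.extend(schema_free_violations(element.children, in_effect))
    return violations


# Generic metadata first, then a nested structure that opens with a schema identifier.
record = [
    Element("title", "tagSet-G"),
    Element("author", "tagSet-G"),
    Element("collectionsData", "tagSet-M", children=[   # hypothetical wrapper element
        Element("schemaIdentifier", "tagSet-M", schema_id="Collections"),
        Element("collectionName", "local"),
    ]),
]

print(schema_free_violations(record))  # []
# A schema-defined element placed before any schema identifier is flagged.
print(schema_free_violations([Element("objectName", "local")] + record))  # ['objectName']
```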

In the second case, it is possible that there is a known schema in effect, either because a schema identifier was included in the retrieval request, or because there is a prior understanding between client and server about what schema is in effect. As a rule of thumb, if the server is not certain that there is a prior understanding, or if the schema in effect is not the schema requested, then the server should insert the schema identifier at the beginning of the record.


Status: Approved (8/97)
Library of Congress
(10/23/97)