Short Topics
XQuery: Its Relationship to CQL

Query languages for the most part are either (a) functional but not user-friendly, or (b) user-friendly but limited in functionality. Examples of (a) are W3C's XQuery, SQL, and the Z39.50 type-1 query. Examples of (b) are CCL and Google. Query languages in general do not combine functionality and user-friendliness; CQL is an attempt to combine these two features. More to the point, CQL's goal is to combine the simplicity and intuitiveness of Google searching with the expressive power of the Z39.50 query, allowing users to begin with very simple queries and work their way up to arbitrarily complex expressions as necessary.

For example, the following (valid) CQL queries are intuitive and need no explanation:
And the following are reasonably but not completely intuitive:
The second set of examples reflects more functionality than the first set and is correspondingly more complex, but not disproportionately so.

XQuery, on the other hand, is a large and complex specification that has been in development for several years and consists of a dozen or so large documents. It is difficult to comprehend without committing several days to reading the documents. CQL, by contrast, can be understood completely in an hour or so.

XQuery development has been influenced almost entirely by two very distinct constituencies: (1) XML-as-document and (2) XML-as-data. The first reflects XML's roots in SGML, while the second reflects a relational database bias. Neither of these constituencies "won"; rather than defining different query languages for the different models, XQuery attempts to meet the needs of both constituencies with a single language.

Both XQuery and CQL assume that information is returned as XML. But XQuery goes a step further: it assumes that the information to be queried is (or is representable as) XML. CQL makes no such assumption.

Both languages specify a non-XML syntax; XQuery, in addition, defines an alternative XML syntax.* In the XQuery case, this reflects an apparent inability to resolve the question of whether an XML query syntax should itself be XML. Though on the surface it seems a good idea, the CQL developers ultimately decided it was not.

*(CQL did specify an alternative XML syntax, XCQL, in version 1.0, but abandoned it in 1.1 for purposes of submitting the query. CQL retains the XCQL specification so that a server can "echo" the query that was submitted.)

An example of a simple (non-XML) XQuery query is:

    let $title := /book/title

which is reasonably intuitive; it says "find all elements <title> within element <book> and return these as XML fragments, each wrapped in an element <returnedTitle>". This example illustrates some fundamental differences from CQL:
XQuery could be very useful and appropriate for searching, for example, the Congressional Record (assuming that it is exposed in XML), where the specific schema of the data is well known. It would also be useful for relational databases. It would not be useful for bibliographic data, record-based databases, or metasearching across diverse databases; there, CQL/SRU will be more appropriate.

OpenURL and SRU

SRU is sometimes compared with OpenURL. People ask, "Why isn't OpenURL used for searching, rather than SRU?" OpenURL packages metadata about a desired resource, along with additional context information, into a URL. SRU packages query parameters, which similarly are often metadata about a desired resource, along with protocol information, into a URL. So there are similarities between OpenURL and SRU, but the comparison is superficial.

It's useful to look more closely at the OpenURL model. OpenURL links a user to an appropriate resource. It does this in part by including bibliographic information about the resource. As that information might lead to several resources, context information is also included in the URL, to help select the most appropriate from among them.

In a typical OpenURL scenario, a user (requester) accesses a server (referrer) on which there is an article (referring entity) that cites a reference (referent). The reference looks like a normal link that the user can click, but it is really an OpenURL: an HTTP URL that does not address a specific resource but instead carries metadata about these context entities (requester, referrer, referring entity, referent). And the base URL (i.e. where the URL is being sent) isn't the location of the desired resource; instead it is what's known as a resolver, a server designed to take all this information and determine what resource the user really wants (or which is "most appropriate").
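To make the superficial similarity concrete, here is a minimal Python sketch that packages a query into an SRU URL and packages referent metadata into an OpenURL. The two base addresses, the sample titles, and the referrer identifier are hypothetical and chosen only for illustration; the parameter names are those of SRU 1.1 searchRetrieve and the OpenURL 1.0 key/encoded-value format for journals.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoints, used only for illustration.
SRU_BASE = "http://sru.example.org/catalog"
RESOLVER_BASE = "http://resolver.example.org/openurl"


def sru_search_url(cql_query, maximum_records=10):
    """Package a CQL query plus SRU protocol parameters into a URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return SRU_BASE + "?" + urlencode(params)


def openurl_link(article_title, journal_title, referrer_id):
    """Package metadata about a referent, plus context, into a resolver URL."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.atitle": article_title,   # the referent (cited article)
        "rft.jtitle": journal_title,
        "rfr_id": referrer_id,         # the referrer (context entity)
    }
    return RESOLVER_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    sru = sru_search_url('title = "complete dinosaur"')
    link = openurl_link("An Article", "A Journal", "info:sid/example.org:demo")
    # Both are plain HTTP GET URLs; only the meaning of the parameters differs.
    print(sorted(parse_qs(urlsplit(sru).query)))
    print(sorted(parse_qs(urlsplit(link).query)))
```

In both cases the result is an ordinary GET URL. The difference lies in what the receiving server does with it: an SRU server runs a search and may return many records, while a resolver picks one target.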
Note: There may be an additional step. When the user clicks on the link, they might first get a menu of services: full text, abstract, table of contents, reviews, etc. The user selects one, and this desired service type is also included in the URL.

So SRU and OpenURL serve very different purposes. One selects records based on search criteria; the other selects a single resource, the one deemed "most appropriate", from among a number of potential resources, based on context information.

Note also that OpenURL intends to locate a single resource, while SRU finds all resources that meet specified criteria. OpenURL generally returns the full text of the resource (or, if not the full text, text for some desired service). With SRU, the request can specify the format of the response records, and the response might not include any records but instead indicate a result count (the user may subsequently retrieve records from the result set). Thus SRU is an information retrieval protocol; OpenURL is not. On the other hand, OpenURL clearly addresses functions that SRU doesn't contemplate.

OAI, SRU, and OpenURL: How might these three work together?

These three can work together in a complementary manner. First consider the complementary roles of OAI and SRU. In the OAI model, a service provider accesses a metadata repository via the OAI protocol to harvest records from the repository. There is little selectivity available to the service provider: it simply takes the metadata records available, subject to some basic filtering, for example by time of creation or sub-repository name. The result is a somewhat random collection of metadata records. The OAI protocol does not address how that database might be searched. That's where SRU would come in. The service provider would interface an SRU server to the database of metadata records for an SRU client to access.
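The limited selectivity of harvesting can be illustrated with a short Python sketch that builds an OAI-PMH ListRecords request. The repository address, date, and set name are hypothetical; the `from` and `set` arguments correspond to the time-of-creation and sub-repository filters mentioned above.

```python
from urllib.parse import urlencode

# Hypothetical repository address, used only for illustration.
OAI_BASE = "http://repository.example.org/oai"


def oai_list_records_url(metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request with optional basic filters."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date   # harvest records changed since this date
    if set_spec is not None:
        params["set"] = set_spec     # restrict to one sub-repository (set)
    return OAI_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(oai_list_records_url(from_date="2005-01-01", set_spec="maps"))
```

Note that there is no query parameter here at all: the harvester cannot ask for records about a topic, which is why a search interface such as SRU is needed over the harvested database.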
In this model, where an SRU server has access to harvested metadata, an OpenURL provider can make effective use of an SRU client that has access to this server. In the OpenURL model as described above, a user accesses an article that cites a reference, which looks like a normal link that the user can click but is really an OpenURL, filled with metadata. The system that provides the OpenURL needs access to metadata in order to populate the OpenURL and to keep that metadata up to date. For example, the system might want to create an OpenURL for a resource for which it has an identifier or a title; it would search the repository on that identifier or title, thereby obtaining other metadata elements for the resource.

SRU: POST vs. GET

The question "why not POST an SRU request instead of (or as an alternative to) using GET?" was raised, because:
SRW uses POST; currently SRU uses GET, and the suggestion is to also allow SRU via POST. Then we would effectively have three methods for how SRW/U is handled via HTTP:
There are two issues:

Resolution

Currently SRW and SRU messages go to the same base URL, and some toolkits assume that anything received via POST is SRW, so the message is passed to SOAP, while GET messages go to a different process for SRU handling. In other words, the software distinguishes SRW from SRU solely by HTTP method, POST or GET. With a third method added, toolkits could no longer rely on that shortcut.

One possible solution is to use different addresses for the different methods. There are a number of suggestions for how to do this: for example, Explain can provide a list of supported methods and their corresponding addresses, or this could be done via <links> in <databaseInfo>.

It is also suggested that Explain could make this distinction without distinct addresses. You can already say SRW/U, meaning that you support SRW and SRU at the same address. With the addition of a 'method' attribute, you can say whether you support SRU via POST or GET. (The value would be a space-separated list, so you could have: <... protocol="SRW/U" method="POST GET">.)

Complexity of Adding a Third Method

The consensus appears to be that the gain is worth the cost of this complexity. Not allowing SRUP (SRU via POST) would mean that some queries would be impossible without SOAP, and the people affected are likely to implement it anyway, allowed or not. So, assuming we define a third method, SRUP, the SRU choices for an implementor (leaving aside SRW considerations for the moment) become:
However, simple clients are most likely to continue to use SRU GET, so it is important that choice (3) be disallowed. The best way to do that is to declare explicitly that conformance to SRU requires that GET be implemented (whether POST is implemented or not). This argues in favor of formalizing SRUP, because otherwise there would be no context for such a conformance rule.

OpenSearch vs. SRU Parameter Names

One of the interesting features of openSearch is that the parameter names are not fixed. The parameters defined by the openSearch query spec are (1) the query, (2) the number of records desired, and (3) the offset. There are analogous SRU parameters, and in SRU these three parameters have well-defined names. The openSearch spec, however, does not define names for these parameters; rather, it allows an openSearch server to use whatever names it wants. For example, consider these three queries:
In (1) the query is supplied by the parameter named 'q'. In (2) the query parameter is 's', and in (3), 'searchTerms'. Note also that there are additional parameters beyond the base three, for example 'output' in (2) and 'format' in (3).

This works because openSearch requires that a server provide a so-called openSearchDescription, which is in a real sense analogous to ZeeRex and which "explains" all the parameters. The reasoning for this (as explained by the openSearch developer) is to allow a company to use an existing query format, that is, the same parameters, as long as the base three match up semantically. (It is interesting to observe that this is working in the real world, based on the idea of self-configuring clients, the same concept as that of ZeeRex.)

Here is a sample XML element, <Url>, which is included in a description file and serves to explain the openSearch parameters accepted:

    <Url>http://search.athenscounty.lib.oh.us/cgi-bin/koha/opensearch?|

Thus "q={searchTerms}" serves to explain that the parameter name 'q' is to be used for the query, etc. Note also that this example defines a local parameter, 'relevanceScale'. Clients are not necessarily expected to support local parameters.

SRU and Z39.50

The SRU Initiative recognizes the importance of Z39.50 (as currently defined and deployed) for business communication. While SRU focuses on getting information to the user, building on Z39.50 semantics enables the creation of gateways to existing Z39.50 systems. SRU combines several Z39.50 features, most notably the Search, Present, Sort, and Scan services. Additional features or services may be added later or defined later as new web services.

Z39.50 Concepts Retained in SRU
Some SRU Differences from Z39.50
What are the potential advantages of SRW over SRU?

The benefits of SRW are better extension support, authentication, and web service features.

Federated Search

Eric Morgan

Responses

Ralph LeVan

Matthew Dovey

An optimization/improvement: the "Centroid" approach

Retrieve the list of terms from an index from each database via scan. For example, say:
Searching for "author=Morgan", there is no point in sending
a request to database C, and probably not much point
sending to A either. This approach reduces the number
of databases you need to search for a particular query. (However, it
isn't very good if you are trying to locate particular items, for example
if these were databases of rare |
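
The centroid idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the database names, terms, and record counts below are invented, and the scan results are assumed to have been fetched already (for example via the SRU scan operation) and reduced to a mapping from term to record count.

```python
def build_centroid(scan_results):
    """Map each term to the databases whose index contains it, with counts.

    scan_results has the shape {database_name: {term: record_count}}.
    """
    centroid = {}
    for database, terms in scan_results.items():
        for term, count in terms.items():
            centroid.setdefault(term.lower(), {})[database] = count
    return centroid


def databases_for(centroid, term, min_count=1):
    """Return the databases worth searching for a term, most hits first."""
    hits = centroid.get(term.lower(), {})
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    return [database for database, count in ranked if count >= min_count]


if __name__ == "__main__":
    # Invented author-index scan results for three hypothetical databases.
    scans = {
        "A": {"morgan": 1, "levan": 40},
        "B": {"morgan": 120, "dovey": 3},
        "C": {"levan": 2},
    }
    centroid = build_centroid(scans)
    print(databases_for(centroid, "Morgan"))               # skips C entirely
    print(databases_for(centroid, "Morgan", min_count=5))  # also drops A
```

The `min_count` threshold is a hypothetical knob: raising it trades recall for fewer requests, which is exactly the weakness noted above when the goal is to locate particular items.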
November 13, 2013