ZIG Meeting
12/7/2000
Library of Congress
Notes provided primarily by Les Wibberley.
Querying XML Documents
- Paul Cotton
- formerly worked for Fulcrum, and worked on Z39.50 a while ago
- also worked on SQL standards
- will make the slides available
- XML Query history and QL '98
- early 1998 - roll your own query language was the feeling
- XSL working Group
- XSLT needed syntax to select nodes
- XML Linking working group
- XPointer needed a syntax to select a location
- Feb 1999 joint meeting
- 90% overlap of needs and syntax
- resulted in XPath
- W3c recommendation XSLT
- common subroutine used by both standards efforts
- this was the start
- early query facilities for sgml
- academic research into semi-structured data and its operations
- XQL - see http://metalab.unc.edu/xql
- XML-QL August 1998
- Query Languages workshop 98
- W3C XML Query WG
- July 1999 WG proposed as part of xml re-chartering activity
- Sept. 1999 WG chartered
- 8 face to face meetings, and 45 teleconferences
- close working relationship with other w3c working groups (schema, XSL,
I18N)
- close relationship between the xml schema and xml query WGs
- XML Query WG Goals
- The goal of the xml query WG is to produce a data model for xml documents,
a set of query operators on that data model, and a query language based
on these query operators
- status of w3c xml query WG
- jan 2000 requirements working draft
- may 2000 xml query data model
- may 2000 feedback on schema last call - this WG made majority of comments
- august 2000 revised requirements working draft with use cases (sample
documents with queries written in english, and answers in xml)
- dec 2000 xml query algebra working draft
- future public working drafts every three months
- proposed recommendations
- XML query requirements
- usage scenarios
- general requirements
- non procedural query language
- xml syntax for query language but also a readable syntax
- protocol independent
- standard error conditions
- future support for updates
- query data model
- built on xml Infoset and PSV- Post ... Value
- Namespace aware - using xml Namespace
- support for xml schema data types
- support for inter- and intra- document references
- query functionality
- operators on all data types
- text operators across element boundaries
- support for hierarchy and sequence
- ability to combine data from different locations
- aggregation and sorting
- combination of operators including queries as operands
- support for null values
- structural preservation
- identity preservation
- operations on names
- operations on schemas
- extensibility
- closure
- xml query data model draft
- defines info available to a query processor
- infoset plus following
- support for xml schema data types
- support for document collections
- support for references
- node-labelled tree constructor model with node identity
- mapping from Infoset to Query Data Model defined in Annex A
- http://www.w3.org/TR/query-datamodel/
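
To illustrate what a "node-labelled tree ... with node identity" implies, here is a minimal Python sketch. It is not taken from the working draft: the `Node` class and the `same_content` helper are hypothetical names invented for this illustration. It shows two nodes that are equal in content but remain distinct nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by content
class Node:
    """Hypothetical node in a node-labelled tree (illustration only)."""
    label: str                      # the element name labels the node
    text: Optional[str] = None      # simple character content, if any
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def same_content(a: Node, b: Node) -> bool:
    """Value equality: same label, same text, same children in the same order."""
    return (
        a.label == b.label
        and a.text == b.text
        and len(a.children) == len(b.children)
        and all(same_content(x, y) for x, y in zip(a.children, b.children))
    )


book = Node("book")
first = book.add(Node("author", "Smith"))
second = book.add(Node("author", "Smith"))

content_equal = same_content(first, second)   # equal by value
identical = first is second                   # but not the same node
```

With identity kept separate from content, a query can tell whether two results are the very same node or merely two nodes that look alike.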
- XML query algebra working draft
- defines operations on query data model
- simple principles, easy to use
- firm mathematical foundation
- many issues still open
- references
- unordered data
- algebra subset of syntax?
- http://www.w3.org/TR/query-....
- Data model and types
- intro to the query algebra
- data model is essentially an alternative syntax of xml
- example of xml schema and corresponding algebra
- schema deals with inbound schema; query worries about both inbound and
outbound schema
- query language will be very strongly typed (indicate integers, etc.)
- each answer specifies the data type
- much of this query language leans heavily on the XPath conventions
and semantics
- language includes Joins and Selections, etc.
- regrouping can eliminate duplicates
- small set of operators and types used to build the query language
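
The selection, join, and regrouping operations mentioned above can be sketched in Python. This is only an illustration under stated assumptions: it uses the limited XPath subset in the standard `xml.etree.ElementTree` module as a stand-in for the query language, and the sample documents and element names are invented. It is not the syntax of the W3C algebra.

```python
import xml.etree.ElementTree as ET

# Invented sample documents for the illustration.
BOOKS = """
<books>
  <book isbn="1"><title>XML Basics</title><author>Smith</author></book>
  <book isbn="2"><title>Query Design</title><author>Jones</author></book>
  <book isbn="3"><title>More XML</title><author>Smith</author></book>
</books>
"""

REVIEWS = """
<reviews>
  <review isbn="1"><rating>4</rating></review>
  <review isbn="3"><rating>5</rating></review>
</reviews>
"""

books = ET.fromstring(BOOKS)
reviews = ET.fromstring(REVIEWS)

# Selection: an XPath-style path with a predicate on child text.
smith_titles = [b.findtext("title") for b in books.findall("book[author='Smith']")]

# Join: pair each book with its review by matching the isbn attribute.
ratings = {r.get("isbn"): int(r.findtext("rating")) for r in reviews.findall("review")}
joined = [
    (b.findtext("title"), ratings[b.get("isbn")])
    for b in books.findall("book")
    if b.get("isbn") in ratings
]

# Regrouping: collect titles under each distinct author, so a repeated
# author value collapses into a single group.
by_author = {}
for b in books.findall("book"):
    by_author.setdefault(b.findtext("author"), []).append(b.findtext("title"))

# The regrouped answer is itself built as XML.
result = ET.Element("authors")
for name, titles in by_author.items():
    author = ET.SubElement(result, "author", name=name)
    for t in titles:
        ET.SubElement(author, "title").text = t
```

Here "Smith" appears twice in the input but only once in the regrouped result, which is the duplicate elimination the notes refer to.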
- how many people participate
- chair of the w3c working group
- participants in good standing must attend most meetings, or are removed
- 20-25 on teleconferences
- 30-35 face to face
- w3c members have a prime and secondary rep; only one present
- generally about 10 people do most of the work on a w3c working group
- signing up for w3c working group commits 20% of your time
- cotton doesn't represent microsoft on the WG
- microsoft has two other strong reps involved
- people can join the WG at any time - this is a challenge
- this presentation was given at xml 2000 yesterday
- wrapup
- Mark Needleman
- xml QL
- query language only
- formal data model and algebra
- may have multiple syntaxes, but one language
- protocol independent
- may embed some of the functionality of Z39.50 directly into the language
- Z39.50
- more than just a query language
- supports multiple types of query
- provides session management, result set management, etc.
- xml QL vs z39.50
- QL - XML based
- Z39.50 - ASN.1/BER based
- QL - protocol independent
- Z39.50 very much protocol dependent
- all functionality expressed in the query lang
- z39.50 provides supporting info retrieval beyond the query
- QL - a de facto standard?
- Z39.50 an official standard
- possible futures
- add as one more query type in Z39.50?
- abandon Z39.50 and just use functionality and model of QL
- combination
- Z39.50 in future
- retain concepts and mode of info retrieval provided by current protocol,
expressed using xml QL but run over a more modern transport like http
and/or xml protocol
- Questions
- how does xml-ql relate to Z39.50
- not yet clear how people implement the query language
- first version will specify syntax which can be used in many environments
- current model in the marketplace today is where xml is exported from
repositories; now looking for query language to go with it
- right now XPath is being used as a query language (including by microsoft);
generates query; views database as xml document
- will move the QL up front to replace XPath
- easiest route for Z is to provide access to XML resources by adding
the new Query language as a new Query Type, until more is known about
the impact of this query language on the industry
- xml query language does not require concrete xml on the other end;
could be a virtual xml view of existing data repositories
- may map xml query language onto underlying data at the server end
- xml query group includes representatives from all the database producers,
etc., who want to ensure this will work for them
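
The idea of a "virtual xml view of existing data repositories" can be sketched as follows. This is a hedged illustration, not a description of any vendor's implementation: the `records` table, its columns, and the `xml_view` function are hypothetical, and ElementTree's XPath subset again stands in for the query language.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_view(conn: sqlite3.Connection) -> ET.Element:
    """Expose rows of the hypothetical 'records' table as an XML tree built on demand."""
    root = ET.Element("records")
    rows = conn.execute("SELECT id, title, year FROM records ORDER BY id")
    for rec_id, title, year in rows:
        rec = ET.SubElement(root, "record", id=str(rec_id))
        ET.SubElement(rec, "title").text = title
        ET.SubElement(rec, "year").text = str(year)
    return root


# An in-memory relational store plays the part of the existing repository.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO records (title, year) VALUES (?, ?)",
    [("Cataloguing Rules", 1998), ("Search Protocols", 2000), ("Markup Primer", 2000)],
)

# The client queries the XML view; no XML is stored anywhere.
view = xml_view(conn)
titles_2000 = [r.findtext("title") for r in view.findall("record[year='2000']")]
```

This sketch materializes the whole view before querying it, which keeps the example short. A server that maps the query language onto the underlying data, as the notes suggest, would presumably translate the query into a native one instead.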
- what is the relationship between xml query and xml protocol?
- e.g., xml query over SOAP will be similar to Z39.50, but built on
w3c standards family
- Z implementors mapped Z39.50 onto legacy systems
- we have found the need to negotiate features to map to real implementations
- foresee mechanism to negotiate subset of query language/functionality
in xml QL
- A: this is the conformance aspect, which is addressed later, not earlier,
so query group has not worked on it yet
- venture capital companies are asking the standards group about this,
to invest in companies planning to implement xml-ql
- there will be a huge momentum behind this work
- could define a Z39.50 search service that could run over SOAP, and
negotiate it (similar to business negotiation)
- the query language is only one step in the larger picture
- how does Z39.50 model some of the emerging needs
- issue of what is built into the model vs what is extensible
- convinced that xml QL will need an extensibility mechanism that allows
definition of extended functions
- you can't define all of the functionality in the standard - this will
probably be used to handle full text searching as extension to standard
- the server could recognize or not recognize extended functions; based
on Namespace, for example
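
One way to picture namespace-based extensibility is a server that looks functions up by namespace and name, and reports any it does not recognize. The Python sketch below is an assumption-laden illustration only: the namespace URIs, the `Server` class, and the function names `equals` and `contains-word` are all invented here, and nothing in the notes says the working group will adopt this design.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical namespace URIs, invented for this sketch.
CORE_NS = "http://example.org/query/core"
FULLTEXT_NS = "http://example.org/query/ext/fulltext"

QName = Tuple[str, str]  # (namespace URI, local name)
QueryFn = Callable[[List[str], str], List[str]]


class UnsupportedFunction(Exception):
    """Raised when the server does not recognize an extended function."""


class Server:
    def __init__(self) -> None:
        self._functions: Dict[QName, QueryFn] = {}

    def register(self, ns: str, name: str, fn: QueryFn) -> None:
        self._functions[(ns, name)] = fn

    def supports(self, ns: str, name: str) -> bool:
        return (ns, name) in self._functions

    def call(self, ns: str, name: str, docs: List[str], arg: str) -> List[str]:
        try:
            fn = self._functions[(ns, name)]
        except KeyError:
            # Unknown namespace or name: reject rather than guess.
            raise UnsupportedFunction(f"{{{ns}}}{name}") from None
        return fn(docs, arg)


def equals(docs: List[str], arg: str) -> List[str]:
    return [d for d in docs if d == arg]


def contains_word(docs: List[str], arg: str) -> List[str]:
    # Toy full-text match: case-insensitive whole-word comparison.
    return [d for d in docs if arg.lower() in d.lower().split()]


docs = ["XML query algebra", "Z39.50 session management", "xml protocol"]

basic = Server()                       # core functions only
basic.register(CORE_NS, "equals", equals)

extended = Server()                    # core plus a full-text extension
extended.register(CORE_NS, "equals", equals)
extended.register(FULLTEXT_NS, "contains-word", contains_word)

hits = extended.call(FULLTEXT_NS, "contains-word", docs, "xml")
```

A client could call `supports` first to find out which extensions a given server offers, which ties in with the feature negotiation discussed earlier in the notes.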