requirement:
head:Exceptions to the Inherited Requirements
Unlike its parent profile, the "ECHO Dep Generic METS Profile for
Preservation and Digital Repository Interoperability", this profile supports two different file packaging mechanisms. 1) All the files
which comprise the package must exist in the same location as the METS document
itself or in subdirectories under the location where the METS document resides (the same as the parent profile).
2) All the files which comprise the package must be contained in one or more ARC
(http://www.archive.org/web/researcher/ArcFileFormat.php) files which are in the
same directory as the METS document or in subdirectories below it. In both
cases these files are referenced via the FLocat element.
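The location rule shared by both packaging mechanisms can be checked mechanically. The following is a minimal sketch, not part of the profile: the function name and the example URLs are hypothetical, and it assumes the METS document itself is addressed by an absolute URL.

```python
from urllib.parse import urljoin, urlparse


def resolves_under_mets_dir(mets_url, href):
    """Return True if href, resolved against the METS document URL,
    stays in the METS document's directory or a subdirectory of it."""
    base = urlparse(mets_url)
    # Directory part of the METS document path, always ending in "/".
    base_dir = base.path.rsplit("/", 1)[0] + "/"
    target = urlparse(urljoin(mets_url, href))
    same_origin = (target.scheme, target.netloc) == (base.scheme, base.netloc)
    return same_origin and target.path.startswith(base_dir)


# Hypothetical package location, used only for illustration.
mets_url = "http://example.org/pkg/mets.xml"
print(resolves_under_mets_dir(mets_url, "arcs/IA-001102.arc"))
print(resolves_under_mets_dir(mets_url, "../other/IA-001102.arc"))
```

A reference that climbs out of the package directory with `..` is rejected, as is an absolute URL on another host.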
Files contained in an ARC file (see below) are not required to be extracted from the ARC file before they are referenced in documents
that conform to this profile.
For the MIMETYPE attribute, if the file is an ARC file the value of "application/octet-stream" must
be used. If an official MIME type is ever registered for ARC files we will use
that value instead.
The OWNERID attribute may be used to identify components of an ARC file (see below).
requirement:
head:Specific Requirements When the Files Are Contained in an ARC File
Multiple ARC files may be referenced.
The file elements representing the ARC files must contain an FLocat element
which points to the ARC file. The FLocat must have a LOCTYPE attribute with a
value of 'URL'. It must also have an xlink:href attribute which contains the URL
which points to the file. The URL should be a relative URI, resolved against
the location of the METS document itself.
Each file element representing an ARC file must have a USE attribute with a value
of 'ARC'.
Each file element representing an ARC file must still conform to the previous
requirements for all file elements, including attributes such as SIZE, MIMETYPE,
etc.
Technical metadata pertaining to the web crawl or other process that generated
the ARC, for example, the logs associated with a run of Heritrix, should be
associated with the ARC file via an ADMID attribute attached to the file element
which represents the whole ARC file.
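The requirements above for the file element that represents a whole ARC file can be sketched as follows. This is an illustration under stated assumptions rather than a normative example: the helper name, the ID values, the file name, and the size are all hypothetical, and any further attributes required by the parent profile are not shown.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def make_arc_file_element(file_id, href, size, admid=None):
    """Build a mets:file element describing a whole ARC file."""
    attrs = {
        "ID": file_id,
        "USE": "ARC",                            # required value for ARC files
        "MIMETYPE": "application/octet-stream",  # no registered ARC MIME type
        "SIZE": str(size),
    }
    if admid:
        # Points at technical metadata about the crawl that produced the ARC.
        attrs["ADMID"] = admid
    file_el = ET.Element(f"{{{METS_NS}}}file", attrs)
    # The FLocat carries a URL relative to the METS document location.
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
    )
    return file_el


# Hypothetical values, for illustration only.
arc = make_arc_file_element("FILE001", "arcs/IA-001102.arc", 1048576, admid="TECH001")
print(ET.tostring(arc, encoding="unicode"))
```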
Nested under each of the file elements which represents an ARC should be
subordinate file elements which represent the individual documents contained in
the ARC file which are relevant to the package. For submission information
packages (SIPs) where every document contained in the ARC file is relevant to
the intellectual entity represented by the METS package, the subordinate file
list is optional but recommended. The preservation system to which the package
is being submitted should decompose the ARC file and add file elements for all
of the documents contained in the ARC.
Not every document contained in the ARC file must be listed, but the documents
which are considered to be significant to the representation of the intellectual
entity must be listed. One reason that some documents may be omitted is that the
ARC file may contain insignificant documents which are not considered part of
the representation of the entity, such as HTTP 404 Not Found responses; those
documents should not be listed here unless they are intended to be part of the
intellectual entity represented by the package.
Another case where documents may be omitted is if the ARC file or files contain
documents which cross multiple intellectual entities. In this case only the
documents which are part of the intellectual entity represented by the current
METS file should be listed. The same ARC file may be referenced in different
METS packages with different documents. In general the relationship between the
documents contained in an ARC file and the files which comprise an intellectual
entity is many-to-many: the files comprising a single intellectual entity may be
spread across multiple ARC files, and a single ARC file may contain documents
relevant to multiple intellectual entities. Although this type of situation is
not recommended, it can be accommodated by this profile. However, in complex
cases like this, the relevant subordinate files must always be listed as nested
file elements beneath the ARC file element. In general, if no subordinate files
are listed below the file element representing the ARC file, the assumption is
that all documents contained in the ARC file are required for the representation of the
intellectual entity. Likewise, if subordinate files are listed beneath the ARC
file element, the assumption is that only those files listed are relevant to the
representation of the intellectual entity, even if there are additional
documents contained in the ARC.
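The default assumption described above can be expressed as a small function. This is only a sketch: the function name and its two arguments are hypothetical, and it assumes documents are matched by comparing OWNERID values with the URL-record lines found in the ARC file.

```python
def relevant_documents(listed_ownerids, arc_url_records):
    """Return the URL-records treated as part of the intellectual entity.

    listed_ownerids: OWNERID values of the file elements nested under
                     the ARC file element (may be empty).
    arc_url_records: every URL-record line found in the ARC file.
    """
    if not listed_ownerids:
        # No subordinate files listed: all documents are assumed relevant.
        return list(arc_url_records)
    listed = set(listed_ownerids)
    # Subordinate files listed: only the listed documents are assumed relevant.
    return [rec for rec in arc_url_records if rec in listed]


# Placeholder record names stand in for full URL-record lines.
records = ["record-a", "record-b", "record-c"]
print(relevant_documents([], records))
print(relevant_documents(["record-b"], records))
```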
The nested file elements are required by this profile, especially when the
METS file is an AIP, because they carry the link to any descriptive,
technical, or provenance metadata tied to specific documents contained
in the ARC file: each nested file element points to the relevant dmdSec,
techMD, or digiprovMD elements via its DMDID and ADMID
attributes. For this reason it is recommended that a preservation system to
which ARC files are being submitted decompose the ARC file into its constituent
files and manage these files as individual objects.
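The linkage from nested file elements to metadata sections can be followed programmatically. The sketch below uses a deliberately stripped-down METS fragment (it is not schema-valid and all ID values are hypothetical) and a hypothetical helper that resolves each nested file element's DMDID and ADMID references.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Stripped-down illustration; a real METS document carries far more content.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="DMD1"/>
  <mets:amdSec>
    <mets:techMD ID="TECH1"/>
    <mets:digiprovMD ID="PROV1"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="ARC1" USE="ARC">
        <mets:file ID="DOC1" USE="ARC-URL-RECORD" DMDID="DMD1" ADMID="TECH1 PROV1"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def linked_metadata(root):
    """Map each nested file ID to the names of the metadata elements it references."""
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    links = {}
    for arc in root.findall(".//mets:file[@USE='ARC']", NS):
        for doc in arc.findall("mets:file", NS):
            # DMDID and ADMID may each hold several space-separated IDs.
            ids = (doc.get("DMDID", "") + " " + doc.get("ADMID", "")).split()
            links[doc.get("ID")] = [by_id[i].tag.split("}")[1] for i in ids if i in by_id]
    return links


links = linked_metadata(ET.fromstring(SAMPLE))
print(links)
```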
The nested file elements must have a USE attribute with a value of
'ARC-URL-RECORD'. They must also have an OWNERID attribute which contains the
exact URL-record line corresponding to the document as it occurs in the ARC
file, such as:
http://www.dryswamp.edu:80/index.html 127.10.100.2 19961104142103 text/html 202
http://www.dryswamp.edu:80/index.html 127.10.100.2 19961104142103 text/html 200 fac069150613fe55599cc7fa88aa089d - 209 IA-001102.arc 202
The OWNERID values should be sufficient to identify and extract any document
from the ARC.
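An OWNERID value of this kind can be split into its space-separated fields. The sketch below is an assumption-laden illustration: the field names follow one reading of the ARC file format description (a five-field and a ten-field URL-record layout) and should be verified against that document; the function name is hypothetical.

```python
# Field names are assumptions based on the ARC file format description.
V1_FIELDS = ["url", "ip_address", "archive_date", "content_type", "archive_length"]
V2_FIELDS = ["url", "ip_address", "archive_date", "content_type", "result_code",
             "checksum", "location", "offset", "filename", "archive_length"]


def parse_url_record(ownerid):
    """Split a URL-record line into named fields based on its field count."""
    parts = ownerid.split()
    for fields in (V1_FIELDS, V2_FIELDS):
        if len(parts) == len(fields):
            return dict(zip(fields, parts))
    raise ValueError(f"unexpected number of fields: {len(parts)}")


short = parse_url_record(
    "http://www.dryswamp.edu:80/index.html 127.10.100.2 19961104142103 text/html 202"
)
full = parse_url_record(
    "http://www.dryswamp.edu:80/index.html 127.10.100.2 19961104142103 text/html "
    "200 fac069150613fe55599cc7fa88aa089d - 209 IA-001102.arc 202"
)
print(short["url"], short["archive_length"])
print(full["filename"], full["offset"])
```

All values are kept as strings so that the OWNERID can be compared byte for byte with the line in the ARC file.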
Nested file elements which represent documents contained in an ARC file must also
still conform to the previous requirements for all file elements, including
attributes such as SIZE, MIMETYPE, etc.
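The requirements on nested file elements lend themselves to a simple check. The following is a minimal sketch, not a full profile validator: the helper name, the sample fragment, and its ID and SIZE values are hypothetical, and only the attributes named in this section are checked.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Hypothetical fragment: DOC1 is well formed, DOC2 breaks several rules.
SAMPLE = """<mets:file xmlns:mets="http://www.loc.gov/METS/" ID="ARC1" USE="ARC">
  <mets:file ID="DOC1" USE="ARC-URL-RECORD" SIZE="202" MIMETYPE="text/html"
             OWNERID="http://www.dryswamp.edu:80/index.html 127.10.100.2 19961104142103 text/html 202"/>
  <mets:file ID="DOC2" USE="ARC" SIZE="10"/>
</mets:file>"""


def check_nested_files(arc_file_el):
    """Return a list of human-readable problems found in nested file elements."""
    problems = []
    for doc in arc_file_el.findall(f"{METS}file"):
        doc_id = doc.get("ID", "?")
        if doc.get("USE") != "ARC-URL-RECORD":
            problems.append(f"{doc_id}: USE must be 'ARC-URL-RECORD'")
        if not doc.get("OWNERID"):
            problems.append(f"{doc_id}: OWNERID with the URL-record line is missing")
        for attr in ("SIZE", "MIMETYPE"):
            if doc.get(attr) is None:
                problems.append(f"{doc_id}: {attr} is missing")
    return problems


problems = check_nested_files(ET.fromstring(SAMPLE))
for problem in problems:
    print(problem)
```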