Divergences between Lakesuperior and FCREPO4

This is a (vastly incomplete) list of discrepancies between the current FCREPO4 implementation and Lakesuperior. More will be added as more clients use it.

Not yet implemented (but in the plans)

  • Various headers handling (partial)
  • AuthN and WebAC-based authZ
  • Fixity check
  • Blank nodes (at least partly working, but untested)
  • Multiple byte ranges for the Range request header

Potentially breaking changes

The following divergences may lead to incompatibilities with some clients.

ETags

“Weak” ETags for LDP-RSs (i.e. RDF graphs) are not implemented. Given the many possible interpretations of how a checksum for an LDP resource should be calculated (see discussion), and the relatively high computational cost of determining whether to send a 304 Not Modified or a 200 OK for an LDP-RS request, this feature has been considered impractical to implement with the limited resources available at the moment.

As a consequence, LDP-RS requests will never return a 304 and will never include an ETag header. Clients should not rely on that header for non-binary resources.

That said, calculating RDF checksums is still an academically interesting topic and may be valuable for practical purposes such as metadata preservation.

Atomicity

FCREPO4 supports batch atomic operations, whereby a transaction can be opened and a number of operations (i.e. multiple R/W requests to the repository) can be performed. The operations are persisted in the repository only if and when the transaction is committed.

Lakesuperior only supports atomicity for a single HTTP request, i.e. a single HTTP request that results in multiple write operations to the storage layer is only persisted if no exception is thrown. Otherwise, the operation is rolled back in order to prevent resources from being left in an inconsistent state.
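The all-or-nothing behavior of a single request can be pictured with a minimal sketch. This is not Lakesuperior's storage code: the in-memory dict standing in for the store and the name single_request_transaction are assumptions made only for illustration.

```python
import copy
from contextlib import contextmanager


@contextmanager
def single_request_transaction(store):
    """Apply all writes of one request, or none of them.

    `store` is a plain dict used as a hypothetical stand-in for the
    storage layer.
    """
    # Writes go to a private working copy first.
    working = copy.deepcopy(store)
    yield working
    # This point is reached only if the request body raised no exception.
    # If it did raise, the working copy is simply discarded (rollback).
    store.clear()
    store.update(working)
```

A request that performs several writes and then fails leaves the store exactly as it was before the request started.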

Tombstone methods

If a client requests a tombstone resource in FCREPO4 with a method other than DELETE, the server will return 405 Method Not Allowed regardless of whether the tombstone exists or not.

Lakesuperior will return 405 only if the tombstone actually exists, 404 otherwise.
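The difference can be summarized as a small decision function. The function name and its parameters are hypothetical and belong to neither code base; the sketch only restates the rule above.

```python
def tombstone_status(server, method, tombstone_exists):
    """Status code for a non-DELETE request sent to a tombstone URI.

    `server` is either "fcrepo4" or "lakesuperior" (labels chosen for
    this sketch).
    """
    if method.upper() == "DELETE":
        raise ValueError("this sketch only covers methods other than DELETE")
    if server == "fcrepo4":
        # Always 405, whether or not the tombstone exists.
        return 405
    if server == "lakesuperior":
        # 405 only for an actual tombstone, 404 otherwise.
        return 405 if tombstone_exists else 404
    raise ValueError(f"unknown server: {server}")
```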

Limit Header

Lakesuperior does not support the Limit header, which in FCREPO can be used to limit the number of “child” resources displayed for a container graph. Since this seems to have a mostly cosmetic function in FCREPO to compensate for performance limitations (displaying a page with many thousands of children in the UI can take minutes), and since Lakesuperior already offers options in the Prefer header to not return any children, this option is not implemented.

Web UI

FCREPO4 includes a web UI for simple CRUD operations.

Such a UI is not in the immediate Lakesuperior development plans. However, a basic UI is available for read-only interaction: LDP resource browsing, SPARQL query and other search facilities, and administrative tools. Some of the latter may involve write operations, such as clean-up tasks.

Automatic path segment generation

A POST request without a slug in FCREPO4 results in a pairtree consisting of several intermediate nodes leading to the automatically minted identifier. E.g.

POST /rest

results in /rest/8c/9a/07/4e/8c9a074e-dda3-5256-ea30-eec2dd4fcf61 being created.

The same request in Lakesuperior would create /rest/8c9a074e-dda3-5256-ea30-eec2dd4fcf61 (obviously the identifiers will be different).

This difference appeared to break Hyrax at some point, but may since have been fixed. This needs to be verified further.
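The two path layouts from the example above can be reproduced with a short sketch. The segment count (4) and width (2 characters) are inferred from the single example path and are an assumption, as are the function names.

```python
def fcrepo4_style_path(root, uid, segments=4, width=2):
    """Prefix the identifier with pairtree segments cut from its start."""
    prefix = [uid[i * width:(i + 1) * width] for i in range(segments)]
    return "/".join([root.rstrip("/"), *prefix, uid])


def lakesuperior_style_path(root, uid):
    """Place the identifier directly under the root, with no pairtree."""
    return f"{root.rstrip('/')}/{uid}"
```

Both functions take an already minted identifier, so that the same value can be compared under the two layouts.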

Allow PUT requests with empty body on existing resources

FCREPO4 returns a 409 Conflict if a PUT request with no payload is sent to an existing resource.

Lakesuperior allows this operation, which results in the deletion of all the user-provided properties in that resource.

If the original resource is an LDP-NR, however, the operation will raise a 415 Unsupported Media Type because the resource will be treated as an empty LDP-RS, which cannot replace an existing LDP-NR.
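The outcomes described in this section can be laid out as a lookup. The success status code for Lakesuperior is not stated here, so the sketch returns a description rather than a number in that case; all names are hypothetical.

```python
def empty_put_outcome(server, existing_type):
    """Outcome of a PUT with no payload sent to an existing resource.

    `existing_type` is "LDP-RS" or "LDP-NR"; `server` is "fcrepo4" or
    "lakesuperior" (labels chosen for this sketch).
    """
    if existing_type not in ("LDP-RS", "LDP-NR"):
        raise ValueError(f"unknown resource type: {existing_type}")
    if server == "fcrepo4":
        return "409 Conflict"
    if server == "lakesuperior":
        if existing_type == "LDP-NR":
            # The empty payload is treated as an empty LDP-RS, which
            # cannot replace an existing LDP-NR.
            return "415 Unsupported Media Type"
        return "accepted: user-provided properties removed"
    raise ValueError(f"unknown server: {server}")
```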

Non-standard client breaking changes

The following changes may be incompatible with clients relying on some FCREPO4 behavior not endorsed by LDP or other specifications.

Pairtrees

FCREPO4 generates “pairtree” resources if a resource is created in a path whose segments are missing. E.g. when creating /a/b/c/d, if /a/b and /a/b/c do not exist, FCREPO4 will create two Pairtree resources. POSTing and PUTting into Pairtrees is not allowed. Also, a containment triple is established between the closest LDPC and the created resource, e.g. if /a exists, a </a> ldp:contains </a/b/c/d> triple is created.

Lakesuperior does not employ Pairtrees. In the example above Lakesuperior would create a fully qualified LDPC for each missing segment, which can be POSTed and PUT to. Containment triples are created between each link in the path, i.e. </a> ldp:contains </a/b>, </a/b> ldp:contains </a/b/c> etc. This may potentially break clients relying on the direct containment model.

The rationale behind this change is that Pairtrees are the byproduct of a limitation imposed by Modeshape and introduce complexity in the software stack and confusion for the client. Lakesuperior aligns with the more intuitive UNIX filesystem model, where each segment of a path is a “folder” or container (except for the leaf nodes that can be either folders or files). In any case, clients are discouraged from generating deep paths in Lakesuperior without a specific purpose because these resources create unnecessary data.
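The two containment models can be contrasted by computing the ldp:contains triples each one would produce for the /a/b/c/d example. This is a sketch under simplifying assumptions: paths are plain strings, `existing` is a set of paths already present, and the function names are hypothetical.

```python
def path_prefixes(path):
    """Return every ancestor path plus the path itself, shortest first."""
    parts = [p for p in path.strip("/").split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def fcrepo4_containment(path, existing):
    """One triple from the closest existing ancestor to the new resource."""
    prefixes = path_prefixes(path)
    target = prefixes[-1]
    for ancestor in reversed(prefixes[:-1]):
        if ancestor in existing:
            return [(ancestor, "ldp:contains", target)]
    # The case with no existing ancestor is not covered by the text above.
    return []


def lakesuperior_containment(path, existing):
    """One triple per link in the path, for every newly created segment."""
    prefixes = path_prefixes(path)
    return [
        (parent, "ldp:contains", child)
        for parent, child in zip(prefixes, prefixes[1:])
        if child not in existing
    ]
```

With only /a present, the first function yields a single triple pointing straight at /a/b/c/d, while the second yields one triple for each of the three links.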

Non-mandatory, non-authoritative slug in version POST

FCREPO4 requires a Slug header to POST to fcr:versions to create a new version.

Lakesuperior adheres to the more general FCREPO POST rule: if no slug is provided, an automatic ID is generated instead. The ID is a UUID4.

Note that internally this ID is not called “label” but “uid” since it is treated as a fully qualified identifier. The fcrepo:hasVersionLabel predicate, however ambiguous in this context, will be kept until the adoption of Memento, which will change the retrieval mechanisms.

Another notable difference is that if a POST is issued to the fcr:versions location of the same resource using a version ID that already exists, Lakesuperior will just mint a random identifier rather than returning an error.
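The version ID rules above can be sketched as follows. The function name is hypothetical, and using a UUID4 for the collision case is an assumption: the text only says that a random identifier is minted.

```python
import uuid


def mint_version_uid(slug, existing_uids):
    """Pick the uid for a new version snapshot.

    The slug is used when it is provided and not already taken;
    otherwise a UUID4 string is generated.
    """
    if slug and slug not in existing_uids:
        return slug
    return str(uuid.uuid4())
```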

Deprecation track

Lakesuperior offers some “legacy” options to replicate the FCREPO4 behavior, but encourages new development to use a different approach for some types of interaction.

Endpoints

The FCREPO root endpoint is /rest. The Lakesuperior root endpoint is /ldp.

This should not pose a problem if a client does not have rest hard-coded in its code, but in any event, the /rest endpoint is provided for backwards compatibility.

Future implementations of the Fedora API specs may employ a “versioned” endpoint scheme that allows multiple Fedora API versions to be available to the client, e.g. /ldp/fc4 for the current LDP API version, /ldp/fc5 for Fedora version 5.x, etc.

Automatic LDP class assignment

Since Lakesuperior rejects client-provided server-managed triples, and since the LDP types are among them, the LDP container type is inferred from the provided properties: if the ldp:hasMemberRelation and ldp:membershipResource properties are provided, the resource is a Direct Container. If in addition to these the ldp:insertedContentRelation property is present, the resource is an Indirect Container. If either of the first two is missing, the resource is a Container.

Clients are encouraged to omit LDP types in PUT, POST and PATCH requests.
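The inference rule above amounts to a few set checks. The sketch below works on prefixed predicate names as plain strings, which is a simplification; the function name and the returned labels are chosen for illustration.

```python
def infer_container_type(predicates):
    """Infer the LDP container type from the predicates in a payload."""
    present = set(predicates)
    membership = {"ldp:hasMemberRelation", "ldp:membershipResource"}
    if not membership <= present:
        # Either of the two membership properties is missing.
        return "Container"
    if "ldp:insertedContentRelation" in present:
        return "Indirect Container"
    return "Direct Container"
```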

Lenient handling

FCREPO4 requires server-managed triples to be expressly indicated in a PUT request, unless the Prefer header is set to handling=lenient; received="minimal", in which case the RDF payload must not have any server-managed triples.

Lakesuperior works under the assumption that clients should never provide server-managed triples. It automatically handles PUT requests sent to existing resources by returning a 412 if any server-managed triples are included in the payload. This is the same as setting Prefer to handling=strict, which is the default.

If Prefer is set to handling=lenient, all server-managed triples sent with the payload are ignored.

Clients using the Prefer header to control PUT behavior as advertised by the specs should not notice any difference.
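The strict and lenient modes can be sketched as a payload filter. The set of server-managed predicates below is a hypothetical, incomplete placeholder, and triples are modeled as plain tuples; none of these names come from the Lakesuperior code base.

```python
# Hypothetical placeholder set, for illustration only.
SERVER_MANAGED_PREDICATES = frozenset({"fcrepo:created", "fcrepo:lastModified"})


class PreconditionFailed(Exception):
    """Stands in for a 412 response."""
    status = 412


def filter_put_payload(triples, handling="strict"):
    """Return the triples to store, or raise for a rejected payload."""
    if handling not in ("strict", "lenient"):
        raise ValueError(f"unknown handling value: {handling}")
    managed = [t for t in triples if t[1] in SERVER_MANAGED_PREDICATES]
    if managed and handling == "strict":
        # Default behavior: reject the whole payload.
        raise PreconditionFailed(f"{len(managed)} server-managed triple(s)")
    # Lenient: silently drop the server-managed triples.
    return [t for t in triples if t[1] not in SERVER_MANAGED_PREDICATES]
```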

Optional improvements

The following are improvements in performance or usability that can only be taken advantage of if client code is adjusted.

LDP-NR content and metadata

FCREPO4 relies on the /fcr:metadata identifier to retrieve RDF metadata about an LDP-NR. Lakesuperior supports this as a legacy option, but encourages the use of content negotiation to do the same while offering explicit endpoints for RDF and non-RDF content retrieval.

Any request to an LDP-NR with an Accept header set to one of the supported RDF serialization formats will yield the RDF metadata of the resource instead of the binary contents.

The fcr:metadata URI returns the RDF metadata of an LDP-NR.

The fcr:content URI returns the non-RDF content.

The two options above return an HTTP error if requested for an LDP-RS.
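The retrieval rules for an LDP-NR can be sketched as a function that decides which representation is returned. The list of RDF media types is an illustrative assumption, since the supported serializations are not listed here, and q-values in the Accept header are ignored for simplicity.

```python
# Illustrative subset; the formats actually supported are not listed
# in this document.
RDF_MEDIA_TYPES = frozenset({
    "text/turtle",
    "application/ld+json",
    "application/n-triples",
    "application/rdf+xml",
})


def ldp_nr_representation(path, accept=None):
    """Return "rdf" or "binary" for a request that targets an LDP-NR."""
    if path.endswith("/fcr:metadata"):
        return "rdf"
    if path.endswith("/fcr:content"):
        return "binary"
    # Strip parameters such as ";q=0.8" and compare bare media types.
    requested = [
        item.split(";")[0].strip().lower()
        for item in (accept or "").split(",")
    ]
    if any(media_type in RDF_MEDIA_TYPES for media_type in requested):
        return "rdf"
    return "binary"
```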

“Include” and “Omit” options for children

Lakesuperior offers an additional Prefer header option to exclude all references to child resources (i.e. by removing all the ldp:contains triples) while leaving the other server-managed triples when retrieving a resource:

Prefer: return=representation; [include | omit]="http://fedora.info/definitions/v4/repository#Children"

The default behavior is to include all children URIs.
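A client could assemble this header as in the sketch below, which builds a request with the Python standard library without sending it. The function names and any URL passed in are hypothetical.

```python
import urllib.request

CHILDREN_URI = "http://fedora.info/definitions/v4/repository#Children"


def children_prefer_header(include_children):
    """Build the Prefer header value that includes or omits children."""
    keyword = "include" if include_children else "omit"
    return f'return=representation; {keyword}="{CHILDREN_URI}"'


def build_get_request(url, include_children=True):
    """Build (but do not send) a GET request carrying the Prefer header."""
    return urllib.request.Request(
        url,
        headers={"Prefer": children_prefer_header(include_children)},
        method="GET",
    )
```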

Soft-delete and purge

NOTE: The implementation of this section is incomplete and debated.

In FCREPO4, a deleted resource leaves a tombstone, and all traces of the previous resource are deleted.

In Lakesuperior, a normal DELETE creates a new version snapshot of the resource and puts a tombstone in its place. The resource versions are still available in the fcr:versions location. The resource can be “resurrected” by issuing a POST to its tombstone. This will result in a 201.

If a tombstone is deleted, the resource and its versions are completely deleted (purged).

Moreover, setting the Prefer:no-tombstone header option on DELETE deletes a resource and its versions directly, without leaving a tombstone.
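The life cycle described in this section can be modeled as a small in-memory state machine. It is only a sketch of the behavior as described, which is itself noted as incomplete and debated: the class and method names are hypothetical, and restoring the latest snapshot on resurrection is an assumption.

```python
class SoftDeleteModel:
    """Toy model of soft-delete, resurrection and purge."""

    def __init__(self):
        self.live = {}           # uid -> current content
        self.tombstones = set()  # uids replaced by a tombstone
        self.versions = {}       # uid -> list of version snapshots

    def create(self, uid, content):
        self.live[uid] = content
        self.versions.setdefault(uid, [])

    def delete(self, uid, no_tombstone=False):
        if uid in self.tombstones:
            # Deleting a tombstone purges the resource and its versions.
            self._purge(uid)
        elif uid in self.live:
            if no_tombstone:
                self._purge(uid)
            else:
                # Snapshot the resource, then leave a tombstone.
                self.versions[uid].append(self.live.pop(uid))
                self.tombstones.add(uid)
        else:
            raise KeyError(uid)

    def resurrect(self, uid):
        """Model a POST to the tombstone; returns the status code."""
        if uid not in self.tombstones:
            raise KeyError(uid)
        self.tombstones.discard(uid)
        # Assumption: the latest snapshot becomes the live resource.
        self.live[uid] = self.versions[uid][-1]
        return 201

    def _purge(self, uid):
        self.live.pop(uid, None)
        self.tombstones.discard(uid)
        self.versions.pop(uid, None)
```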