Proposal to RDF Data Shapes WG
SHACL provides structural constraints for RDF graphs. SHACL constraints are grouped into "shapes", which may also be referenced by constraints in other shapes. These constraints describe the triples connecting certain nodes in the graph. SHACL can constrain the number of triples with a particular predicate and the permitted object datatype or object terms, and can require that the subject or object match some shape or satisfy lexical and datatype conditions.
This document uses the following labels for terms in the RDF abstract syntax:

- Iri: RDF IRI
- Blank: RDF blank node
- Lit: RDF literal

The abstract syntax of shape expression schemas (or simply schemas) is given below. For now we use a simplified definition of one non-terminal, namely ShapeConstr, which is indicated by the temporary rule ShapeConstrTemp. We make this simplification in order to make defining the semantics easier; the complete definition is given in Section Complex shape constraints.
A Schema is composed of at least one rule (Rule). Every rule associates a label (ShapeLabel) with a shape definition (ShapeDefinition) and, possibly, a number of additional conditions (ExtensionCondition) defined using an extension mechanism.
A shape definition is either a closed shape (ClosedShape) or an open shape (OpenShape). Both closed and open shapes are defined by a shape expression (ShapeExpr). Open shapes can have an associated set of included properties (InclPropSet): properties for which arbitrary extra occurrences are permitted. Closed and open shapes are distinguished by the keywords close and open, respectively.
A shape expression is either the empty shape (EmptyShape), represented by the keyword emptyshape, or a triple constraint (TripleConstraint) or an inverse triple constraint (InverseTripleConstraint) followed by a cardinality constraint (Cardinality), or a negated version of either of the latter two.
Triple constraints are used to specify constraints to be satisfied by the triples having the focus node as subject, and the associated cardinality specifies how many triples satisfying the triple constraint are required.
Inverse triple constraints play a similar role, but define constraints to be satisfied by the triples having the focus node as object.
We will write a::C for the triple constraint with IRI a, and with value or shape constraint C.
Cardinalities are written as an interval in square brackets, whose maximum bound can be the special value unbound.
In the examples, we omit the cardinality when the minimum and the maximum cardinality are both equal to one.
That is, we write simply a::C for a::C[1;1].
Negated triple and inverse triple constraints are preceded by an exclamation mark (!).
A triple constraint can constrain the object of a triple in two different ways.
It either requires the object to have some particular value (value constraint), or requires the object node to satisfy a shape constraint.
Inverse triple constraints are preceded by a caret (^), as in ^a::C.
A value constraint can be specified in three different ways: as a set of concrete values, which can be IRIs or literals; as a literal datatype, possibly with an XSD facet restriction attached to it; or as a node kind, among IRI, Blank, Literal, or non-literal.
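The three kinds of value constraint can be illustrated with a small Python sketch that checks one RDF term against each of them. The term encoding (a `Term` class with `kind`, `value` and `datatype` fields), the class names `ValueSet`, `Datatype` and `NodeKind`, and the restriction of XSD facets to a numeric `min_inclusive`/`max_inclusive` pair are hypothetical simplifications, not part of this proposal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Term:
    kind: str                        # "iri", "bnode" or "literal" (hypothetical encoding)
    value: str                       # IRI string, blank node label, or lexical form
    datatype: Optional[str] = None   # datatype IRI, for literals only

@dataclass(frozen=True)
class ValueSet:                      # a set of concrete IRIs or literals
    values: FrozenSet[Term]

@dataclass(frozen=True)
class Datatype:                      # a literal datatype with optional numeric range facets
    iri: str
    min_inclusive: Optional[float] = None
    max_inclusive: Optional[float] = None

@dataclass(frozen=True)
class NodeKind:                      # "iri", "bnode", "literal" or "nonliteral"
    kind: str

ValueConstraint = Union[ValueSet, Datatype, NodeKind]

def satisfies_value(term: Term, vc: ValueConstraint) -> bool:
    if isinstance(vc, ValueSet):
        return term in vc.values
    if isinstance(vc, Datatype):
        if term.kind != "literal" or term.datatype != vc.iri:
            return False
        if vc.min_inclusive is None and vc.max_inclusive is None:
            return True
        try:
            number = float(term.value)
        except ValueError:           # a range facet on a non-numeric lexical form fails
            return False
        above = vc.min_inclusive is None or number >= vc.min_inclusive
        below = vc.max_inclusive is None or number <= vc.max_inclusive
        return above and below
    if vc.kind == "nonliteral":
        return term.kind in ("iri", "bnode")
    return term.kind == vc.kind
```

For example, `satisfies_value(Term("literal", "42", XSD + "integer"), Datatype(XSD + "integer", 0, 150))` holds, while the same literal fails `NodeKind("iri")`.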
A shape constraint requires the object node to satisfy one or more shapes. A disjunctive shape constraint requires the object node to satisfy at least one of the enumerated shapes. A conjunctive shape constraint requires the object node to satisfy all of the enumerated shapes. Additionally, a shape constraint can be negated by preceding it with an exclamation mark. This negates the requirement, where negation is the usual logical negation; for instance, the negation of a disjunctive shape constraint requires the object node to satisfy none of the enumerated shapes.
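Assuming the set of shape labels that the object node satisfies is already known (for instance from a typing, as defined later), the disjunctive, conjunctive and negated forms reduce to simple set tests. The following Python sketch shows this; `ShapeConstraint` and `satisfies_shape` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class ShapeConstraint:
    labels: FrozenSet[str]   # the enumerated shape labels
    conjunctive: bool        # True: all labels are required; False: at least one
    negated: bool = False    # a leading '!' in the surface syntax

def satisfies_shape(node_shapes: FrozenSet[str], sc: ShapeConstraint) -> bool:
    """node_shapes: the shape labels the object node is assumed to satisfy."""
    if sc.conjunctive:
        result = sc.labels <= node_shapes       # every enumerated shape
    else:
        result = bool(sc.labels & node_shapes)  # at least one enumerated shape
    return not result if sc.negated else result
```

With this reading, a negated disjunctive constraint holds exactly when the node satisfies none of the enumerated shapes, as stated above.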
Complex shape expressions can be built using four operators: some-of, one-of, grouping, and repetition. A some-of shape (SomeOfShape) requires at least one of the sub-expressions to be satisfied, but does not forbid more of them from being satisfied. A one-of shape (OneOfShape) requires exactly one of the sub-expressions to be satisfied. A group shape (GroupShape) requires the neighbourhood of the focus node to be split into as many sets of triples as there are sub-expressions, such that every set satisfies the constraint given by the corresponding sub-expression. A repetition shape (RepetitionShape) requires the sub-expression to be repeated a number of times within the bounds of its cardinality constraint.
Finally, an extension mechanism allows additional constraints to be attached that must be satisfied by the nodes of a given shape.
Each such condition can be written in some extension language (ExtLangName), and the actual constraint is a Boolean function definition in the corresponding language, with true asserting that the focus node meets the constraints in ExtDefinition and false asserting that it does not.
We require the obvious criterion that every shape label that appears in the schema appears on the left-hand side of exactly one rule (that is, every shape label is defined, and defined only once).
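The criterion can be checked mechanically. In the Python sketch below, a schema is simplified to a list of (defined label, labels referenced in its definition) pairs; this encoding and the name `check_labels` are assumptions made for illustration. The function reports labels that are referenced but never defined, and labels defined by more than one rule.

```python
from collections import Counter
from typing import List, Set, Tuple

def check_labels(rules: List[Tuple[str, Set[str]]]) -> Tuple[Set[str], Set[str]]:
    """rules: (label on the left-hand side, labels referenced on the right-hand side).
    Returns (undefined labels, labels defined more than once)."""
    defined = Counter(label for label, _ in rules)
    referenced = set().union(*(refs for _, refs in rules))
    undefined = referenced - set(defined)
    duplicated = {label for label, count in defined.items() if count > 1}
    return undefined, duplicated
```

A schema meets the criterion exactly when both returned sets are empty.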
The SHACL abstract syntax above can be represented in an RDF graph. Such RDF graphs are subject to the constraints of the abstract syntax above, e.g. that a triple constraint may have at most one term constraint.
SHACL triple constraints can be parsed with a SPARQL query:

```sparql
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?entry ?isShape ?choice ?group
       (IF(Bound(?property),
           # compile TripleConstraint
           CONCAT("ShapeExpr(TripleConstraint(",
                  "IRI(", STR(?predicate), "), ",
                  IF(Bound(?valueType),
                     CONCAT("ValueType(IRI(", STR(?valueType), "))"),          # valueType
                  IF(Bound(?nodeKind),
                     CONCAT("NodeKind(IRI(", STR(?nodeKind), "))"),            # nodeKind
                  IF(Bound(?shapeLabel),
                     CONCAT("ShapeLabel(IRI(", STR(?shapeLabel), "))"),        # valueShape
                     CONCAT("ValueSet(",                                       # allowedValue
                            GROUP_CONCAT(CONCAT(
                              IF(IsLiteral(?allowedValue), "Literal", "IRI"),  # IRIs and Literals
                              "(", STR(?allowedValue), ")")),
                            ")")))),
                  "))[",
                  IF(Bound(?min1), STR(?min1), "1"), ",",
                  IF(Bound(?max1), STR(?max1), "INF"), "]"),                   # cardinality
           "")
        AS ?TripleConstraint)
WHERE {
  {
    ?entry sh:property ?property .
    ?property sh:predicate ?predicate .
    OPTIONAL { ?property sh:minCount ?min1 }
    OPTIONAL { ?property sh:maxCount ?max1 }
    OPTIONAL { ?property sh:valueType ?valueType }
    OPTIONAL { ?property sh:nodeKind ?nodeKind }
    OPTIONAL { ?property sh:valueShape ?shapeLabel }
    OPTIONAL { ?property sh:allowedValue ?allowedValue
               FILTER (IsIRI(?allowedValue) || IsLiteral(?allowedValue)) }
  }
  UNION { ?entry sh:choice ?choice }
  UNION { ?entry sh:propertyGroup ?group }
  OPTIONAL { ?entry a sh:Shape BIND(true AS ?isShape) }
}
GROUP BY ?entry ?isShape ?property ?predicate ?choice ?group
         ?min1 ?max1 ?valueType ?nodeKind ?shapeLabel
```
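This produces a hierarchy table with five columns: entry, isShape, choice, group, and TripleConstraint. The abstract syntax is built in two steps:

- Map each entry to the list of tuples of isShape, choice, group, and TripleConstraint.
- For each entry whose isShape is true, compose a Rule(entry, GroupShape()) and fill its GroupShape by invoking embed(entry, collection) with that GroupShape as collection.

The embed(entry, collection) function takes an entry and a GroupShape or DisjunctiveShape, collection, and, for each tuple of entry:

- adds the TripleConstraint, if any, to collection;
- for a choice, adds a new DisjunctiveShape to collection and invokes embed with choice and the DisjunctiveShape;
- for a group, adds a new GroupShape to collection and invokes embed with group and the GroupShape.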
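The two-step construction described above can be sketched in Python as follows. Each row mirrors the five columns produced by the query (entry, isShape, choice, group, TripleConstraint). Treating each compiled TripleConstraint as an opaque string, modelling GroupShape, DisjunctiveShape and Rule as plain containers, and embedding a group into a new GroupShape are assumptions of this sketch; `build` is a hypothetical name.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

Row = Tuple[str, bool, Optional[str], Optional[str], str]  # entry, isShape, choice, group, TripleConstraint

@dataclass
class GroupShape:
    members: list = field(default_factory=list)

@dataclass
class DisjunctiveShape:
    members: list = field(default_factory=list)

@dataclass
class Rule:
    label: str
    shape: GroupShape

def build(rows: List[Row]) -> List[Rule]:
    # Step 1: map each entry to its (isShape, choice, group, TripleConstraint) tuples.
    table: Dict[str, List[Tuple[bool, Optional[str], Optional[str], str]]] = defaultdict(list)
    for entry, is_shape, choice, group, tc in rows:
        table[entry].append((is_shape, choice, group, tc))

    def embed(entry: str, collection: Union[GroupShape, DisjunctiveShape]) -> None:
        for _, choice, group, tc in table.get(entry, []):
            if tc:                                  # an empty string means "no triple constraint"
                collection.members.append(tc)
            if choice is not None:
                alternative = DisjunctiveShape()
                collection.members.append(alternative)
                embed(choice, alternative)
            if group is not None:
                sub_group = GroupShape()
                collection.members.append(sub_group)
                embed(group, sub_group)

    # Step 2: every entry typed as sh:Shape becomes a rule whose GroupShape is filled by embed.
    rules = []
    for entry, tuples in table.items():
        if any(is_shape for is_shape, _, _, _ in tuples):
            rule = Rule(entry, GroupShape())
            embed(entry, rule.shape)
            rules.append(rule)
    return rules
```

The recursion assumes that choices and groups do not refer back to their own ancestors; cyclic input would not terminate.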
RDF node types are identified by the following IRIs:
RDF node type | SHACL identifier |
---|---|
IRI | sh:IRI |
Literal | sh:Literal |
Blank Node | sh:BNode |
The following example represents a shape my:UserShape composed of a group shape with two conjuncts:
```turtle
# shapes (Turtle)
my:UserShape a sh:Shape ;
  sh:choice [
    sh:property [
      sh:predicate foaf:name ;
      sh:valueType xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1
    ] ;
    sh:property [
      sh:predicate foaf:givenName ;
      sh:valueType xsd:string ;
      sh:minCount 1
    ] ;
  ] ;
  sh:property [
    sh:predicate foaf:mbox ;
    sh:nodeKind sh:IRI ;
    sh:minCount 1
  ] .
```
SHACL defines two predicates, sh:nodeShape and sh:classShape. The former asserts that a particular node in some graph conforms to a specific shape. The latter asserts that every node of some type conforms to a specific shape. It is expected that different communities will develop many more associations, much as the WSDL community created an association between input and output documents and an XML schema which described them.
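One way to read these two association mechanisms is as a source of the mapping requiredshapes used in the semantics: sh:nodeShape contributes its shape to a single node, and sh:classShape contributes its shape to every node of the class. The Python sketch below assumes that both kinds of assertion have already been extracted into (node, shape) and (class, shape) pairs, and that "every node of some type" means an explicit rdf:type triple, with no subclass inference; `required_shapes` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def required_shapes(
    triples: Iterable[Tuple[str, str, str]],
    node_shape: Iterable[Tuple[str, str]],   # (node, shape label) pairs from sh:nodeShape
    class_shape: Iterable[Tuple[str, str]],  # (class, shape label) pairs from sh:classShape
) -> Dict[str, Set[str]]:
    required: Dict[str, Set[str]] = defaultdict(set)
    for node, shape in node_shape:
        required[node].add(shape)
    shapes_of_class: Dict[str, Set[str]] = defaultdict(set)
    for cls, shape in class_shape:
        shapes_of_class[cls].add(shape)
    for s, p, o in triples:                  # only explicit rdf:type triples are considered
        if p == RDF_TYPE and o in shapes_of_class:
            required[s] |= shapes_of_class[o]
    return dict(required)
```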
The sh:classShape predicate describes a way to associate shapes with classes. It is currently unclear what is implied by attaching shape properties (e.g. sh:property) directly to a class e.g.:
```turtle
clinic1234:CompletePatientRecord a owl:Class ;
  sh:property [
    sh:predicate clinic1234:phone ;
    sh:valueType xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .
```
It is unclear whether a structure associating the nodes returned by a SPARQL query with a shape would constitute a global constraint, e.g.:

```turtle
[ sd:endpoint <http://www.example/sparql/> ;
  sd:defaultDataset [
    sd:defaultGraph [
      sd:Graph [
        sh:query """SELECT ?s { ... }""" ;
        sh:hasShape ex:IssueShape
      ]
    ]
  ]
] .
```
We start with a few preliminary definitions and notations.
| Notation | Meaning |
|---|---|
| shapes(S) | the set of shape labels that appear in the schema S |
| expr(T, S) | the shape expression in the definition of the shape label T in the schema S |
| incl(T, S) | the set of included properties associated with the definition of the shape label T in S; if T is a closed shape, then incl(T, S) is empty |
| properties(Expr) | the set of properties that appear in some triple constraint in the shape expression Expr |
| inv-properties(Expr) | the set of properties that appear in some inverse triple constraint in the shape expression Expr |
| | the shapes dependency graph of S: the directed graph whose set of nodes is shapes(S), with an edge from T1 to T2 iff the shape label T2 appears in expr(T1, S) |
| | the sub-graph of |
| | the set of negated shape labels in the schema |
| | a negated shape label, that is, denotes that |
| | the set of allowed values for a value constraint |
| | for a schema |
Intuitively,
A shape expression schema
The semantics of shape expression schemas is sound only for well-defined schemas. Therefore, from now on, we consider only well-defined schemas.
Negated triple and inverse triple constraints are introduced as a syntactic convenience; their semantics is defined using their non-negated versions with zero cardinality.
More precisely, for every triple or inverse triple constraint
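Since negation is only syntactic, a validator can eliminate it before checking anything else. The Python sketch below assumes that !a::C stands for a::C[0;0] and !^a::C for ^a::C[0;0], following the reference to zero cardinality above; the `TripleConstraint` class and the `desugar` function are hypothetical, and any cardinality written on a negated constraint is simply overridden.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class TripleConstraint:
    predicate: str
    constraint: str                  # the value or shape constraint C, kept opaque here
    inverse: bool = False            # True for ^a::C
    min_card: int = 1
    max_card: Optional[int] = 1      # None stands for "unbound"
    negated: bool = False            # True for !a::C

def desugar(tc: TripleConstraint) -> TripleConstraint:
    """Rewrite a negated constraint into its non-negated version with cardinality [0;0]."""
    if not tc.negated:
        return tc
    return replace(tc, negated=False, min_card=0, max_card=0)
```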
In order to handle triple constraints and inverse triple constraints, the triples of a graph are labeled depending on whether they have the focus node as subject or as object. Concretely, a labeled triple is either an outgoing triple of the form (out, n, p, u) or an incoming triple of the form (inc, u, p, n), where n is the focus node.
We say that an outgoing triple (out, n, p, u) matches a triple constraint a::C iff p = a.
We say that an incoming triple (inc, u, p, n) matches an inverse triple constraint ^a::C iff p = a.
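Matching is a purely local test on the direction label and the predicate; the value or shape constraint C plays no role in it. The Python sketch below encodes labeled triples as a named tuple and assumes, as the inverse case states explicitly, that the outgoing case also compares only the predicate. `LabeledTriple` and `matches` are hypothetical names.

```python
from typing import NamedTuple

class LabeledTriple(NamedTuple):
    direction: str  # "out" for (out, n, p, u), "inc" for (inc, u, p, n)
    first: str      # n for outgoing triples, u for incoming ones
    predicate: str
    second: str     # u for outgoing triples, n for incoming ones

def matches(t: LabeledTriple, predicate: str, inverse: bool) -> bool:
    """Does t match a::C (inverse=False) or ^a::C (inverse=True), with a = predicate?
    C is deliberately ignored: it is checked by the typing, not by matching."""
    expected = "inc" if inverse else "out"
    return t.direction == expected and t.predicate == predicate
```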
The following definition introduces the notion of satisfiability of a shape expression by a set of triples. Such satisfiability is used to check that the neighbourhood of a node locally satisfies the constraints defined by a shape expression, without taking into account whether the shapes required by the triple constraints and inverse triple constraints are satisfied.
Let Neigh be a set of (labeled) triples, and let Expr be a shape expression (as defined by ShapeExpr). We say that Neigh satisfies Expr iff:
Note that the conditions for some-of and one-of shapes are identical. The distinction between the two is made by also taking into account the non-local shape constraints.
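The following brute-force Python sketch spells out one reading of local satisfaction that is consistent with the descriptions of the operators given earlier; it is an assumption, not a transcription of the definition. The empty shape accepts only the empty set; a::C[m;M] accepts between m and M triples, all matching a; some-of and one-of accept when some sub-expression is satisfied; a group splits Neigh into one (possibly empty) part per sub-expression; and a repetition splits Neigh into between m and M parts, each satisfying the sub-expression. The class names (`Empty`, `TC`, `SomeOf`, `OneOf`, `Group`, `Repeat`) are hypothetical, and C is not checked.

```python
from dataclasses import dataclass
from itertools import combinations, product
from typing import FrozenSet, Optional, Tuple, Union

LTriple = Tuple[str, str, str, str]   # ("out", n, p, u) or ("inc", u, p, n)

@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class TC:                             # a::C[min;max], or ^a::C[min;max] if inverse
    predicate: str
    inverse: bool = False
    min_card: int = 1
    max_card: Optional[int] = 1       # None stands for "unbound"

@dataclass(frozen=True)
class SomeOf:
    exprs: tuple

@dataclass(frozen=True)
class OneOf:
    exprs: tuple

@dataclass(frozen=True)
class Group:
    exprs: tuple

@dataclass(frozen=True)
class Repeat:
    expr: object
    min_card: int
    max_card: Optional[int]

Expr = Union[Empty, TC, SomeOf, OneOf, Group, Repeat]

def within(k: int, low: int, high: Optional[int]) -> bool:
    return k >= low and (high is None or k <= high)

def satisfies(neigh: FrozenSet[LTriple], e: Expr) -> bool:
    """Local satisfaction only: value and shape constraints C are not checked."""
    if isinstance(e, Empty):
        return not neigh
    if isinstance(e, TC):
        direction = "inc" if e.inverse else "out"
        return (all(t[0] == direction and t[2] == e.predicate for t in neigh)
                and within(len(neigh), e.min_card, e.max_card))
    if isinstance(e, (SomeOf, OneOf)):        # locally identical conditions
        return any(satisfies(neigh, sub) for sub in e.exprs)
    if isinstance(e, Group):                  # one (possibly empty) part per sub-expression
        triples = sorted(neigh)
        for owner in product(range(len(e.exprs)), repeat=len(triples)):
            parts = [frozenset(t for t, i in zip(triples, owner) if i == j)
                     for j in range(len(e.exprs))]
            if all(satisfies(part, sub) for part, sub in zip(parts, e.exprs)):
                return True
        return False
    return split_repeat(neigh, e, 0)

def split_repeat(neigh: FrozenSet[LTriple], e: Repeat, used: int) -> bool:
    """Try to cut neigh into further non-empty parts, each satisfying e.expr."""
    if e.max_card is not None and used > e.max_card:
        return False
    if not neigh:
        # Missing repetitions can be filled by empty parts if e.expr accepts them.
        return used >= e.min_card or satisfies(frozenset(), e.expr)
    first, *rest = sorted(neigh)
    for size in range(len(rest) + 1):
        for others in combinations(rest, size):
            part = frozenset((first, *others))
            if satisfies(part, e.expr) and split_repeat(neigh - part, e, used + 1):
                return True
    return False
```

The enumeration is exponential in the size of the neighbourhood; it is meant to make the definition concrete, not to serve as an efficient validator.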
The above definition can be written using the following set of inference rules. We write Neigh |- Expr to denote that Neigh satisfies Expr.
If a set of triples Neigh satisfies a shape expression Expr, then one can construct at least one proof tree whose root is Neigh |- Expr, using the above inference rules.
Given such proof tree, it can be shown that every outgoing triple (out, n, p, u) in Neigh appears in the conclusion of exactly one application of rule-triple-constraint.
Similarly, every incoming triple (inc, u, p, n) in Neigh appears in the conclusion of exactly one application of rule-inverse-triple-constraint.
For every outgoing, resp. incoming triple (x, n, p, u) in Neigh, let wm(x, n, p, u) be the triple constraint p::C, resp. the inverse triple constraint
For an RDF graph G and a node n in G, the outgoing neighbourhood of n in G is the set of labeled triples out(G, n) = {(out, n, p, u) | (n, p, u) ∈ G}, and the incoming neighbourhood of n in G is the set of labeled triples inc(G, n) = {(inc, u, p, n) | (u, p, n) ∈ G}.
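Both neighbourhoods are direct set comprehensions over the graph. In the Python sketch below a graph is a collection of (subject, predicate, object) string triples; this encoding and the names `out_neigh` and `inc_neigh` are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Labeled = Tuple[str, str, str, str]

def out_neigh(graph: Iterable[Triple], n: str) -> Set[Labeled]:
    """out(G, n): every triple with n as subject, labeled 'out'."""
    return {("out", s, p, o) for (s, p, o) in graph if s == n}

def inc_neigh(graph: Iterable[Triple], n: str) -> Set[Labeled]:
    """inc(G, n): every triple with n as object, labeled 'inc'."""
    return {("inc", s, p, o) for (s, p, o) in graph if o == n}
```

A triple (n, p, n) whose subject and object are both the focus node appears in both neighbourhoods, once with each label, which is why the labels are needed.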
At the implementation level, extension conditions are handled by a plugin mechanism, in which the validation procedure delegates the checking of an extension condition to a registered plugin. The result of evaluating an extension condition can be true (the extension condition is satisfied), false (it is not satisfied), error (an error occurred during execution), or undefined (the evaluation procedure did not find an appropriate plugin). At the semantic level, we suppose that for every extension language lang there exists an oracle function flang that takes as parameters an RDF graph, an IRI corresponding to the focus node, and a string corresponding to the extension condition, and returns one of true, false, error, and undefined. For unsupported extension languages (where the result is undefined), the default behaviour is to consider the condition satisfied; this default can, however, be parametrized.
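The plugin mechanism and its four outcomes can be sketched in Python as a registry keyed by extension language name. The class and method names (`ExtensionRegistry`, `register`, `evaluate`, `satisfied`) are hypothetical, plugins are modelled as plain callables, and treating an error outcome as "not satisfied" in `satisfied` is an assumption, since the text only fixes the default for undefined.

```python
from enum import Enum
from typing import Callable, Dict, Set, Tuple

class Outcome(Enum):
    TRUE = "true"
    FALSE = "false"
    ERROR = "error"
    UNDEFINED = "undefined"

Graph = Set[Tuple[str, str, str]]
Plugin = Callable[[Graph, str, str], bool]   # (graph, focus node IRI, condition text) -> bool

class ExtensionRegistry:
    def __init__(self, undefined_is_satisfied: bool = True) -> None:
        self.plugins: Dict[str, Plugin] = {}
        self.undefined_is_satisfied = undefined_is_satisfied  # the parametrizable default

    def register(self, lang: str, plugin: Plugin) -> None:
        self.plugins[lang] = plugin

    def evaluate(self, lang: str, graph: Graph, focus: str, condition: str) -> Outcome:
        """Plays the role of the oracle f_lang."""
        plugin = self.plugins.get(lang)
        if plugin is None:
            return Outcome.UNDEFINED             # no plugin for this language
        try:
            return Outcome.TRUE if plugin(graph, focus, condition) else Outcome.FALSE
        except Exception:
            return Outcome.ERROR                 # the plugin failed while running

    def satisfied(self, lang: str, graph: Graph, focus: str, condition: str) -> bool:
        outcome = self.evaluate(lang, graph, focus, condition)
        if outcome is Outcome.UNDEFINED:
            return self.undefined_is_satisfied
        return outcome is Outcome.TRUE           # assumption: ERROR counts as not satisfied
```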
Fix a schema
A typing of
For a typing
For a typing
A typing is called a valid typing of G by S if for every node n in G,
We now give a more intuitive explanation of the above definition.
The fact that
The set
Now let us review all the conditions for a valid typing. Intuitively, a valid typing will associate the shape
Consider a mapping requiredshapes that associates one or several shape labels with some of the nodes of a graph G. This mapping is assumed to be constructed by one of the association mechanisms described in Section Associating data with shapes.
A valid typing w.r.t. the required shapes requiredshapes is a valid typing t such that for every node n, requiredshapes(n) ⊆ t(n).
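Once a valid typing t is available, checking it against the required shapes is a per-node subset test. The Python sketch below assumes both t and requiredshapes are given as dictionaries from nodes to sets of shape labels, and that a node absent from the typing has no shapes; `valid_wrt_required` is a hypothetical name, and the validity of t itself is not rechecked.

```python
from typing import Dict, Set

def valid_wrt_required(typing: Dict[str, Set[str]], required: Dict[str, Set[str]]) -> bool:
    """typing: node -> shape labels assigned by a valid typing t;
    required: node -> shape labels the node must have (requiredshapes)."""
    return all(labels <= typing.get(node, set()) for node, labels in required.items())
```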
A stem is an IRI ending with '~' that represents the (infinite) set of IRIs sharing that prefix.
For instance,
The first two situations already correspond to sets (of properties, or of values), so stems are already handled by the semantics. In what follows, we explain how to handle stems that appear as properties in triple constraints (inverse triple constraints are handled similarly, so we omit them here). More precisely, we show how triple constraints can be defined on sets of properties instead of single properties.
Consider the following modification of the abstract syntax for the TripleConstraint rule.
Such a set of IRIs can be defined by a stem or by any other means (e.g. an enumeration of its elements, a regular expression, etc.). We then modify the following definitions.
[Triple matches constraint] An outgoing triple
[Typing, valid typing]
All the other definitions remain unchanged. This makes it possible to handle sets of properties in triple constraints.
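Because a triple constraint now carries a set of properties, matching only needs a membership test for that set, whichever way the set is defined. The Python sketch below represents each set as a predicate on IRIs and builds one from a stem, an enumeration, or a regular expression; it assumes that an outgoing triple matches when its predicate belongs to the set. The helper names `stem`, `enumeration`, `regex` and `matches` are hypothetical.

```python
import re
from typing import Callable, FrozenSet

PropertySet = Callable[[str], bool]   # membership test for a (possibly infinite) set of IRIs

def stem(iri_with_tilde: str) -> PropertySet:
    """A stem such as 'http://ex.org/~' denotes every IRI starting with 'http://ex.org/'."""
    if not iri_with_tilde.endswith("~"):
        raise ValueError("a stem must end with '~'")
    prefix = iri_with_tilde[:-1]
    return lambda iri: iri.startswith(prefix)

def enumeration(iris: FrozenSet[str]) -> PropertySet:
    return lambda iri: iri in iris

def regex(pattern: str) -> PropertySet:
    compiled = re.compile(pattern)
    return lambda iri: compiled.fullmatch(iri) is not None

def matches(triple: tuple, properties: PropertySet) -> bool:
    """An outgoing labeled triple (out, n, p, u) matches when p belongs to the set."""
    direction, _, p, _ = triple
    return direction == "out" and properties(p)
```

Representing sets as membership tests keeps infinite sets such as stems finite in memory, at the cost of not being able to enumerate them.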