Core SHACL Semantics

Abstract Syntax

RDF abstract syntax

This document uses the following labels for terms in the RDF abstract syntax:

Iri - RDF IRI
Blank - RDF blank node
Lit - RDF literal

Simple shape expression schemas

The abstract syntax of shape expression schemas (or simply schemas) is given below. For now we use a simplified definition of one non-terminal, namely ShapeConstr, which is indicated by the temporary rule ShapeConstrTemp. We make this simplification to make defining the semantics easier, and we give the complete definition in the section on complex shape constraints.

Schema ::= Rule+
Rule ::= ShapeLabel ShapeDefinition ExtensionCondition*
ShapeLabel ::= an identifier

A Schema is composed of at least one rule (Rule). Every rule associates a label (ShapeLabel) with a shape definition (ShapeDefinition) and possibly a number of additional conditions (ExtensionCondition) defined using an extension mechanism.

ShapeDefinition ::= ClosedShape | OpenShape
ClosedShape ::= 'close' ShapeExpr
OpenShape ::= 'open' InclPropSet? ShapeExpr
InclPropSet ::= PropertiesSet
PropertiesSet ::= set of IRI

A shape definition is either a closed shape (ClosedShape) or an open shape (OpenShape). Both closed and open shapes are defined by a shape expression (ShapeExpr). Open shapes can have an associated set of included properties (InclPropSet): properties of which arbitrary extra occurrences are permitted. Closed and open shapes are distinguished by the keywords close and open, respectively.

A shape expression is either the empty shape (EmptyShape), represented by the keyword emptyshape; or a triple constraint (TripleConstraint) or an inverse triple constraint (InverseTripleConstraint) followed by a cardinality constraint (Cardinality); or a negated version of one of the latter two (NegatedTripleConstraint, NegatedInverseTripleConstraint); or a some-of shape (SomeOfShape); or a one-of shape (OneOfShape); or a grouping shape (GroupShape); or a repetition shape (RepetitionShape).

TripleConstraint ::= IRI ValueConstr | IRI ShapeConstr
InverseTripleConstraint ::= '^' IRI ShapeConstr
Cardinality ::= '[' MinCardinality ';' MaxCardinality ']'
MinCardinality ::= a natural number
MaxCardinality ::= a natural number | 'unbound'
NegatedTripleConstraint ::= '!' TripleConstraint
NegatedInverseTripleConstraint ::= '!' InverseTripleConstraint

Triple constraints specify constraints to be satisfied by the triples having the focus node as subject, and the associated cardinality specifies how many triples satisfying the triple constraint are required. Inverse triple constraints play a similar role, but define constraints to be satisfied by the triples having the focus node as object. We write a::C for the triple constraint with IRI a and with value or shape constraint C. Cardinalities are written as an interval in square brackets, whose maximum bound can be the special value unbound. In the examples, we omit the cardinality when the minimal and the maximal cardinality are both equal to one; that is, we write simply a::C for a::C[1;1]. Negated triple and inverse triple constraints are preceded by an exclamation mark (!).

A triple constraint can constrain the object of a triple in two different ways: it either requires the object to have some particular value (value constraint), or it requires the object node to satisfy a shape constraint. Inverse triple constraints are preceded by the ^ symbol, and allow only shape constraints on the subject node.

A value constraint can be specified in three different ways: as a set of concrete values, which can be IRIs or literals; as a literal datatype, possibly with an XSD facet restriction attached to it; or as a node kind, among IRI, Blank, literal or non-literal.

ShapeConstr ::= ('!')? (DisjShapeConstr | ConjShapeConstr)
DisjShapeConstr ::= ShapeLabel ('or' ShapeLabel)*
ConjShapeConstr ::= ShapeLabel ('and' ShapeLabel)*

A shape constraint requires the node to satisfy one or more shapes. A disjunctive shape constraint requires the object node to satisfy at least one of the enumerated shapes. A conjunctive shape constraint requires the object node to satisfy all of the enumerated shapes. Additionally, a shape constraint can be negated by preceding it with an exclamation mark. This negates the requirement, where negation is the usual logical negation; for instance, the negation of a disjunctive shape constraint requires the object node to satisfy none of the enumerated shapes.

SomeOfShape ::= ShapeExpr ('|' ShapeExpr)*
OneOfShape ::= ShapeExpr ('•' ShapeExpr)*
GroupShape ::= ShapeExpr (',' ShapeExpr)*
RepetitionShape ::= ShapeExpr Cardinality

Complex shape expressions can be built using four operators: some-of, one-of, grouping, and repetition. A some-of shape (SomeOfShape) requires at least one of the sub-expressions to be satisfied, but does not forbid more of them from being satisfied. A one-of shape (OneOfShape) requires exactly one of the sub-expressions to be satisfied. A group shape (GroupShape) requires the neighbourhood of the focus node to be split into as many sets of triples as there are sub-expressions, such that every set satisfies the constraint given by the corresponding sub-expression. A repetition shape (RepetitionShape) requires the sub-expression to be repeated a number of times specified by the cardinality constraint.

ExtensionCondition ::= ExtLangName ExtDefinition
ExtLangName ::= an identifier
ExtDefinition ::= a string

Finally, an extension mechanism allows attaching additional constraints to be satisfied by the nodes of a given shape. Each such condition is written in some extension language (ExtLangName), and the actual constraint (ExtDefinition) is a Boolean function definition in the corresponding language, where true asserts that the focus node meets the constraint.

We require that every shape label appearing in the schema appears on the left-hand side of exactly one rule (that is, every shape label is defined, and defined only once).
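This criterion can be checked mechanically. The following Python sketch assumes, hypothetically, that the rules are given as (label, definition) pairs and that the set of labels referenced anywhere in the schema has already been collected (collecting them requires walking every shape expression, which is not shown).

```python
from collections import Counter


def undefined_or_duplicate_labels(rules, referenced):
    """Return (undefined, duplicated) shape labels.

    rules: list of (label, definition) pairs, one per Rule (hypothetical input).
    referenced: iterable of shape labels used anywhere in the schema.
    """
    counts = Counter(label for label, _definition in rules)
    all_labels = set(counts) | set(referenced)
    # A label is undefined if no rule has it on its left-hand side.
    undefined = {label for label in all_labels if counts[label] == 0}
    # A label is duplicated if more than one rule defines it.
    duplicated = {label for label, c in counts.items() if c > 1}
    return undefined, duplicated
```

A schema satisfies the criterion exactly when both returned sets are empty.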

RDF Vocabulary

The SHACL abstract syntax above can be represented in an RDF graph. RDF graphs representing a schema are subject to the constraints of the abstract syntax above, e.g. a triple constraint may have at most one term constraint.

SHACL triple constraints can be parsed with a SPARQL query:

	  PREFIX sh:<http://www.w3.org/ns/shacl#>
	  PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
	  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
	  
	  SELECT ?entry ?isShape ?choice ?group 
	  (IF(Bound(?property),                      # compile TripleConstraint
	      CONCAT("ShapeExpr(TripleConstraint(",
	  "IRI(", STR(?predicate), "), ",
	  IF(Bound(?valueType),      CONCAT("ValueType(IRI(",STR(?valueType),"))"),   # valueType
	          IF(Bound(?nodeKind),     CONCAT("NodeKind(IRI(",STR(?nodeKind),"))"),     # nodeKind
	            IF(Bound(?shapeLabel), CONCAT("ShapeLabel(IRI(",STR(?shapeLabel),"))"), # valueShape
	                                   CONCAT("ValueSet(",GROUP_CONCAT(CONCAT(          # allowedValue
	                                       IF(IsLiteral(?allowedValue), "Literal", "IRI"), # IRIs and Literals
	                                             "(", STR(?allowedValue), ")")
          ),")")
	  ))),
	  "))[",if(Bound(?min1), STR(?min1), "1"),",",if(Bound(?max1), STR(?max1), "INF"),"]"), # cardinality
	      "") AS ?TripleConstraint)
	  where { 
	  {
	  ?entry sh:property ?property .
	  ?property sh:predicate ?predicate .
	  OPTIONAL { ?property sh:minCount ?min1 }
	  OPTIONAL { ?property sh:maxCount ?max1 }
	  OPTIONAL { ?property sh:valueType ?valueType }
	  OPTIONAL { ?property sh:nodeKind ?nodeKind }
	  OPTIONAL { ?property sh:valueShape ?shapeLabel }
	  OPTIONAL { ?property sh:allowedValue ?allowedValue
          FILTER (IsIRI(?allowedValue) || IsLiteral(?allowedValue)) }
	  } UNION {
	  ?entry sh:choice ?choice
	  } UNION {
	  ?entry sh:propertyGroup ?group
	  }
	  OPTIONAL { ?entry a sh:Shape
	  BIND(true AS ?isShape)
	  }
	  } GROUP BY ?entry ?isShape ?property ?predicate ?choice ?group
          ?min1 ?max1 ?valueType ?nodeKind ?shapeLabel

This produces a hierarchy table with five columns: entry, isShape, choice, group, TripleConstraint. The abstract syntax is built in two steps:

1. Compose a map from each entry to its list of tuples (isShape, choice, group, TripleConstraint).
2. Starting with the entries where isShape is true, compose a Rule(entry, GroupShape()) and populate its GroupShape by invoking embed with entry and that GroupShape.

The embed(entry, collection) function takes an entry and a collection, which is either a GroupShape or a SomeOfShape. For each tuple associated with entry in the map:
- If choice is bound, add a new SomeOfShape to collection and invoke embed with choice and that SomeOfShape.
- Else, if group is bound, add a new GroupShape to collection and invoke embed with group and that GroupShape.
- Else, add TripleConstraint to collection.
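The two-step construction can be sketched in Python as follows. This is a minimal sketch, assuming that the query results arrive as (entry, isShape, choice, group, TripleConstraint) tuples, with None for unbound variables. The GroupShape, SomeOfShape and Rule classes are hypothetical stand-ins for the abstract syntax, and triple constraints are kept as the strings produced by the query.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GroupShape:
    items: list = field(default_factory=list)


@dataclass
class SomeOfShape:
    items: list = field(default_factory=list)


@dataclass
class Rule:
    label: str
    shape: GroupShape


def build_rules(rows):
    # Step 1: map each entry to its list of (isShape, choice, group, TripleConstraint).
    by_entry = defaultdict(list)
    for entry, is_shape, choice, group, tc in rows:
        by_entry[entry].append((is_shape, choice, group, tc))

    def embed(entry, collection):
        # .get avoids inserting keys into by_entry while it is being iterated below.
        for _is_shape, choice, group, tc in by_entry.get(entry, []):
            if choice is not None:
                nested = SomeOfShape()
                collection.items.append(nested)
                embed(choice, nested)
            elif group is not None:
                nested = GroupShape()
                collection.items.append(nested)
                embed(group, nested)
            elif tc:  # skip the "" the query emits for non-property rows
                collection.items.append(tc)

    # Step 2: one Rule per entry typed as sh:Shape.
    rules = []
    for entry, tuples in by_entry.items():
        if any(t[0] for t in tuples):
            rule = Rule(entry, GroupShape())
            embed(entry, rule.shape)
            rules.append(rule)
    return rules
```

For the my:UserShape example, this yields a single rule whose GroupShape contains a SomeOfShape (foaf:name, foaf:givenName) followed by the foaf:mbox triple constraint. The sketch does not guard against cyclic sh:choice or sh:propertyGroup references.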

RDF node types are identified by the following IRIs:

Table 2. RDF Node Type Identifiers
RDF node type	SHACL identifier
IRI	sh:IRI
Literal	sh:Literal
Blank Node	sh:BNode

RDF instance example

The following example represents a shape my:UserShape composed of a group shape with two conjuncts:

- A disjunctive (some-of) shape with two disjuncts:
  1. A triple constraint with a predicate of foaf:name, a datatype of xsd:string, a minimum cardinality of 1, and a maximum cardinality of 1.
  2. A triple constraint with a predicate of foaf:givenName, a datatype of xsd:string, a minimum cardinality of 1, and no maximum cardinality.
- A triple constraint with a predicate of foaf:mbox, a node type of RDF IRI, a minimum cardinality of 1, and no maximum cardinality.

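# shapes (Turtle)
	  @prefix sh:   <http://www.w3.org/ns/shacl#> .
	  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
	  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
	  @prefix my:   <http://example.org/my#> .   # example namespace (assumed)

	  my:UserShape a sh:Shape ;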
	  sh:choice [
          sh:property [
          sh:predicate foaf:name ;
          sh:valueType xsd:string ;
          sh:minCount 1 ; sh:maxCount 1
          ] ;
	  
          sh:property [
          sh:predicate foaf:givenName ;
          sh:valueType xsd:string ;
          sh:minCount 1
          ] ;
	  ] ;                                          
	  sh:property [
          sh:predicate foaf:mbox ;
          sh:nodeKind sh:IRI ;
          sh:minCount 1
	  ] .

Evaluation

Preliminaries

We start with a few preliminary definitions and notations.

Notations

shapes(S)	the set of shape labels that appear in the schema S
expr(T, S)	the shape expression that is in the definition of the shape label T in the schema S
incl(T, S)	the set of included properties associated with the definition of the shape label T in S. Note that if T is a closed shape, then incl(T, S) is empty.
properties(Expr)	the set of properties that appear in some triple constraint in the shape expression Expr
inv-properties(Expr)	the set of properties that appear in some inverse triple constraint in the shape expression Expr
dep-graph(S)	the shape dependency graph of S: the directed graph whose set of nodes is shapes(S) and that has an edge from T1 to T2 iff the shape label T2 appears in expr(T1, S)
dep-subgraph(T, S)	the sub-graph of dep-graph(S) induced by the nodes reachable from the node T in dep-graph(S); here by reachable we mean the classical reachability in graphs
negshapes(S)	the set of negated shape labels in the schema S; these are the shape labels that appear in dep-subgraph(T, S) for some shape label T such that (i) T appears in a negated shape constraint, or (ii) T appears in some triple constraint or inverse triple constraint under a one-of shape, or (iii) there is a shape label T1 and a shape triple constraint p::C or an inverse shape triple constraint ^p::C in expr(T1, S) such that T appears in C and p belongs to incl(T1, S).
!T	a negated shape label, that is, denotes that T is an element of negshapes(S)
allowed(V)	the set of allowed values for a value constraint V
S_ri	for a schema S, consider a proof tree for some Neigh |- Expr, where Neigh is a set of triples and Expr is a shape expression in S; let r be a node of that proof tree that corresponds to an application of rule-one-of, and let Expr_ri be the sub-expression used in the premise of the rule application at r. Then S_ri is the schema obtained from S by removing the sub-expression Expr_ri

Intuitively, negshapes(S) is the set of shape labels for which one needs to check whether some nodes in a graph do not satisfy these shapes, in order to validate the graph against the schema S. Whatever the kind of a value constraint (value set, literal datatype, or node kind), it defines a set of values. For instance, the allowed values of the literal datatype constraint int are all the literal integer values, and the allowed values of the non-literal node kind constraint are all IRIs and all blank nodes.
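Because every value constraint defines a set of values, allowed(V) can be implemented as a membership test. The following Python sketch uses a toy term model (IRI, BNode, Literal) and toy constraint classes (ValueSet, Datatype, NodeKind). All of these are hypothetical. The sketch checks literal datatypes by comparing datatype IRIs only, so it ignores lexical-form validity and XSD facets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    id: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


@dataclass(frozen=True)
class ValueSet:
    values: frozenset


@dataclass(frozen=True)
class Datatype:
    iri: str


@dataclass(frozen=True)
class NodeKind:
    kind: str  # one of "IRI", "Blank", "Literal", "NonLiteral"


def is_allowed(term, constraint):
    """Membership test for allowed(constraint)."""
    if isinstance(constraint, ValueSet):
        return term in constraint.values
    if isinstance(constraint, Datatype):
        # Datatype IRI comparison only; facets and lexical validity are not checked.
        return isinstance(term, Literal) and term.datatype == constraint.iri
    if isinstance(constraint, NodeKind):
        if constraint.kind == "IRI":
            return isinstance(term, IRI)
        if constraint.kind == "Blank":
            return isinstance(term, BNode)
        if constraint.kind == "Literal":
            return isinstance(term, Literal)
        if constraint.kind == "NonLiteral":
            return isinstance(term, (IRI, BNode))
    raise ValueError(f"unknown value constraint: {constraint!r}")
```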

Definition [Well defined schema]

A shape expression schema S is called well defined if, for every negated shape label !T in negshapes(S), the corresponding dependency sub-graph dep-subgraph(T, S) is a directed acyclic graph.

The semantics of shape expression schemas is sound only for well-defined schemas. Therefore, from now on, we consider only well-defined schemas.
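Well-definedness reduces to cycle detection. The following Python sketch assumes that dep-graph(S) is given as a dictionary from each shape label to the set of labels appearing in its expression, and that negshapes(S) has already been computed. Both input formats are assumptions. A depth-first search from T visits every node of dep-subgraph(T, S), so it meets a back edge iff that sub-graph contains a cycle.

```python
def has_cycle_from(start, edges):
    """True iff the sub-graph induced by the nodes reachable from start has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}

    def visit(node):
        color[node] = GREY
        for succ in edges.get(node, ()):
            c = color.get(succ, WHITE)
            if c == GREY:  # back edge: a cycle is reachable from start
                return True
            if c == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return visit(start)


def is_well_defined(edges, negated_labels):
    """Check that dep-subgraph(T, S) is acyclic for every !T in negshapes(S)."""
    return all(not has_cycle_from(t, edges) for t in negated_labels)
```

The recursion is adequate for schemas of modest size. A very deep dependency graph would need an explicit stack instead.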

Declarative semantics of shape expression schemas

Negated triple and inverse triple constraints are introduced as a syntactic facility; their semantics is defined using their non-negated versions and zero cardinality. More precisely, for every triple or inverse triple constraint X, its negated version !X is a shortcut for X[0;0]. Therefore, in what follows we do not give semantics for negated constraints. Note also that, even though they are called negated, these constraints do not introduce negation in the sense of negated shapes, and do not interfere with the well-definedness of schemas.

In order to handle triple constraints and inverse triple constraints, the triples of a graph are labeled depending on whether they have the focus node as subject or as object. Concretely, a labeled triple is either an outgoing triple of the form (out, n, p, u), or an incoming triple of the form (inc, u, p, n), where (n, p, u) and (u, p, n) are triples, and out and inc are special labels. From now on, we consider that all triples are labeled, and call them simply triples (even though technically they are quadruples).

Definition [Triple matches constraint]

We say that an outgoing triple (out, n, p, u) matches a triple constraint a::C iff p = a.

We say that an incoming triple (inc, u, p, n) matches an inverse triple constraint ^a::C iff p = a.

The following definition introduces the notion of satisfiability of a shape expression by a set of triples. Such satisfiability is used to check that the neighbourhood of a node locally satisfies the constraints defined by a shape expression, without taking into account whether the shapes required by the triple constraints and inverse triple constraints are satisfied.

Definition [Set of triples satisfies a shape expression]

Let Neigh be a set of (labeled) triples, and let Expr be a shape expression (as defined by ShapeExpr). We say that Neigh satisfies Expr iff:

Expr is the empty shape emptyshape and Neigh is the empty set, or
Expr is a triple constraint a::C[m;M] (where m and M are the minimal and the maximal cardinality, respectively), every triple in Neigh matches a::C, and the number of elements of Neigh is in the bounds given by [m;M];
Expr is an inverse triple constraint ^a::C[m;M] (where m and M are the minimal and the maximal cardinality, respectively), every triple in Neigh matches ^a::C, and the number of elements of Neigh is in the bounds given by [m;M];
Expr is a some-of shape, let Expr = Expr₁ | Expr₂ | … | Expr_k, and Neigh satisfies Expr₁, or Neigh satisfies Expr₂, … or Neigh satisfies Expr_k;
Expr is a one-of shape, let Expr = Expr₁ • Expr₂ • … • Expr_k, and Neigh satisfies Expr₁, or Neigh satisfies Expr₂, … or Neigh satisfies Expr_k;
Expr is a grouping, let Expr = Expr₁, … , Expr_k, and Neigh can be split into k disjoint sets of triples Neigh = Neigh₁ ∪ … ∪ Neigh_k s.t. Neigh_i satisfies Expr_i for all i in 1..k;
Expr is a repetition, let Expr = Expr'[m;M], and there exists a k within the bounds given by [m;M] s.t. Neigh can be split into k disjoint sets of triples Neigh = Neigh₁ ∪ … ∪ Neigh_k and each of these sets satisfies Expr', that is, Neigh_i satisfies Expr' for all i in 1..k.
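The definition can be read directly as a brute-force decision procedure. The following Python sketch uses hypothetical classes for the shape expressions and represents labeled triples as (label, subject, predicate, object) tuples. It enumerates every split of Neigh, so it is exponential and meant only to illustrate the semantics, not to serve as a validator. As in the definition, some-of and one-of are treated identically, and value and shape constraints C are ignored.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class TC:  # a::C[m;M] when inverse is False, ^a::C[m;M] otherwise
    pred: str
    inverse: bool = False
    min: int = 1
    max: Optional[int] = 1  # None stands for 'unbound'


@dataclass(frozen=True)
class SomeOf:
    exprs: tuple


@dataclass(frozen=True)
class OneOf:
    exprs: tuple


@dataclass(frozen=True)
class Group:
    exprs: tuple


@dataclass(frozen=True)
class Repeat:
    expr: object
    min: int
    max: Optional[int]


def matches(triple, tc):
    label, _s, p, _o = triple
    return p == tc.pred and label == ("inc" if tc.inverse else "out")


def in_bounds(k, lo, hi):
    return lo <= k and (hi is None or k <= hi)


def split_satisfies(neigh, exprs):
    """Can neigh be split into len(exprs) disjoint parts, part i satisfying exprs[i]?"""
    if not exprs:
        return not neigh
    if len(exprs) == 1:
        return satisfies(neigh, exprs[0])
    items = sorted(neigh)
    for r in range(len(items) + 1):
        for part in map(frozenset, combinations(items, r)):
            if satisfies(part, exprs[0]) and split_satisfies(neigh - part, exprs[1:]):
                return True
    return False


def satisfies(neigh, expr):
    """Brute-force check of Neigh |- Expr (local constraints only)."""
    if isinstance(expr, Empty):
        return not neigh
    if isinstance(expr, TC):
        return all(matches(t, expr) for t in neigh) and in_bounds(len(neigh), expr.min, expr.max)
    if isinstance(expr, (SomeOf, OneOf)):  # identical local conditions
        return any(satisfies(neigh, e) for e in expr.exprs)
    if isinstance(expr, Group):
        return split_satisfies(neigh, list(expr.exprs))
    if isinstance(expr, Repeat):
        # Empty parts can always be dropped, so k never needs to exceed max(m, |Neigh|).
        upper = max(expr.min, len(neigh))
        if expr.max is not None:
            upper = min(upper, expr.max)
        return any(split_satisfies(neigh, [expr.expr] * k) for k in range(expr.min, upper + 1))
    raise TypeError(f"unknown shape expression: {expr!r}")
```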

Note that the conditions for some-of and one-of shapes are identical. The distinction between the two is made only when the non-local shape constraints are also taken into account.

The above definition can be written using the following set of inference rules. We write Neigh |- Expr for the fact that Neigh satisfies Expr.

If a set of triples Neigh satisfies a shape expression Expr, then one can construct (at least one) proof tree whose root is Neigh |- Expr, using the above inference rules. Given such a proof tree, it can be shown that every outgoing triple (out, n, p, u) in Neigh appears in the conclusion of exactly one application of rule-triple-constraint. Similarly, every incoming triple (inc, u, p, n) in Neigh appears in the conclusion of exactly one application of rule-inverse-triple-constraint. For every outgoing, resp. incoming, triple (x, n, p, u) in Neigh, let wm(x, n, p, u) be the triple constraint p::C, resp. the inverse triple constraint ^p::C, that appears in the conclusion of the same rule application as (x, n, p, u) (where x is one of out or inc). We call wm a witness mapping (for the fact that Neigh satisfies Expr). Note that every proof tree defines a unique witness mapping.

For an RDF graph G and a node n in G, the outgoing neighbourhood of n in G is the set of labeled triples out(G, n) = {(out, n, p, u) | (n, p, u) ∈ G}, and the incoming neighbourhood of n in G is the set of labeled triples inc(G, n) = {(inc, u, p, n) | (u, p, n) ∈ G}.
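Computing the two neighbourhoods is a direct filter over the graph. The following Python sketch assumes a graph represented as a set of (subject, predicate, object) tuples and labeled triples represented as (label, subject, predicate, object) tuples. Both representations are assumptions made for illustration. A self-loop (n, p, n) yields one outgoing and one incoming labeled triple, which the labels keep distinct.

```python
def out_neigh(graph, n):
    """out(G, n): the triples with n as subject, labeled 'out'."""
    return frozenset(("out", s, p, o) for (s, p, o) in graph if s == n)


def inc_neigh(graph, n):
    """inc(G, n): the triples with n as object, labeled 'inc'."""
    return frozenset(("inc", s, p, o) for (s, p, o) in graph if o == n)
```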

On the implementation level, extension conditions are handled by a plugin mechanism, in which the validation procedure delegates the checking of an extension condition to a registered plugin. The result of evaluating the extension condition can be true (the extension condition is satisfied), false (the extension condition is not satisfied), error (an error occurred during the execution), or undefined (the validation procedure did not find an appropriate plugin). On the semantics level, we suppose that for every extension language lang, there exists an oracle function f_lang that takes as parameters an RDF graph, an IRI corresponding to the focus node, and a string corresponding to the extension condition, and returns one of true, false, error, and undefined. For unsupported extension languages (where the result is undefined), the default behaviour is to consider the constraint satisfied; this default can, however, be parameterized.
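The plugin dispatch and the default treatment of undefined results can be sketched in Python as follows. The plugin registry (a dictionary from language name to oracle), the oracle signature, and the decision to map a raised exception to error are all assumptions made for illustration. The accept_undefined flag stands in for the parameterization mentioned above.

```python
from enum import Enum
from typing import Callable, Dict


class ExtResult(Enum):
    TRUE = "true"
    FALSE = "false"
    ERROR = "error"
    UNDEFINED = "undefined"


# Hypothetical oracle signature: f_lang(graph, focus_node, condition) -> ExtResult.
Oracle = Callable[[object, str, str], ExtResult]


def evaluate_extension(plugins: Dict[str, Oracle], lang, graph, focus, cond):
    oracle = plugins.get(lang)
    if oracle is None:
        return ExtResult.UNDEFINED  # no plugin registered for this language
    try:
        return oracle(graph, focus, cond)
    except Exception:
        return ExtResult.ERROR  # assumption: any failure during execution is 'error'


def extension_ok(result, accept_undefined=True):
    """True for 'true', and for 'undefined' when unsupported languages are tolerated."""
    return result is ExtResult.TRUE or (accept_undefined and result is ExtResult.UNDEFINED)
```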

Definition [Typing, valid typing]

Fix a schema S and a graph G.

A typing t of G is a map that associates with every node of G a (possibly empty) set of shape labels (from shapes(S)) and negated shape labels (from negshapes(S)), such that for every node n in G and every negated shape label !T ∈ negshapes(S), either T or !T belongs to t(n).

For a typing t, a node u, and a shape constraint C, we say that t(u) satisfies C, if:

C = T1 and ... and Tk, and Ti ∈ t(u) for all i ∈ 1..k, or
C = T1 or ... or Tk, and Ti ∈ t(u) for some i ∈ 1..k, or
C = !(T1 and ... and Tk), and !Ti ∈ t(u) for some i ∈ 1..k, or
C = !(T1 or ... or Tk), and !Ti ∈ t(u) for all i ∈ 1..k.

For a typing t, a node n and a triple or inverse triple constraint X, let Matching(n, t, X) be the set of triples defined by:

Matching(n, t, p::C) = {(out, n, p, u) ∈ G | u ∈ allowed(C)} if p::C is a value triple constraint;
Matching(n, t, p::C) = {(out, n, p, u) ∈ G | t(u) satisfies C} if p::C is a shape triple constraint;
Matching(n, t, ^p::C) = {(inc, u, p, n) ∈ G | t(u) satisfies C} if ^p::C is an inverse triple constraint.
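Both notions translate almost literally into code. The following Python sketch makes several assumptions: a typing is a dictionary from nodes to sets of strings, with a negated label stored as "!T"; the graph is a set of (subject, predicate, object) tuples; ShapeConstr is a hypothetical class for shape constraints. For value triple constraints, a predicate function stands in for membership in allowed(C).

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class ShapeConstr:
    op: str                # "and" or "or"
    labels: tuple          # shape labels T1..Tk
    negated: bool = False  # True for !(...)


def satisfies_constr(types: Set[str], c: ShapeConstr) -> bool:
    """t(u) satisfies C, following the four cases of the definition."""
    if not c.negated:
        check = all if c.op == "and" else any
        return check(t in types for t in c.labels)
    # !(T1 and ... and Tk): some !Ti;  !(T1 or ... or Tk): every !Ti.
    check = any if c.op == "and" else all
    return check(("!" + t) in types for t in c.labels)


def matching_value(graph, n, p, allowed: Callable[[str], bool]):
    """Matching(n, t, p::C) for a value triple constraint."""
    return {("out", s, q, o) for (s, q, o) in graph if s == n and q == p and allowed(o)}


def matching_out(graph, n, p, typing, constr):
    """Matching(n, t, p::C) for a shape triple constraint."""
    return {("out", s, q, o) for (s, q, o) in graph
            if s == n and q == p and satisfies_constr(typing.get(o, set()), constr)}


def matching_inc(graph, n, p, typing, constr):
    """Matching(n, t, ^p::C) for an inverse triple constraint."""
    return {("inc", s, q, o) for (s, q, o) in graph
            if o == n and q == p and satisfies_constr(typing.get(s, set()), constr)}
```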

A typing t is called a valid typing of G by S if, for every node n in G,

for every negated shape label !T, if !T ∈ t(n), then t1 is not a valid typing, where t1 is the typing that agrees with t everywhere except that T ∈ t1(n), and
for every shape label T, if T ∈ t(n), then there exist three mutually disjoint sets Matching, OpenProp and Rest such that
1. out(G, n) ∪ inc(G, n) = Matching ∪ OpenProp ∪ Rest, and
2. Rest = Rest_out ∪ Rest_inc, where
  Rest_out = {(out, n, p, u) ∈ out(G, n) | p ∉ properties(expr(T, S))}, and
  Rest_inc = {(inc, u, p, n) ∈ inc(G, n) | p ∉ inv-properties(expr(T, S))}, and
3. Matching is the union of the sets Matching(n, t, X) for every triple constraint or inverse triple constraint X that appears in expr(T, S), and
4. if T is a closed shape, then Rest_out = ∅ and OpenProp = ∅, and
5. if T is an open shape, then OpenProp ⊆ {(out, n, p, u) ∈ out(G, n) | p ∈ incl(T, S)}, and
6. there exists a proof tree, with corresponding witness mapping wm, for the fact that Matching satisfies expr(T, S), such that
  - for every outgoing triple (out, n, p, u) ∈ Matching, it holds that (out, n, p, u) ∈ Matching(n, t, wm((out, n, p, u))); moreover, if wm((out, n, p, u)) is a shape triple constraint, then there is no value triple constraint p::C in expr(T, S) such that (out, n, p, u) ∈ Matching(n, t, p::C), and
  - for every incoming triple (inc, u, p, n) ∈ Matching, it holds that (inc, u, p, n) ∈ Matching(n, t, wm((inc, u, p, n))), and
  - for every node r of the proof tree that corresponds to an application of rule-one-of, there does not exist a valid typing t1 of G by S_ri such that T ∈ t1(n), and
7. for every extension condition (lang, cond) associated with the shape T, f_lang(G, n, cond) returns true or undefined.

We now give a more intuitive explanation of the above definition.

The fact that t(u) satisfies a shape constraint C is used to ensure that the typing t correctly propagates the shape constraints required in the shape triple constraints.

The set Matching(n, t, X) contains all the triples in the neighbourhood of the node n that match the constraint X while propagating the shape constraints required by X.

We now review the conditions for a valid typing. Intuitively, a valid typing associates the shape T with the node n only if n satisfies the constraints for T. As some constraints require checking that some nodes do not satisfy some shapes, we also keep track of the non-satisfied shapes by associating negated shape labels with those nodes.

Intuitively, we want to associate the negated shape !T with a node n only if n does not satisfy the constraints for T. This requirement is ensured by the fact that replacing !T by T does not yield a valid typing.
All the other conditions ensure that the typing t properly captures the satisfiability of the non-negated constraints.
1. The triples in the neighbourhood of the node n contribute to satisfying the shape T in different ways, and are therefore dispatched to three disjoint sets, Matching, OpenProp and Rest.
2. The set Rest contains all the triples whose property is not mentioned in the definition of the shape T. Note that outgoing and incoming properties are considered separately.
3. The set Matching contains all the triples that satisfy some triple constraint or inverse triple constraint from the definition of the shape T. It follows that OpenProp contains the triples whose property is mentioned in T but that do not satisfy the condition on the object node (for outgoing triples) or on the subject node (for incoming triples).
4. A closed shape allows neither outgoing triples whose property is not mentioned in the shape definition, nor triples whose property is mentioned but that do not satisfy the recursive shape constraints or the value constraints. On the other hand, the "closedness" criterion applies only to outgoing triples: the absence of any constraint on Rest_inc means that incoming triples whose properties are not mentioned are always allowed. The asymmetric treatment of incoming and outgoing triples is a design choice: we offer the possibility to define more precise constraints for outgoing triples, as such constraints appear to be more useful according to the use cases.
5. An open shape allows all triples whose properties are not mentioned (there is no restriction on the set Rest), and also allows outgoing triples in OpenProp provided their property belongs to the included properties incl(T, S). Note that included properties apply only to outgoing triples.
6. The most complex condition ensures that the constraints are satisfied recursively. First, all the triples that match some triple constraint (or inverse triple constraint) must participate in satisfying the local and recursive constraints specified in the shape definition. This requirement is expressed by Matching |- expr(T, S). Moreover,
  - If an outgoing triple (out, n, p, u) participates in satisfying some triple constraint p::C, then the shape or value constraint C is satisfied by the object node u. Additionally, value constraints are given "priority": whenever the triple (out, n, p, u) satisfies some value triple constraint, it cannot be used as a witness for any shape triple constraint;
  - Similarly, the shape constraints required by the inverse triple constraints are correctly propagated through the incoming triples.
  - The next condition ensures that in every one-of shape, only one of the sub-expressions is satisfied. This is ensured by requiring that, if the sub-expression used in the proof is removed, no valid typing can be found.
7. The last condition ensures that the extension constraints are satisfied.
