Hypergraphs and SHACL Rules

Jan 22

More explorations on how to implement hypergraphs in RDF 1.2

7 Comments

Respectfully, "Mathematically, a hypergraph can be thought of using sets of IRIs or literals rather than a single IRI or literal as one or more items of a triple (as a subject, predicate or object)." is not a mathematical definition.

Mathematically, a hypergraph is defined by sets of vertices (V) and hyperedges (E), independent of data types like IRIs or literals.
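To make that set-theoretic definition concrete, here is a minimal Python sketch. The `Hypergraph` class and its field names are illustrative assumptions, not taken from any library, and rejecting empty hyperedges is one common convention rather than a universal rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypergraph:
    """H = (V, E), where every hyperedge in E is a subset of V."""

    vertices: frozenset
    hyperedges: frozenset  # a set of frozensets of vertices

    def __post_init__(self):
        for edge in self.hyperedges:
            # Assumption: empty hyperedges are disallowed (conventions vary).
            if not edge:
                raise ValueError("hyperedge must be non-empty")
            if not edge <= self.vertices:
                raise ValueError("hyperedge contains unknown vertices")


# Two hyperedges over four vertices; nothing here mentions IRIs or literals.
h = Hypergraph(
    vertices=frozenset({"a", "b", "c", "d"}),
    hyperedges=frozenset({frozenset({"a", "b", "c"}), frozenset({"c", "d"})}),
)
```

The vertices are opaque values: the definition holds whether they are strings, integers, or RDF terms.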

Kurt Cagle

Jan 22 (edited)

This was more clarification than anything for those who are more familiar with knowledge graphs. The core definition of a hypergraph:

A hypergraph is a mathematical structure that generalizes a traditional graph, where a single "hyperedge" can connect any number (two or more) of vertices, unlike regular edges that connect exactly two vertices.

As it turns out, RDF already does this when you are assuming that an edge has a common label, and RDF-Star goes one step further and makes it possible for a triple (a VEV combination) to be treated as an entity (reification) where the edge is unique. The definition I used above essentially states that even a unique edge can be bound to sets of vertices, enabling Set operations. I just expanded this to say that you can talk about hypersegments (VEV "pairs") in the same idiom.

The IRIs or literals, fwiw, are an RDF idiom: one corresponds to a unique label, the other to a specific literal value associated with a data type. From an RDF perspective, these are both forms of vertices.

Ramona C. Truta

Jan 22

I appreciate the context. My point is simply that RDF-star and 'linked literals' are implementation patterns (RDF idioms) used to simulate n-ary relations. Mathematically, a hypergraph is defined by set-theoretic relationships between vertices, independent of any specific data model or serialization format like RDF.

Conflating the two makes it harder for people to distinguish between the abstract data structure and the storage strategy, which is a persistent issue in community discourse. A common pitfall is the 'visual-physical fallacy': assuming the logical representation on paper dictates the physical implementation on disk. In reality, the two are often decoupled by several layers of abstraction, and confusing them leads to leaky architectures.

Kurt Cagle

Jan 22

Granted. At the same time, RDF is an *abstraction* of a labeled directional cyclic graph (LDCG), not a specific file encoding. Turtle is an encoding, definitely, as is JSON-LD or RDF/XML, but RDF by itself is, at its core, an abstraction layer for describing graphs with a very thin metadata layer on top that's not really all THAT necessary (it's used primarily for logical inference). LDAGs can be represented in the same way, and any U(nlabeled)N(ondirected)CGs can be represented as bidirectional LDCGs.

The one assumption made in RDF is that multiple triples (VEV sets) can have the same predicate (edge label). With RDF-Star, there's a further distinction: the VEV set itself has a unique identifier (a reifier), which can have both distinct predicates (edges) and objects (EVs). That's why I'd argue that RDF-Star describes a hypergraph. The combinatorics basically come from the Turtle notation, which doesn't allow you to describe a subject vertex set except as a data structure (here a linked list). I've brought this up to the RDF Working Group a few times over the years, but in general, it's not critical because you can always invert the triple when a set is allowed.

So, yes - there's likely a bit of leakage there, but it's basically between two layers of abstraction, not necessarily between a physical vs. logical divide. I'd contend that RDF-Star (as a graph description language with reification) is *mostly* isomorphic to the mathematical definition of hypergraph, if not a superset of same. I'm always open to being convinced otherwise, however.

Ramona C. Truta

Jan 23

I'll push back on "mostly isomorphic": there's no such thing, and we shouldn't force it. We have to call it what it is: a transformation/encoding used to fit the tool. An isomorphism is a bijection, and it's binary. We either have it or we don't; there is no such thing as a "partial" isomorphism in this context.
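One way to see the difference between an encoding and a bijection is a small round-trip sketch in Python. Everything here is an assumption for illustration: the `encode` and `decode` helpers, the `hasMember` predicate, and the plain-tuple triples are hypothetical and stand in for no particular RDF vocabulary or library.

```python
def encode(hyperedges):
    """Map {edge label: set of vertices} to (edge, 'hasMember', vertex) triples.

    Assumes every hyperedge is non-empty; an empty one would leave no triple.
    """
    return {
        (label, "hasMember", vertex)
        for label, members in hyperedges.items()
        for vertex in members
    }


def decode(triples):
    """Rebuild the hyperedges, ignoring any triple that is not a membership."""
    hyperedges = {}
    for subject, predicate, obj in triples:
        if predicate == "hasMember":
            hyperedges.setdefault(subject, set()).add(obj)
    return hyperedges


original = {"e1": {"a", "b", "c"}, "e2": {"c", "d"}}
triples = encode(original)

# The encoding loses nothing: decoding recovers the hypergraph exactly.
assert decode(triples) == original

# But a different triple set decodes to the same hypergraph, so the
# correspondence between triple sets and hypergraphs is not one-to-one.
extra = triples | {("a", "knows", "b")}
assert extra != triples
assert decode(extra) == original
```

In this sketch `encode` is injective but not surjective onto all triple sets, which is what separates a faithful encoding from an isomorphism.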

I've built complex benchmarking tools to provide empirical proof for data modeling theories. My experience is that theory and practice inevitably diverge when moved from paper to disk, and that is the reality in which we have to architect.

I'll admit that without having built those tools and seeing the friction firsthand, I wouldn't have known exactly where the theory breaks and the implementation takes over.

Kurt Cagle

Jan 23

Fair enough. This is the essence of programming: you are building models of varying degrees of fidelity from a complete abstraction (such as a mathematical algorithm). In this case, what we're talking about is injective homomorphisms that are nearly equivalent. I make the argument for "nearly" only because I haven't sat down and fully examined the mathematics of hypergraphs in depth enough to compare them (I'm a physicist by training, so most of my math tends to be in areas like differential geometry).

Comment removed

Comment removed

Yup. One reason why you generally push such triples into a secondary graph. Process a single 100x100x100 hypergraph and you're dealing with 1 million triples.
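The arithmetic can be checked with a short Python sketch. It reads "100x100x100" as a single set-valued statement with 100 subjects, 100 predicates, and 100 objects, which is an interpretation of the comment; the `expand` helper and the term names are made up for illustration.

```python
from itertools import product


def expand(subjects, predicates, objects):
    """Yield every plain triple implied by one set-valued statement."""
    return product(subjects, predicates, objects)


subjects = [f"s{i}" for i in range(100)]
predicates = [f"p{i}" for i in range(100)]
objects = [f"o{i}" for i in range(100)]

# Count lazily so the million tuples are never held in memory at once.
count = sum(1 for _ in expand(subjects, predicates, objects))
print(count)  # 1000000
```

The count is the product of the three set sizes, so it grows multiplicatively as any one of the sets grows.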

The Ontologist
