Principles of Metadata Architecture, Graph Theory and AI

Classes, Categories, Types, and Shapes

An Exploration of Classification

Kurt Cagle

Jan 09, 2024

It’s been a while since I’ve managed to dip seriously into data modelling here, so it seemed like a good opportunity to explore one of the more confusing aspects of semantics - the distinctions between classes, categories, types and shapes.

The reality is that there isn’t that much of a distinction. Each of them, except the last, describes certain graph patterns (or shapes), and while I’ve seen a lot of arguing between ontologists about how they differ, in practice, the differences come down to very subtle design practices.

Understanding Classes and Classifications

Let’s start with the conceptual motherlode: the Class and its corresponding notion of Instance. An instance, when you get right down to it, is a thing. It has a set of properties that describe the characteristics of that thing. Instances are typically (although not always) physical entities - they describe things in the real world. For instance, I can talk about an animal that I know that has four feet, blue/grey fur, green eyes, claws within paws, a tail, is roughly 11 inches long from its nose to its haunches, and makes a distinctive purring sound when content and a high-pitched “meeuwww” sound when communicating with humans. These are all properties that describe my particular pet, Bright Eyes.

As I was describing these properties, I do not doubt that you were building a picture in your head of the thing I was describing, drawing on something that your mind does very well: classification. A classification is a tool the mind uses to identify a template that multiple instances satisfy and that can be used to predict the expected appearance and behaviour of an object. With enough evidence, the mind can say, “Oh, what you are describing is a cat.” With that, the brain can fill in other characteristics that haven’t been stated - the fact that cats are obligate carnivores (they must eat meat), that they sleep a great deal, that they like sitting in boxes, that they don’t necessarily get along with other cats, that they usually like to be petted and so forth.

Classification is how the brain can be lazy or, put another way, can avoid expending any more energy than it needs to when faced with non-novel things. On the other hand, should someone who has neither seen nor heard of a cat before encounter such an animal, the classification provides a handle for finding out more about the characteristics that describe cats; hence, it serves a very important role in language acquisition.

Note that few classifications are perfect. I can have a cat that does not have a tail (such as a Manx cat or a cat that was involved in an accident). I can have a cat that has no fur (a Sphynx). A classification (or, to shorten it, a class) is, in essence, a model for identifying a given concept when faced with new instances. A cat is likely to have four feet, but it could have fewer due to disease or accident. However, it is highly unlikely to have more. It will usually have fur but may not, will usually eat meat but will occasionally eat other things, and will usually have a tail but may not (and it’s vanishingly rare to have two or more).

An Ontology Is A Taxonomy of Classes

The same instance may be representative of multiple classes. A cat is also a mammal (it nurses its young), a vertebrate (it has a backbone), and is, as previously mentioned, an animal. A cat may also be a pet. It may be a contestant in a cat show. A geneticist will tell you that a cat can be distinguished from a wildcat by thirteen genes, and 1,376 genes make cats different from all other animals, meaning that to a geneticist, a cat is a specific assemblage of genes in a furry, purring body.

This is important because an instance describes a given entity by its relevant characteristics (properties + values), while a class is an abstraction of that entity that describes the relevant properties alone. A class is, at the end of the day, a template that can be used to describe an entity or resource in a specific way once each property in that class is given a value.

In semantic terms (i.e., using the RDF framework) and using the Turtle language, you can create a relationship between an instance and a class by using the “a” property.

:BrightEyes a Cat: .
:BrightEyes a Mammal: .
:BrightEyes a Vertebrate: .
:BrightEyes a Animal: .

The term :BrightEyes here is used to identify a specific instance - the cat that lives with me at my house. The terms Cat:, Mammal:, Vertebrate:, and Animal: each identify a different kind of class, with the a property indicating that :BrightEyes (my cat) is an instance of the given class.

By now, if you aren’t an ontologist, you’re likely thinking that the above sequence looks like a taxonomy, and you’d be correct (if you are an ontologist, you’re probably wondering why I’m using the bare namespace prefix notation for classes … bear with me. I have my reasons).

Each of the above terms is part of a general classification scheme, often called a taxonomy. In the above parlance, one class can be considered a subclass if it either adds new properties that the previous class does not specify or it restricts certain values (it creates a constraint) that are the same for all subsequent subclasses. Thus I can make the following assertions (here using the Turtle language):

Cat: rdfs:subClassOf Mammal: .
Mammal: rdfs:subClassOf Vertebrate: .
Vertebrate: rdfs:subClassOf Animal: .
Animal: rdfs:subClassOf Concept: .

The prefix rdfs: identifies the RDF Schema namespace, which provides several (very basic) relationships for describing resources. The statement (or assertion)

Cat: rdfs:subClassOf Mammal: .

should be read as

The Cat: class is a subclass of the Mammal: class.

Note that this does not say that there isn’t something between a Cat and a Mammal, only that one inherits the attributes of the other. For instance, the two statements:

Cat: rdfs:subClassOf Carnivore: .
Carnivore: rdfs:subClassOf Mammal: .

would identify that there is at least one level of classification between the two initial classes, specifically the class Carnivore:.

If you are familiar with taxonomies, you’re probably itching to say, “Hey, what about SKOS?” SKOS, or the “Simple Knowledge Organization System”, is a scheme for classification that goes back well before the formal beginning of the Semantic Web (more or less the year 2000) but that is frequently used with taxonomies. SKOS would represent this same system as:

Cat: skos:broader Mammal: .
Mammal: skos:broader Vertebrate: .
Vertebrate: skos:broader Animal: .
Animal: skos:broader Concept: .

where

Cat: skos:broader Mammal: .
# thesauri often abbreviate this as BT (broader term); skos:bt below is
# informal shorthand, not a property defined by SKOS
Cat: skos:bt Mammal: .

would be read as “The Cat: class has a broader term in the Mammal: class.”

This is often described as a bottom-up approach - the subordinate class or concept is the originator of the relationship (in the subject or first position), and the superordinate class or concept is in the right-hand position. This means that if you follow either rdfs:subClassOf or skos:broader up from one term to the next, you have a path that takes you to the very root of a tree.

On the other hand, if you had a property called rdfs:superClassOf (which isn’t defined) or skos:narrower (which is), then for each superclass, you’d have to specify all of the immediate subclasses. From a relational database standpoint, this one-to-many relationship is awkward to model (a single column cannot hold a list of children), and even with a more semantic way of thinking, it has its issues. If you use a language like SPARQL, you can use its property path syntax to walk the single path to the root of a tree, such as:

select ?ancestorOrSelf where {
     ?this rdfs:subClassOf* ?ancestorOrSelf.
}

while you can retrieve all of the descendants of a given node as:

select ?descendantOrSelf where {
     ?this ^rdfs:subClassOf* ?descendantOrSelf.
}

What’s so intriguing about this is that the subClassOf relationship essentially maps a taxonomy of all of the classes within your ontology, assuming that they all have a common ancestor.
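To make the two walks concrete outside of a triple store, here is a minimal Python sketch of the same idea. It is not how a SPARQL engine evaluates property paths; the EDGES list simply restates the subclass assertions from this section as (child, parent) pairs, and the closure function and its inverse flag are hypothetical names of my own choosing.

```python
from collections import defaultdict

# Subclass assertions from this section, written as (child, parent) pairs.
EDGES = [
    ("Cat", "Carnivore"),
    ("Carnivore", "Mammal"),
    ("Mammal", "Vertebrate"),
    ("Vertebrate", "Animal"),
    ("Animal", "Concept"),
]

def closure(start, edges, inverse=False):
    """Return start plus every node reachable by following edges zero or more times.

    inverse=False walks child -> parent (like rdfs:subClassOf*).
    inverse=True walks parent -> child (like ^rdfs:subClassOf*).
    """
    step = defaultdict(set)
    for child, parent in edges:
        if inverse:
            step[parent].add(child)
        else:
            step[child].add(parent)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in step[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen

# Ancestors-or-self of Cat, then descendants-or-self of Mammal.
print(sorted(closure("Cat", EDGES)))
print(sorted(closure("Mammal", EDGES, inverse=True)))
```

The same function handles both directions because only the direction of the edge lookup changes, which is essentially what the ^ operator does in the SPARQL path.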

Enumeration Types, Groups, and Faceting

An entity can be more than one kind of thing simultaneously. For instance, we can describe animals as being in one of three potential states - wild, feral, or tame, with a wild animal being one that has not adapted to human interaction, a feral animal being one that has some domesticated traits but has not completely adapted to human existence (such as foxes or wolves), while a tame animal has been shaped by its relationship to humans (cats, dogs, horses, etc.).

These kinds of qualifications are odd - they may fit within a taxonomy, yet at the same time, they don’t fit well into a subclass relationship but are closer to being traits. I like to think of these as categories or types, and they make up a significant percentage of what differentiates between subclasses because they are specializations or class qualifications.

Broadly, I usually define these qualifications as being of type Category: . For instance, I can talk about a class called Domestication:, which has three instances: Domestication:Wild, Domestication:Feral, and Domestication:Tame . You can then create a property called Animal:hasDomestication, which takes one of these three values and is used as follows:

Animal:Fox Animal:hasDomestication Domestication:Feral .

What differentiates a category from another class is that the instances of that class are themselves classes:

Category: rdfs:subClassOf rdfs:Class .
Domestication: a Category: .
Domestication:Wild a Domestication: .

This relationship is different than for entities:

Entity: a rdfs:Class .
Animal: rdfs:subClassOf Entity: .
Animal:Fox a Entity: .

This distinction is subtle but important. Entity classes are specialized through subclass inheritance (an Animal: and a Fox: are both entities with a Fox: being a specialized kind of Animal:) and can be quite deep. Categories generally have a few broad specializations (such as units or enumerations), but are typically quite shallow.

Indeed, we can classify our Domestication: type as being an enumeration, which is a subclass of category, as follows:

Category: rdfs:subClassOf rdfs:Class .
Enumeration: rdfs:subClassOf Category: .
Domestication: a Enumeration: .
Domestication:Wild a Domestication: .

Enumerations are notable for several reasons. First, as the name implies, they usually have a specific implicit ordering. For instance, if you create a property called Enumeration:order, you can assign an ordering scheme:

Domestication:Wild Enumeration:order 1 .
Domestication:Feral Enumeration:order 2 .
Domestication:Tame Enumeration:order 3 .

A given animal can then have a domestication enumeration via the Animal:hasDomestication property.

:Fennic a Animal:Fox ;
    Animal:hasDomestication Domestication:Feral .

:Vixen a Animal:Fox ;
    Animal:hasDomestication Domestication:Tame .

Enumerations have a one-to-one relationship with the things they modify: a given fox cannot be both feral and tame simultaneously, though there may be an undefined enumeration between the two that describes the actual state. Another way of thinking about an enumeration class is that enumerations partition a continuous interval into finite buckets, unlike groupings.
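As a rough analogy for programmers, an ordered enumeration with one value per thing behaves much like an integer-backed enum. The sketch below is only an illustration: the Domestication class mirrors the Enumeration:order values given above, while the domestication_of mapping and the at_least helper are hypothetical names introduced for the example.

```python
from enum import IntEnum

class Domestication(IntEnum):
    """Stand-in for the Domestication: enumeration; values mirror Enumeration:order."""
    WILD = 1
    FERAL = 2
    TAME = 3

# Each animal maps to exactly one enumeration value (a dict key cannot hold two).
domestication_of = {
    "Fennic": Domestication.FERAL,
    "Vixen": Domestication.TAME,
}

def at_least(level):
    """Names of animals whose domestication is at or beyond the given level."""
    return sorted(name for name, d in domestication_of.items() if d >= level)

print([d.name for d in sorted(Domestication)])
print(at_least(Domestication.FERAL))
```

Because the values are ordered, comparisons such as "feral or tamer" fall out of the ordering itself, which is something a plain group membership cannot give you.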

A group is a named subset of instances of a given common class. For example, suppose that I had a class of countries (which are entities), and each country belonged to several country groups.

:GreatBritain Country:inCountryGroup 
     :IslandCountries, :EuropeanCountries;
     .
:Japan Country:inCountryGroup
     :IslandCountries, :AsianCountries;
     .

At first glance, it would be tempting to say that :IslandCountries is an enumeration, but it isn’t. Instead, it’s a group, or more properly, an organization, which is an entity class:

Organization: rdfs:subClassOf Entity: .
Country: rdfs:subClassOf Organization: .
CountryGroup: rdfs:subClassOf Organization: .

The relationship can be seen in the SHACL definition of the relationship Country:inCountryGroup.

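Country: a sh:NodeShape;
    sh:targetClass Country: ;
    sh:property [
        sh:name "inCountryGroup" ;
        # SHACL writes the alternative Country:inCountryGroup|Entity:isMemberOf as a list
        sh:path [ sh:alternativePath ( Country:inCountryGroup Entity:isMemberOf ) ] ;
        sh:class CountryGroup: ;
        sh:minCount 0;
        ];
   .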

Entity: a sh:NodeShape;
    sh:targetClass Entity: ;
    sh:property [
        sh:name "member of" ;
        sh:path Entity:isMemberOf ;
        sh:class Organization: ;
        sh:minCount 0;
        ];
   .

The sh:path for Country:inCountryGroup should be interesting because it is another way of expressing the rdfs:subPropertyOf relationship. It states that you can use either Country:inCountryGroup or the more general Entity:isMemberOf relationship to describe the relationship between a country and its country group, but in the latter case, the target (or rdfs:range in RDFS terms) will specifically be a CountryGroup: if the source (rdfs:domain) is a Country: . As an example,

:Japan Country:inCountryGroup :IslandCountries.
# is the same as and has the same constraints as
:Japan Entity:isMemberOf :IslandCountries.

This can have a huge impact on type-ahead and user interfaces, as the latter is likely easier to remember, but you don’t necessarily want all organizations listed as options.

Organizational (membership) relationships like these are frequently used for faceting. For instance, you can get the intersection of all island countries and all European countries by doing a simple SPARQL query:

# SPARQL
select ?country where {
    ?country Entity:isMemberOf :IslandCountries,:EuropeanCountries .
}

This would return Great Britain and Ireland, but not Japan or Germany.
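The faceting idea is just set intersection over memberships, which a few lines of Python can illustrate. This is a sketch rather than a substitute for the query: the memberships for Great Britain and Japan come from the triples above, while the entries for Ireland and Germany are my assumptions, added so that the result described above can be reproduced. MEMBER_OF and facet are hypothetical names.

```python
# Country -> set of country groups it belongs to.
MEMBER_OF = {
    "GreatBritain": {"IslandCountries", "EuropeanCountries"},
    "Japan": {"IslandCountries", "AsianCountries"},
    # Assumed memberships, not stated as triples in the article:
    "Ireland": {"IslandCountries", "EuropeanCountries"},
    "Germany": {"EuropeanCountries"},
}

def facet(*groups):
    """Countries that are members of every one of the given groups."""
    wanted = set(groups)
    return sorted(c for c, member_groups in MEMBER_OF.items() if wanted <= member_groups)

print(facet("IslandCountries", "EuropeanCountries"))
print(facet("IslandCountries"))
```

Adding another facet only narrows the result, which is why membership relationships work so well for drill-down interfaces.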

This is one of the reasons that data modelling is important - understanding cardinality patterns and inheritance can simplify your query code and more accurately represent the reasoning that can be done on your data.

Subclassing by Restriction and a SHACL Primer

There are three primary forms of inheritance in a graph. One is direct inheritance - you either add new properties that are exclusive to a subclass and not specified in its superclass, or you restrict an existing property of an existing class (e.g., the class CatInWashingtonState: covers only cats that are registered in Washington State). The second form involves property inheritance, which I won’t cover here.

The third form of inheritance comes from aggregation. Sometimes you run into situations where you want to describe things that inherit from multiple classes at once. This isn’t ideal because it means that your ontological taxonomy stops being a tree: a class with two parents has more than one path to the root.

For example, suppose you wanted to describe a chimera, a being that is a composite of two or more other animals. You may be asked to describe a gryphon, which has the head of an eagle and the body of a lion. Even if this were a real animal (it’s not), having it inherit from both an eagle and a lion breaks the tree-like nature of so many taxonomies, especially if this is a relatively rare occurrence in your taxonomy. In this case, it is better to define a different class called Chimerical (since we want to use Chimera as a distinct subclass) of which Gryphon is a subclass, then add properties as follows:

Chimerical: a sh:NodeShape, rdfs:Class ;
    sh:declare [
        sh:prefix "Chimerical" ;
        sh:namespace "http://example.com/ns/Chimerical#"^^xsd:anyURI ;
    ] ;
    # Chimerical:hasLegsOf is listed here but its property shape is not shown below
    sh:property Chimerical:hasHeadOf, Chimerical:hasHairOf, Chimerical:hasHornsOf,
         Chimerical:hasUpperBodyOf, Chimerical:hasLowerBodyOf, Chimerical:hasTailOf,
         Chimerical:hasWingsOf, Chimerical:hasLegsOf ;
    .

Chimerical:hasHeadOf a sh:PropertyShape, rdf:Property ;
         sh:name "head" ;
         sh:path Chimerical:hasHeadOf ;
         sh:minCount 0;
         sh:class Animal: ;
         .

Chimerical:hasHairOf a sh:PropertyShape, rdf:Property ;
         sh:name "hair" ;
         sh:path Chimerical:hasHairOf ;
         sh:minCount 0;
         sh:class Animal: ;
         .

Chimerical:hasHornsOf a sh:PropertyShape, rdf:Property ;
         sh:name "horns" ;
         sh:path Chimerical:hasHornsOf ;
         sh:minCount 0;
         sh:class Animal: ;
         .

Chimerical:hasUpperBodyOf a sh:PropertyShape, rdf:Property ;
         sh:name "upperBody" ;
         sh:path Chimerical:hasUpperBodyOf ;
         sh:minCount 0;
         sh:maxCount 1;
         sh:class Animal: ;
         .

Chimerical:hasLowerBodyOf a sh:PropertyShape, rdf:Property ;
         sh:name "lowerBody" ;
         sh:path Chimerical:hasLowerBodyOf ;
         sh:minCount 0;
         sh:maxCount 1;
         sh:class Animal: ;
         .

Chimerical:hasTailOf a sh:PropertyShape, rdf:Property ;
         sh:name "tail" ;
         sh:path Chimerical:hasTailOf ;
         sh:minCount 0;
         sh:maxCount 1;
         sh:class Animal: ;
         .

Chimerical:hasWingsOf a sh:PropertyShape, rdf:Property ;
         sh:name "wings" ;
         sh:path Chimerical:hasWingsOf ;
         sh:minCount 0;
         sh:maxCount 1;
         sh:class Animal: ;
         .

From this, you can make Gryphon a subclass of Chimerical and then constrain those properties that define gryphons uniquely:

Chimerical:Gryphon a sh:NodeShape, rdfs:Class ;
      sh:targetClass Chimerical:Gryphon ;
      rdfs:subClassOf Chimerical: ;
      sh:property [
            sh:path Chimerical:hasHeadOf ;
            sh:hasValue Animal:Eagle ;
            ];
      sh:property [
            sh:path Chimerical:hasLowerBodyOf ;
            sh:hasValue Animal:Lion ;
            ];
      .

:KenGryphonJr a Chimerical:Gryphon .

What this means in practice (assuming that you have SHACL inheritance in place)* is that you should be able to do a SPARQL query like:

select ?this ?subClass ?head ?horns ?lowerBody where {
     ?this a ?subClass.
     ?this Chimerical:hasHeadOf ?head.
     optional {
         ?this Chimerical:hasHornsOf ?horns.
     }
     optional {
         ?this Chimerical:hasLowerBodyOf ?lowerBody.
     }
}

and will get a table of the form:

| ?this | ?subClass | ?head | ?horns | ?lowerBody |
| --- | --- | --- | --- | --- |
| :KenGryphonJr | Chimerical:Gryphon | Animal:Eagle | -- | Animal:Lion |
| :Jacqueline | Chimerical:Jackalope | Animal:Rabbit | Animal:Deer | Animal:Rabbit |
| :Sophie | Chimerical:Sphinx | Animal:Human | -- | Animal:Lion |

* Note: OWL does something similar with owl:Restriction statements, and not all systems support SHACL yet. I’ll leave the OWL for a later article.

This code introduces SHACL without a lot of explanation. SHACL is short for the Shapes Constraint Language, a language that expresses both definitions and constraint schemas as Shapes.

A Shape can be considered a template for defining classes and their properties, among other things. In this case, the Chimerical: shape identifies several properties, each indicating a replacement part of a chimerical being, such as Chimerical:hasHeadOf or Chimerical:hasLowerBodyOf. The initial properties are tied into Chimerical: itself, and these, in turn, declare things like cardinality (such as one-to-one or one-to-many relationships).

In the Chimerical:Gryphon class example, on the other hand, those two properties (via sh:path) are restricted to only take as values the animal classes Animal:Eagle and Animal:Lion. This means that if I create an instance of the Chimerical:Gryphon class (here :KenGryphonJr) it will automatically indicate that it has those two properties restricted to those values, as shown in the table. The same can be used to create other specialized subclasses, such as hippogryphs, centaurs, sphinxes, and jackalopes.
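For readers who think more easily in code than in shapes, the following Python sketch mimics subclassing by restriction. It is an analogy only and says nothing about how a SHACL processor works: the property names come from the Chimerical: shape, the fixed values come from the result table above, and CHIMERICAL_PROPERTIES, RESTRICTIONS, and describe are hypothetical names.

```python
# The abstract "interface": properties any chimerical being may carry.
CHIMERICAL_PROPERTIES = {
    "hasHeadOf", "hasHairOf", "hasHornsOf", "hasUpperBodyOf",
    "hasLowerBodyOf", "hasTailOf", "hasWingsOf",
}

# Subclasses by restriction: each one pins some properties to a fixed value.
RESTRICTIONS = {
    "Gryphon": {"hasHeadOf": "Eagle", "hasLowerBodyOf": "Lion"},
    "Jackalope": {"hasHeadOf": "Rabbit", "hasHornsOf": "Deer", "hasLowerBodyOf": "Rabbit"},
    "Sphinx": {"hasHeadOf": "Human", "hasLowerBodyOf": "Lion"},
}

def describe(instance, subclass):
    """Build one result row for an instance from its subclass restrictions."""
    fixed = RESTRICTIONS[subclass]
    unknown = set(fixed) - CHIMERICAL_PROPERTIES
    if unknown:
        # A restriction may only narrow properties the abstract class declares.
        raise ValueError(f"{subclass} restricts undeclared properties: {sorted(unknown)}")
    row = {"this": instance, "subClass": subclass}
    for prop in ("hasHeadOf", "hasHornsOf", "hasLowerBodyOf"):
        row[prop] = fixed.get(prop)  # None where the subclass leaves it open
    return row

print(describe("KenGryphonJr", "Gryphon"))
print(describe("Jacqueline", "Jackalope"))
```

The instance itself asserts nothing beyond its class; everything else in the row is supplied by the restrictions, which is the behaviour the table illustrates.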

I wanted to point out one more thing in the examples (and this is focused primarily on the ontologists and vendors out there). One of the central problems that you run into with working with older triple stores is that you have to predeclare namespaces externally. This makes writing SPARQL difficult and even puts a lot of overhead on generating Turtle or JSON-LD, and it existed primarily as an older index optimization that is no longer as relevant today.

If you can declare namespaces and set local prefixes within SHACL definitions, these can be made “live” in the system without declaring them a priori, especially when running SPARQL. One additional benefit to this - by declaring abstract classes with property interfaces and then defining subclasses by restriction (such as Chimerical:Gryphon), you can identify subclasses as belonging to the nearest abstract class and dramatically cut down on the number of namespaces you need to define in the first place. This is especially important to ontologists trying to reduce the number of declared namespaces ahead of time for performance and indexing reasons.

Summary

There are several points worth considering here:

A canonical ontology can be seen as a taxonomy of classes.
Use transitive closure of a property to move to the root of the taxonomy class and the inverse of the property to retrieve the associated tree. This means that bottom-up design is generally more efficient and easier to manage.
Use abstract classes to define new interfaces, then use constraints and properties to create concrete classes.
Declare only your abstract classes as namespaces, then use these to organize your properties.
SHACL is cool and is only beginning to show what it can do.

Kurt Cagle is the Editor of The Ontologist and is a practising ontologist, solutions architect, and AI aficionado. He lives in Bellevue, Washington, with his wife and cats.


Dave Richardson

Carpe Noctem

Oct 14

I'm curious about your use of colons after the identifiers, e.g.:

Cat: skos:broader Mammal: .

Doesn't look like valid Turtle to me. What was the thought there?

1 reply by Kurt Cagle

miked

Jan 9

Thanks for this! : ). A canonical ontology can be seen as a taxonomy of classes... But what of the other class-class relations that are found in ontologies -- the non-taxonomic relations? Your characterization seems to undersell ontologies significantly. I would have added a note to that effect...