From Database to Triple Store with SHACL

A Practical Guide to Transforming Relational Data into Semantic Knowledge Graphs

Feb 19, 2026

That’s Horace. He’s the AI the manages HR.

I had not originally intended to write this article, but during a recent consultation, the question arose: how do you migrate data from a SQL database (here, SQL Server) into a knowledge graph (Jena-Fuseki), using an LLM (in this case, Claude)?

It turns out that this is a great use case for SHACL, for a very simple reason:

SHACL is a schema abstraction layer. It essentially describes shapes, properties, connections and constraints, and only secondarily does it explicitly bind to the RDF data model.

As such it makes a perfect bridge for going from one schematic language to another. In the example covered here, that initial schematic language is the Data Definition Language, or DDL, which is used to construct a relational database, and can be exported by all current SQL databases.

Shaclify

This led to the creation of Shaclify. Shaclify is an open source AI driven project that uses SHACL as its common bridge. It has a github repo.

Note that shaclify is not (currently) a fully fleshed out application but rather is intended to showcase how you can use SHACL and AI to build semantic mappings. Over time, this will likely change.

In this particular case, I created a test HR database with ten tables, and exported the DDL as text. I also exported all of the tables as CSV files (for a small to mid-sized database, you may also want to export the database as an Excel file, which lets you keep the files together as a package).

Once I had the DDL, I included it as part of the following prompt:

Using the SHACL 1.2 core and rules specifications at
https://www.w3.org/TR/2026/WD-shacl12-core-20260211/ (and associated links), 
create a SHACL that describes the DDL, mapping data types, constraints, 
primary key/foreign key associations as IRIs, human-readable names in sh:codeIdentifier, 
mapping instructions in sh:, and descriptive sh:name and sh:description 
properties and sh:intent. 

All foreign key references should be converted to references to table-row IRIs, 
with sh:class for these properties being the associated table class. 

All property shapes and constraint shapes should use formal IRIs rather than blank nodes. 
Stored procedures should be mapped, as much as possible to SHACL rules, 
using the sh:agentInstruction field. 
The goal of this will be to create a mapping converter that will generate a TARQL converter.

The address https://www.w3.org/TR/2026/WD-shacl12-core-20260211/ is the most recent SHACL 1.2 Core working draft (see https://www.w3.org/TR/shacl12-core/ to get the latest core implementation, as it is still evolving).

The key point is that the SHACL files need only be (and should only be) generated once for a given SQL DDL. This becomes the reference schema and should be saved as such. Once this is created, This is then used to generate the corresponding TARQL transformations. Tarql is one of my favorite tools, and should be part of every ontologist’s repertoire - it converts CSV files into RDF Triples (as Turtle). It is Java-based command line tool, using the same ARQ libraries that are part of the Apache Jena project. The TARQL files are essentially SPARQL with some additional properties for handling record (table/column/row) type input.

Building such TARQL files by hand could be time-consuming, especially for a typical commercial database with dozens or even hundreds of tables. However, by creating the SHACL file, what you have done is provided the structural translation layer for the TARQL to do the relevant mappings for you - including the mapping between tables in the SQL database. Running the TARQL on the corresponding CSV files (as a command-line process typically, as this doesn’t require any AI assistance once the tarql is generated), will then generate the corresponding Turtle. An employee record in our HR scheme looks like the following:

<http://example.org/hr/employee/1> a hr:Employee ;
    hr:address "123 Maple Street"^^xsd:string ;
    hr:city "Seattle"^^xsd:string ;
    hr:createdDate "2018-06-01T09:00:00"^^xsd:dateTime ;
    hr:dateOfBirth "1985-03-15"^^xsd:date ;
    hr:email "alice.johnson@example.com"^^xsd:string ;
    hr:emergencyContactName "Bob Johnson"^^xsd:string ;
    hr:emergencyContactPhone "555-0201"^^xsd:string ;
    hr:employeeId 1 ;
    hr:employeeStatus "Active" ;
    hr:firstName "Alice"^^xsd:string ;
    hr:hireDate "2018-06-01"^^xsd:date ;
    hr:lastName "Johnson"^^xsd:string ;
    hr:middleName "Marie"^^xsd:string ;
    hr:modifiedDate "2024-01-10T14:30:00"^^xsd:dateTime ;
    hr:phone "555-0101"^^xsd:string ;
    hr:sSN "123-45-6789"^^xsd:string ;
    hr:state "WA"^^xsd:string ;
    hr:zipCode "98101"^^xsd:string .

while an employee training record would look as follows:

<http://example.org/hr/employeetraining/15> a hr:EmployeeTraining ;
    hr:comments "Sales techniques improved"^^xsd:string ;
    hr:completionDate "2024-07-15"^^xsd:date ;
    hr:course <http://example.org/hr/trainingcourse/13> ;
    hr:createdDate "2024-06-01T09:00:00"^^xsd:dateTime ;
    hr:employee <http://example.org/hr/employee/1> ;
    hr:employeeTrainingId 15 ;
    hr:enrollmentDate "2024-06-01"^^xsd:date ;
    hr:score 82.0 ;
    hr:status "Completed"^^xsd:string .

Note that in this particular case, there are now IRI links for hr:course and hr:employee . This means that we have successfully replaced the foreign key → primary key reference used in the DDL to IRIs in the Turtle.

Once you have created the triples, you can then post them to a triple store database, such as the Apache Jena open source triple-store. As a general rule of thumb, you can do an initial creation of the knowledge graph based upon a full dump of the CSV files using tarql, then can create CSV diffs that will generate the updates. Because a triple store manages sets of triples rather than documents and because you have consistent generation rules through tarql, links to existing entities in the knowledge graph should be preserved.

A second advantage of have the SHACL file generated - you can use it generate subsequent queries against the knowledge graph from claude or similar GPT:

#prompt

Using the attached SHACL file (attach file) write a SPARQL query for the following report:
Give me a report showing everyone who has taken training classes, which classes they took,
 and when they took them.

This creates the following SPARQL:

PREFIX hr: <http://example.org/hr/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT 
    ?employeeId
    ?employeeName
    ?courseCode
    ?courseName
    ?enrollmentDate
    ?completionDate
    ?status
    ?score
WHERE {
    # Get employee training records
    ?empTraining a hr:EmployeeTraining ;
                 hr:employee ?employee ;
                 hr:course ?course ;
                 hr:enrollmentDate ?enrollmentDate ;
                 hr:status ?status .
    
    # Get employee details
    ?employee a hr:Employee ;
              hr:employeeId ?employeeId ;
              hr:firstName ?firstName ;
              hr:lastName ?lastName .
    
    # Get training course details
    ?course a hr:TrainingCourse ;
            hr:courseCode ?courseCode ;
            hr:courseName ?courseName .
    
    # Optional: Completion date
    OPTIONAL { 
        ?empTraining hr:completionDate ?completionDate 
    }
    
    # Optional: Score
    OPTIONAL {
        ?empTraining hr:score ?score
    }
    
    # Construct full employee name
    BIND(CONCAT(?lastName, ", ", ?firstName) AS ?employeeName)
}
ORDER BY ?lastName ?firstName ?enrollmentDate

Note that while this could be used to create a general natural language query system, a better strategy may be to come up with a list of 10-20 frequently asked questions and turn these into a report generator through an API. My general experience is that providing an open-ended query on a knowledge graph is usually a bad idea.

There are a few observations to bring up here.

The model created in this process is likely to hew closely to the SQL database model. This might not be ideal for a general knowledge graph, in part because it brings along SQL relational modelling assumptions that may not fully leverage RDF semantics.
To that end, if you have a more expansive existing reference schema, it may be worth repeating the process on DDL to Schema conversion but with the caveat that it must follow the same design patterns as the reference schema. One of the biggest such design patterns is whether or not a schema uses a top-down approach (with one parent having a directed connection to a number of children) or a bottom-up approach (with each child connecting to the parent, which is usually the case with SQL because of the nature of foreign key to primary key mappings).
The created ontology is structural only. You may want to either select an industry focused upper ontology or generate an upper ontology on the existing schema, or may look to remap the generated ontology to use an existing upper ontology. The following illustrates one such example (generating an upper ontology).

Given the attached schema, create additional triples as Turtle that provide
 for a common upper ontology, defining superclasses and superproperties.

Using the above prompt lets you create a first blush upper ontology. The generated triples are shown in the schemas/hr_upper_ontology.ttl file in the github repo. Note, I generally recommend going the other way (build the upper ontology, then extend to the generated schema), but this illustrates how you can support an inheritance layer.
The key in all cases is to look at SHACL as a reference schema for mapping data from one ontology to another. You can also generate via LLM a SPARQL-Update file that will allow for internal mappings of one schema to another if both are already expressed in Turtle or other forms of RDF, especially when used in conjunction with an indexed triple store.

HR Database Complete TARQL Conversion Report

The following section was generated by Anthropic Claude Sonata 4.6 Extended and illustrates the process. More documentation can be found on the github repo.

Date: 2026-02-19
Conversion Method: TARQL transformation logic (Python implementation)
Schema: SHACL 1.2
Status: ✅ COMPLETE

📊 Conversion Summary

Input Data

Source: 11 CSV files from HR SQL Server database
Total Records: 210
Tables Processed: All 11 tables

Output Data

Format: RDF/Turtle
Triples Generated: 2,221
File Size: 103,259 bytes (100.8 KB)
Output File: hr_database_complete.ttl

Validation Results

SHACL Conformance: 19 minor pattern violations (position codes)
Data Quality: ✅ Excellent - NULL values handled correctly
Foreign Keys: ✅ All valid - proper IRI references
Datatypes: ✅ All correct XSD mappings

📋 Detailed Conversion Statistics

TableRecordsProperties AvgFK ReferencesDepartments871 (manager)Positions2080Employees2013-190EmployeePositions2593 (emp, dept, pos)Salaries328-92 (emp, approver)Benefits1080EmployeeBenefits207-82 (emp, benefit)TimeOffRequests209-112 (emp, approver)PerformanceReviews2011-132 (emp, reviewer)TrainingCourses1590EmployeeTraining209-112 (emp, course)TOTAL210~106 unique props14 FK types

🎯 TARQL Transformation Features Applied

1. IRI Construction

All primary keys converted to unique IRIs:

turtle

# Pattern: http://example.org/hr/{table}/{id}
<http://example.org/hr/employee/5>
<http://example.org/hr/department/3>
<http://example.org/hr/salary/15>

2. XSD Datatype Casting

CSV TypeXSD TypeExampleInteger IDsxsd:integer1, 5, 20Decimal amountsxsd:decimal75000.00, 500.50Date stringsxsd:date"2020-01-15"^^xsd:dateTimestamp stringsxsd:dateTime"2024-01-01T09:00:00"^^xsd:dateTimeText fieldsxsd:string"John"^^xsd:stringBoolean flagsxsd:booleantrue, falseEnumerationsplain literal"Active" (no datatype)

3. NULL Value Handling

CSV: ManagerEmployeeID = "NULL" → RDF: property omitted (no triple created)
CSV: EndDate = "" → RDF: property omitted  
CSV: Budget = "NULL" → RDF: property omitted

Result: Clean RDF with no invalid references or NULL IRIs

4. Foreign Key Transformation

turtle

# SQL Foreign Key:
ManagerEmployeeID = 5

# RDF Object Property:
hr:managerEmployee <http://example.org/hr/employee/5>

5. Boolean Normalization

CSV values: "1", "true", "True", "TRUE" → xsd:boolean true
CSV values: "0", "false", "False", "FALSE" → xsd:boolean false

🔗 Relationship Graph Generated

Departments (8)
    ├─> managerEmployee (6 refs) ─> Employees
    
Positions (20)
    
Employees (20)
    ├─> EmployeePositions (25) ─┐
    ├─> Salaries (32)           ├─> Links to Dept & Position
    ├─> EmployeeBenefits (20) ──┤
    ├─> TimeOffRequests (20)    │
    ├─> PerformanceReviews (20) │
    └─> EmployeeTraining (20) ──┘
    
Benefits (10)
    └─> EmployeeBenefits (20)
    
TrainingCourses (15)
    └─> EmployeeTraining (20)

Total Relationships: 157 foreign key references converted to RDF object properties

💾 Sample RDF Output

Department with Manager Reference

turtle

<http://example.org/hr/department/1> a hr:Department ;
    hr:departmentId 1 ;
    hr:departmentName "Information Technology"^^xsd:string ;
    hr:departmentCode "IT"^^xsd:string ;
    hr:managerEmployee <http://example.org/hr/employee/5> ;
    hr:budget 500000.0 ;
    hr:createdDate "2020-01-01T09:00:00"^^xsd:dateTime ;
    hr:modifiedDate "2025-01-15T14:30:00"^^xsd:dateTime .

Employee with Optional Fields

turtle

<http://example.org/hr/employee/1> a hr:Employee ;
    hr:employeeId 1 ;
    hr:firstName "Michael"^^xsd:string ;
    hr:lastName "Chen"^^xsd:string ;
    hr:email "mchen@company.com"^^xsd:string ;
    hr:phone "555-0101"^^xsd:string ;
    hr:dateOfBirth "1988-06-15"^^xsd:date ;
    hr:hireDate "2018-03-20"^^xsd:date ;
    hr:employeeStatus "Active" ;
    hr:ssn "123-45-6789"^^xsd:string ;
    hr:address "123 Main St"^^xsd:string ;
    hr:city "Seattle"^^xsd:string ;
    hr:state "WA"^^xsd:string ;
    hr:zipCode "98101"^^xsd:string ;
    hr:createdDate "2018-03-20T09:00:00"^^xsd:dateTime ;
    hr:modifiedDate "2025-01-10T15:30:00"^^xsd:dateTime .

Employee Position with Multiple References

turtle

<http://example.org/hr/employeeposition/1> a hr:EmployeePosition ;
    hr:employeePositionId 1 ;
    hr:employee <http://example.org/hr/employee/1> ;
    hr:department <http://example.org/hr/department/1> ;
    hr:position <http://example.org/hr/position/3> ;
    hr:startDate "2018-03-20"^^xsd:date ;
    hr:isCurrent true ;
    hr:createdDate "2018-03-20T09:00:00"^^xsd:dateTime .

Salary History

turtle

<http://example.org/hr/salary/5> a hr:Salary ;
    hr:salaryId 5 ;
    hr:employee <http://example.org/hr/employee/1> ;
    hr:salaryAmount 95000.0 ;
    hr:effectiveDate "2024-01-01"^^xsd:date ;
    hr:salaryType "Base"^^xsd:string ;
    hr:changeReason "Annual Raise"^^xsd:string ;
    hr:approvedBy <http://example.org/hr/employee/5> ;
    hr:createdDate "2023-12-15T10:00:00"^^xsd:dateTime .

⚠️ Validation Notes

Pattern Violations (19 instances)

Issue: Position codes in CSV use format “MKT-MGR”, “SR-SWE”, “LEAD-SWE”
Schema Expectation: ^[A-Z]{2,10}$ (letters only, no hyphens)

Affected Position Codes:

MKT-MGR (Marketing Manager)
RESEARCH-SCI (Research Scientist)
HR-MGR (HR Manager)
FIN-DIR (Finance Director)
SR-FIN-ANALYST (Senior Financial Analyst)
LEAD-SWE (Lead Software Engineer)
DATA-ANALYST (Data Analyst)
SR-SWE (Senior Software Engineer)

Resolution: This is a schema design issue, not a data issue. The SHACL pattern should be updated to:

turtle

sh:pattern "^[A-Z0-9-]{2,20}$"  # Allow hyphens

Impact: None - data is valid and correctly converted, just doesn’t match overly restrictive pattern

Other Validation Results

✅ All required properties present (sh:minCount satisfied)
✅ All datatypes correct (sh:datatype satisfied)
✅ All foreign key references valid (sh:class satisfied)
✅ All cardinality constraints met (sh:maxCount satisfied)
✅ Email, SSN, phone, ZIP patterns validated
✅ Enumeration values valid (Employee status: Active, Inactive, OnLeave, Terminated)
✅ Salary range constraints satisfied (MinSalary < MaxSalary)

📈 Performance Metrics

MetricValueTotal Conversion Time< 2 secondsAverage Records/Second105+Average Triples/Record10.6File Size Efficiency491 bytes/recordMemory Usage< 50 MB

🎓 Key Insights

Data Quality

NULL Handling: CSV contained literal “NULL” strings - properly filtered
Optional Fields: ~40% of fields are optional - correctly omitted when empty
Foreign Keys: 100% referential integrity maintained
Date Formats: Mixed ISO formats handled correctly
Boolean Representation: Multiple formats (”1”, “true”) normalized

Transformation Patterns Used

Conditional Triples: Only create triple if value exists and is not NULL
Type-Safe Casting: Try/except blocks prevent malformed data from breaking conversion
IRI Safety: Validate ID values before constructing URIs
Enumeration Handling: Plain literals for SHACL sh:in validation
Nested References: Handle chains of foreign keys (emp → empPos → dept)

🔍 Sample SPARQL Queries

Query 1: Find All Active Employees with Their Current Positions

sparql

PREFIX hr: <http://example.org/hr/ontology#>

SELECT ?empName ?posTitle ?deptName ?salary
WHERE {
  ?emp a hr:Employee ;
       hr:firstName ?firstName ;
       hr:lastName ?lastName ;
       hr:employeeStatus "Active" .
  
  ?empPos a hr:EmployeePosition ;
          hr:employee ?emp ;
          hr:isCurrent true ;
          hr:position ?pos ;
          hr:department ?dept .
  
  ?pos hr:positionTitle ?posTitle .
  ?dept hr:departmentName ?deptName .
  
  ?sal a hr:Salary ;
       hr:employee ?emp ;
       hr:effectiveDate ?effDate .
  
  FILTER NOT EXISTS {
    ?sal hr:endDate ?endDate .
    FILTER(?endDate < NOW())
  }
  
  ?sal hr:salaryAmount ?salary .
  
  BIND(CONCAT(?firstName, " ", ?lastName) AS ?empName)
}
ORDER BY ?deptName ?posTitle

Query 2: Department Budget Analysis

sparql

PREFIX hr: <http://example.org/hr/ontology#>

SELECT ?deptName ?budget ?empCount ?avgSalary
WHERE {
  ?dept a hr:Department ;
        hr:departmentName ?deptName ;
        hr:budget ?budget .
  
  {
    SELECT ?dept (COUNT(?emp) AS ?empCount) (AVG(?salAmt) AS ?avgSalary)
    WHERE {
      ?empPos hr:department ?dept ;
              hr:employee ?emp ;
              hr:isCurrent true .
      
      ?sal hr:employee ?emp ;
           hr:salaryAmount ?salAmt .
      
      FILTER NOT EXISTS { ?sal hr:endDate ?end FILTER(?end < NOW()) }
    }
    GROUP BY ?dept
  }
}
ORDER BY DESC(?budget)

Query 3: Employee Benefits Enrollment

sparql

PREFIX hr: <http://example.org/hr/ontology#>

SELECT ?empName ?benName ?benType ?coverageLevel ?enrollDate
WHERE {
  ?emp a hr:Employee ;
       hr:firstName ?firstName ;
       hr:lastName ?lastName ;
       hr:employeeStatus "Active" .
  
  ?empBen hr:employee ?emp ;
          hr:benefit ?ben ;
          hr:isActive true ;
          hr:coverageLevel ?coverageLevel ;
          hr:enrollmentDate ?enrollDate .
  
  ?ben hr:benefitName ?benName ;
       hr:benefitType ?benType .
  
  BIND(CONCAT(?firstName, " ", ?lastName) AS ?empName)
}
ORDER BY ?empName ?benType

Query 4: Training Completion Analysis

sparql

PREFIX hr: <http://example.org/hr/ontology#>

SELECT ?courseName ?completed ?inProgress ?avgScore
WHERE {
  ?course a hr:TrainingCourse ;
          hr:courseName ?courseName ;
          hr:isActive true .
  
  {
    SELECT ?course 
           (COUNT(DISTINCT ?empTrain1) AS ?completed)
           (COUNT(DISTINCT ?empTrain2) AS ?inProgress)
           (AVG(?score) AS ?avgScore)
    WHERE {
      ?course a hr:TrainingCourse .
      
      OPTIONAL {
        ?empTrain1 hr:course ?course ;
                   hr:status "Completed" ;
                   hr:score ?score .
      }
      
      OPTIONAL {
        ?empTrain2 hr:course ?course ;
                   hr:status "In Progress" .
      }
    }
    GROUP BY ?course
  }
}
ORDER BY DESC(?completed)

Query 5: Performance Review Summary

sparql

PREFIX hr: <http://example.org/hr/ontology#>

SELECT ?empName ?reviewDate ?rating ?reviewer ?acknowledged
WHERE {
  ?review a hr:PerformanceReview ;
          hr:employee ?emp ;
          hr:reviewDate ?reviewDate ;
          hr:overallRating ?rating ;
          hr:reviewer ?reviewerEmp .
  
  OPTIONAL { ?review hr:employeeAcknowledged ?acknowledged }
  
  ?emp hr:firstName ?firstName ;
       hr:lastName ?lastName .
  
  ?reviewerEmp hr:firstName ?revFirstName ;
               hr:lastName ?revLastName .
  
  BIND(CONCAT(?firstName, " ", ?lastName) AS ?empName)
  BIND(CONCAT(?revFirstName, " ", ?revLastName) AS ?reviewer)
}
ORDER BY ?reviewDate DESC

✨ Conversion Highlights

Success Factors

Complete Coverage: All 11 tables converted with 100% data capture
Type Safety: Robust error handling prevents malformed data from failing conversion
Referential Integrity: All foreign keys preserved as proper RDF relationships
Schema Compliance: 99.1% SHACL compliance (19/2221 triples have minor pattern issues)
Scalability: Converter handles 210 records in < 2 seconds, scales linearly

TARQL Features Implemented

✅ IRI construction from primary keys
✅ XSD datatype casting
✅ Optional field handling (BIND IF BOUND pattern)
✅ Foreign key → IRI reference conversion
✅ Boolean value normalization
✅ NULL value filtering
✅ Enumeration handling (plain literals)
✅ Multi-table relationship preservation

🎯 Usage Recommendations

Loading into Triple Store

bash

# Apache Jena Fuseki
curl -X POST -H "Content-Type: text/turtle" \
     --data-binary @hr_database_complete.ttl \
     http://localhost:3030/hr/data

# GraphDB
# Via web UI: Import > RDF > Upload hr_database_complete.ttl

# Stardog
stardog data add hr hr_database_complete.ttl

Query Endpoint

Once loaded, data is accessible via SPARQL:

http://localhost:3030/hr/sparql  (Fuseki)
http://localhost:7200/repositories/hr  (GraphDB)

Validation Command

bash

pyshacl -s hr_database_shacl.ttl \
        -d hr_database_complete.ttl \
        -f human

Conclusion

Data integration could often take weeks or even months of work. AI can compress that to a matter of hours, but to do so, it needs structural information that lets it bridge between different systems and their intrinsic ontologies. That’s what SHACL has the potential to become - an abstract representation of both the structure and intent of an enterprise data system, useful for knowledge graphs but just as useful as a general tool for interchange.

In Media Res,

Kurt Cagle
The Ontologist

If you like these articles, please consider becoming a paid subscriber. It helps support me so that I can continue writing code, in-depth analyses, educational pieces, and more.

Check out my LinkedIn newsletter, The Cagle Report.

I am also currently seeking new projects or work opportunities. If anyone is looking for a CTO or Director-level AI/Ontologist, please get in touch with me through my Calendly:

If you want to shoot the breeze or have a cup of virtual coffee, I have a Calendly account at https://calendly.com/theCagleReport. I am available for consulting and full-time work as an ontologist, AI/Knowledge Graph guru, and coffee maker. Also, for those of you whom I have promised follow-up material, it’s coming; I’ve been dealing with health issues of late.

I’ve created a Ko-fi account for voluntary contributions, either one-time or ongoing, or you can subscribe directly to The Ontologist. If you value my articles, technical pieces, or general reflections on work in the 21st century, please consider contributing to support my work and allow me to continue writing.

Mathias Vanden Auweele

Mar 7

I understood from your article that you use SHACL generated from the DDL which is then used to generate TARQL scripts. After there is more SHACL to convert the result into the wanted knowledge graph vocabulary.

I'm very curious if you tried sparql-anything and if that would not be a more suitable approach compared to TARQL?

1 reply by Kurt Cagle

1 more comment...

The Ontologist

Discussion about this post

Ready for more?