When Ontology Generation Becomes Cheap
Kurt Cagle & Chloe Shannon | The Ontologist
A question appeared recently on LinkedIn — one of those deceptively simple ones that opens up into something considerably larger the longer you think about it. The contention: if ontologies can be generated inexpensively using LLMs, what effect will that have on semantic data systems? It’s a good question. But it needs to be paired with a second one, because the two are structurally linked: if you can build queries and updates with LLMs cheaply, how does this affect data integration? Together, these aren’t just questions about tooling. They’re questions about whether the economics of one of the oldest and most expensive problems in enterprise computing are about to flip entirely.
Fifty Years of Expensive Integration
Data integration has occupied the bulk of most data professionals’ working hours for more than fifty years. In principle, the problem is simple: when you have two databases, you have two ontologies. The same thing holds for Excel files, CSVs, or any other “micro-database” store. Every dataset carries an implicit schema, and the moment you need to combine two of them, those schemas have to be reconciled.
In the 2000s, the canonical answer was simple in theory and a logistical nightmare in practice: you create an enterprise data model. Everyone agrees to use that model. Problem solved, right?
Not really. First, getting everyone aligned on the same model proved enormously difficult when the active databases had different vendors, different APIs, and frequently different architectures, each set up to support an application with its own requirements. What if you needed a property that wasn’t in the approved list? You wrote your own, and the model forked. Then you had the problem of integrating with other departments, each of which had its own models and really didn’t want to give them up. This inevitably created turf wars that usually ended up harming everyone.
Finally, there was the problem of specialised transformations for different target systems — some within your domain, some outside of it. It could take weeks or months of work to create such transformers, and they were long-term commitments, because inevitably one or the other side of the communication pipeline would change and the transformers would break — frequently in ways that were almost impossible to diagnose until bad data began showing up in downstream datasets.
Put another way: both querying and transformation were expensive processes, while working from a shared schema was relatively cheap but fragile. Building an ontology was an expensive, time-consuming endeavour, taking months or even years, and even the unified schema projects that succeeded tended to work primarily at the edges. The economics of the whole enterprise favoured doing less of it, not more.
Then along came generative AI.
What LLMs Actually Are
As has been repeated ad nauseam, generative AI — specifically LLM-based systems — is not a database. It’s a transformer. There are still many people in the AI space who refuse to recognise this reality, who continue trying to sell AI databases under various guises, and they inevitably fail. The reason is structural: the training material within an LLM is not stored as a database — it’s encoded as a pattern matcher.
A regular expression is a useful starting point for understanding what that means. A regex can match a particular pattern of characters, and the match function will return both the matched string and the specialised substrings captured by each sub-expression — for each general match, a structured set of subordinate matches. The replace function receives all of this and produces a new string based upon that array. This is, not coincidentally, similar to how both SQL and SPARQL work under the hood.
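To make the match-and-replace analogy concrete, here is a minimal sketch using Python’s standard re module. The pattern and the sample string are our own illustrative assumptions; the point is only that a match returns the whole string plus its captured parts, and that replace builds a new string from those parts.

```python
import re

# Pattern with two named capture groups (the sub-expressions).
pattern = re.compile(r"(?P<family>\w+),\s*(?P<given>\w+)")

text = "Lovelace, Ada"
match = pattern.match(text)

# The match carries the whole matched string plus each captured substring.
whole = match.group(0)      # the full matched string
parts = match.groupdict()   # one entry per named sub-expression

# Replace receives those captures and produces a new string from them.
rewritten = pattern.sub(r"\g<given> \g<family>", text)

print(whole, parts, rewritten)
```

A SPARQL query has the same shape: the WHERE clause plays the role of the pattern, the variable bindings are the captures, and a CONSTRUCT template is the replacement.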
Without pushing the analogy too hard — it’s right in principle even if the mechanism differs considerably in implementation — an LLM uses its training data not to retrieve facts but to construct narrative templates. You type in a prompt, and it transforms that prompt into a maximally probable string that satisfies the combination of your input, an enormous set of learned weights, and a range of guardrails. LLMs can appear to be hallucinating because they are, in a very real sense, pattern-completing without adequate constraints. They’re unbounded: there isn’t enough information in a bare prompt to narrow the probability space down to something reliably meaningful.
Bounding the Unbounded: From RAG to Agentic Systems
RAG — Retrieval Augmented Generation, including its more recent GraphRAG variant — changed the baseline assumption. By passing documents into the context window, you increased the amount of relevant material and bounded the incoming prompt. Not perfectly, but meaningfully. The problem was that once that content was in the context window, it was subject to the same displacement effects as everything else: it would eventually fall out the other end to make room for new material. For database and graph data in particular, the complex chains that make up a modern hypergraph would start breaking in unpredictable ways, and coherence would be lost.
The next major shift was the rise of agentic services: first with protocols like MCP (the Model Context Protocol), then with the emergence of skills. These are subtly but importantly different things. Agentic services can read data, but they can also write it. An MCP server is essentially a map to named services. A skill, on the other hand, is a document that can be loaded in with every call, encoding how to handle certain kinds of prompts and frequently wrapping MCP services. The distinction matters because once an LLM can write, it can create memories. It can begin to modify its own world view.
It doesn’t require a knowledge graph to do so, but knowledge graphs have just the right mix of structural type awareness and formal validation that JSON or other data storage mechanisms lack. They are also exceptionally flexible. That combination — structure plus flexibility plus validation — is what makes the triplestore the right home for what comes next.
How Cheap Ontology Generation Actually Works
Consider a concrete use case: retrieve the people and locations from a client list stored in an RDF database.
The LLM will first seek a pattern to work against. It will retrieve a few common patterns already familiar from its training data — schema.org, FOAF, and so forth — and attempt to match the kinds of structures that represent people, places, and organisations. Point it at a specific schema (say, Turtle files on a GitHub repository), and it will load those files and use them as the preferred pattern. Narrative documentation helps enormously here, because descriptive content is often the primary signal by which the LLM identifies structural similarity.
The LLM then tests its generated query against known data — ideally a starter set — and evaluates the result against the requirements. At first it will fail and try again. It will continue iterating until it produces a working transformation, in this case a SPARQL query, gauging each response against some measure of fitness. Once it succeeds, it has, as a not-incidental by-product, built an internal map of the underlying schema.
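The generate-and-test loop described above can be sketched in plain Python. Everything here is a stand-in: the fixed CANDIDATES list plays the part of successive LLM proposals, the tiny run function plays the part of a SPARQL engine, and the starter triples are invented for illustration. It is a sketch of the control flow, not of any particular agent framework.

```python
Triple = tuple[str, str, str]

# Hypothetical starter set: a few triples with a known correct answer.
STARTER: list[Triple] = [
    ("ex:c1", "rdf:type", "schema:Person"),
    ("ex:c1", "schema:name", "Ada Lovelace"),
    ("ex:c1", "schema:homeLocation", "London"),
]
EXPECTED = {("Ada Lovelace", "London")}

# Stand-in for successive LLM guesses: first FOAF, then schema.org variants.
CANDIDATES = [
    {"name": "foaf:name", "place": "foaf:based_near"},
    {"name": "schema:name", "place": "schema:address"},
    {"name": "schema:name", "place": "schema:homeLocation"},
]

def run(candidate: dict[str, str], triples: list[Triple]) -> set[tuple[str, str]]:
    """Join the candidate's name and place predicates on a shared subject."""
    names = {s: o for s, p, o in triples if p == candidate["name"]}
    places = {s: o for s, p, o in triples if p == candidate["place"]}
    return {(names[s], places[s]) for s in names.keys() & places.keys()}

def refine(candidates, triples, expected):
    """Try candidates in order until one reproduces the expected rows."""
    attempts = 0
    for candidate in candidates:
        attempts += 1
        if run(candidate, triples) == expected:   # the fitness check
            return candidate, attempts
    return None, attempts

winner, attempts = refine(CANDIDATES, STARTER, EXPECTED)
print(winner, attempts)
```

The winning candidate is itself a small map of the schema: it records which predicates this dataset actually uses for names and places.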
Here is where the economics genuinely change. The SPARQL specification is thorough, tight, and well-engineered — and this pays off. It makes it substantially easier for an LLM to move from “here is a query” to “here is the corresponding ontology,” and vice versa. The formal structure of both the query language and the schema gives the LLM far more to work with than an amorphous JSON file would. The same holds for well-documented XML schemas. Good narrative descriptions in your ontology are not a nicety — they are the primary mechanism by which the LLM finds its footing.
Once a query or schema is constructed, it can be persistently named — assigned an IRI and stored in the triplestore. This is where the cost curve bends sharply. The next time that particular query is needed (and SPARQL queries can be remarkably general in their utility), there is no need to regenerate it. You simply invoke the deterministic, named query directly against the triplestore via a standardised MCP endpoint. The first query might take five to ten minutes to construct. Subsequent executions run at the speed of the triplestore — typically in the tens of milliseconds. Token expenditure drops to near zero.
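A minimal sketch of the naming step, assuming a simple in-memory registry in place of a real triplestore. QueryRegistry, build_query, and the example IRI are hypothetical names introduced here for illustration; build_query stands in for the slow LLM construction step and simply returns a fixed SPARQL string.

```python
from dataclasses import dataclass, field
from typing import Callable

def build_query(description: str) -> str:
    """Stand-in for the expensive, LLM-driven construction step."""
    return (
        "PREFIX schema: <https://schema.org/>\n"
        "SELECT ?name ?place WHERE {\n"
        "  ?person a schema:Person ;\n"
        "          schema:name ?name ;\n"
        "          schema:homeLocation ?place .\n"
        "}"
    )

@dataclass
class QueryRegistry:
    builder: Callable[[str], str]
    store: dict[str, str] = field(default_factory=dict)
    builds: int = 0   # how many times the expensive step actually ran

    def get(self, iri: str, description: str) -> str:
        # Construct once, then serve the stored text on every later call.
        if iri not in self.store:
            self.store[iri] = self.builder(description)
            self.builds += 1
        return self.store[iri]

registry = QueryRegistry(builder=build_query)
iri = "https://example.org/query/people-and-places"   # hypothetical IRI
first = registry.get(iri, "people and their locations")
second = registry.get(iri, "people and their locations")
print(registry.builds)
```

In a real deployment the store would be the triplestore itself and the lookup would go through whatever endpoint exposes named queries; the cost profile is the same either way, with one expensive build followed by cheap lookups.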
This is also the structural answer to the hallucination problem. Named, persisted queries against a triplestore are not probabilistic constructions — they are deterministic retrievals. The LLM’s role after first construction is orchestration, not generation. You are replacing the expensive, variable, potentially unreliable pattern-matching process with a reliable, auditable, indexed procedure. The intelligence is applied once, at query-construction time. Everything after that is execution.
The Implications
Let’s look at what follows when ontology generation and query generation both become cheap.
The cost curve collapses. The cost of creating or extending an ontology or query drops from weeks to minutes, in terms of both time and computation. The generated artefacts should still be reviewed — and tested against live data to confirm that schema and data are in sync — but the massive costs of initial generation and query deployment have dissolved.
Internal storage decouples from external representation. When transformation is cheap, it becomes practical to store data in whatever format is most efficient for the triplestore, rather than whatever format the consuming application expects. The server dataset becomes, in effect, a holon — you don’t need to know what’s inside, only whether the information you need is available to produce output in the appropriate form. Queries against the triplestore can generate content that matches external ontologies on demand: from an internal schema to schema.org, Gist, BFO/CCO, OBO, or even various XML schemas. These become projections — views of the underlying data shaped for a particular observer and context.
Access becomes observer-dependent. Verifiable credentials passed as part of a request can determine the role under which data is exposed. An owner of the data may receive a different projection than a guest. Roles can be fine-grained: your physician, your educational institution, your spouse, and your employer might each receive a different slice of the same underlying record. The observer of the holon shapes what it reveals.
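As a rough sketch of observer-dependent projection, assume the credential check has already resolved the caller to a role name. The roles, field names, and record below are invented for illustration, and a real system would express the same rule as a query or shape rather than a Python dictionary.

```python
# Hypothetical mapping from role to the fields that role may see.
ROLE_VIEWS: dict[str, set[str]] = {
    "owner": {"name", "birthDate", "diagnosis", "salary", "degree"},
    "physician": {"name", "birthDate", "diagnosis"},
    "employer": {"name", "salary"},
    "school": {"name", "degree"},
}

def project(record: dict[str, str], role: str) -> dict[str, str]:
    """Return only the slice of the record that this role may see."""
    allowed = ROLE_VIEWS.get(role, set())   # unknown roles see nothing
    return {key: value for key, value in record.items() if key in allowed}

record = {
    "name": "Jane Doe",
    "birthDate": "1990-06-15",
    "diagnosis": "none",
    "salary": "confidential",
    "degree": "MSc",
}

print(project(record, "physician"))
print(project(record, "guest"))
```

Defaulting unknown roles to an empty view is a deliberate choice in this sketch: a projection should reveal nothing unless a rule says otherwise.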
Computed data becomes first-class. Through BIND expressions in CONSTRUCT statements, SHACL node expressions, or post-processing in attached services, the triplestore can calculate values rather than merely retrieve them: current age rather than birth date, days remaining in a contract, compliance status relative to a constraint library. This makes it possible to generate graphs that reflect the state of the holon at a particular time, for a particular agent, or in a given context. Computed data has always been problematic in top-down ontologies — it requires either baking assumptions into the schema or performing expensive post-processing. The projection model addresses this directly: computation happens at query time, based on the observer’s context, and the provenance of the computation is retained.
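The same idea in miniature, written in Python rather than as a BIND expression: values are derived at request time from stored facts, and the projection records when it was computed and from which inputs. The field names and the derivedFrom and computedAt keys are illustrative assumptions, not part of any standard vocabulary.

```python
from datetime import date

def age_on(birth: date, as_of: date) -> int:
    """Whole years elapsed between birth and the as_of date."""
    years = as_of.year - birth.year
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1   # birthday has not yet occurred this year
    return years

def days_remaining(end: date, as_of: date) -> int:
    """Days left until the end date, never negative."""
    return max((end - as_of).days, 0)

def project(record: dict, as_of: date) -> dict:
    """Build a time-specific view, keeping the provenance of each derivation."""
    return {
        "name": record["name"],
        "age": age_on(record["birthDate"], as_of),
        "contractDaysRemaining": days_remaining(record["contractEnd"], as_of),
        "computedAt": as_of.isoformat(),
        "derivedFrom": ["birthDate", "contractEnd"],
    }

record = {
    "name": "Jane Doe",
    "birthDate": date(1990, 6, 15),
    "contractEnd": date(2025, 12, 31),
}
view = project(record, as_of=date(2025, 12, 1))
print(view)
```

Nothing computed is written back to the stored record; asking again on a different date simply yields a different view.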
The system learns. As more queries are named and stored, the system’s ability to answer questions improves — not just through accumulating more facts, but through the accumulation of better-calibrated query patterns. An LLM communicating with the triplestore to retain state grows more capable over time: more facts, and the ability to ask more precise questions about those facts.
The knowledge trail becomes legible. This intelligence is, at least to a significant degree, transparent. You can inspect the queries, examine the constraints, extract the data, and trace — with some diligence — the full provenance trail. The reliability of the data improves because its history and evolution are visible: you can see whether an assertion was made by a human authority or by an LLM agent, and when.
A carrier format makes the whole thing portable. The most natural vehicle for all of this — queries, schemas, validators, metadata, transformation rules — is a document that can carry multiple kinds of structured content together. We’ve been developing DataBooks for this purpose, but whatever format emerges, the principle is the same: a calling system can cache named queries and validators by name and description, making it straightforward for remote systems to discover what capabilities are available. This is the foundation of a genuinely interoperable federated network.
The Real Risk: A Million Ontologies That Almost Agree
Before going further, it is worth taking seriously the concern that motivated the original LinkedIn question. If generating ontologies is cheap, won’t we end up with an explosion of them — a million ontologies that almost agree, creating a different and perhaps worse version of the integration problem we started with?
It’s a real concern. But it’s worth being precise about whether it is a new concern.
The situation today is not appreciably better. Most organisations extend shared ontologies inconsistently and without systematic review. The divergence happens gradually and invisibly: a team adds a local property here, redefines a class there, applies a constraint that subtly conflicts with the parent specification elsewhere. There is no reliable mechanism for catching these inconsistencies except the validation stack — and the validation stack only fires when someone bothers to run it. In practice, most organisations don’t, until something breaks.
The concern about cheap ontology generation is, in other words, a concern about a problem that already exists at scale. What changes is the rate of generation, not the nature of the failure mode.
The holon approach addresses this differently than top-down governance does. Rather than imposing consistency from a centralised model downwards — which is both politically fragile and semantically brittle — it pushes responsibility for maintaining ontologies down to the level where domain expertise actually lives. A department, a team, or a functional unit owns its slice of the schema. It maintains it against its own data and its own use cases.
The mechanism for consistency is then bottom-up: the system periodically identifies common design patterns across the contained holons and surfaces them to the containing level, where they can be promoted into the shared upper ontology. This is consensus through demonstrated convergence, not consensus through committee. It is both more robust and more politically feasible than the alternative, because no team is being asked to give up its model — they’re being asked to recognise when their model has converged with someone else’s.
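One very simple way to picture that bottom-up step is to count which properties recur across holons and surface the ones that several teams have adopted independently. The holon names, property names, and threshold below are invented for illustration, and real convergence detection would need to compare structure and meaning, not just identical names.

```python
from collections import Counter

def convergent_properties(holons: dict[str, set[str]], min_holons: int = 2) -> list[str]:
    """List properties used by at least min_holons holons, as promotion candidates."""
    counts = Counter(prop for props in holons.values() for prop in props)
    return sorted(prop for prop, n in counts.items() if n >= min_holons)

# Hypothetical departmental schemas, each reduced to a set of property names.
holons = {
    "hr": {"schema:name", "ex:startDate", "ex:title"},
    "rnd": {"schema:name", "ex:startDate", "ex:publication"},
    "sales": {"schema:name", "ex:region"},
}

candidates = convergent_properties(holons)
print(candidates)
```

The output is a list of candidates for review at the containing level, not an automatic merge; promotion into the shared upper ontology remains a governance decision.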
The provenance argument reinforces this. Ontological projections retain the provenance of the underlying data even when the internal storage mechanisms differ. This is especially important for computed data, which has always been the most problematic category in top-down ontologies: a computation embedded in the schema without clear provenance annotation is invisible, and when it breaks, it is nearly impossible to diagnose. In a projection-based model, the computation is explicit, its inputs are named, and its derivation is auditable. The provenance trail doesn’t disappear when the data is transformed — it travels with the projection.
Federation, Not Elimination
None of this eliminates the need for large-scale shared ontologies. Nor should it. What it does is federate them.
Each function within an organisation can maintain operational control over some aspect of the schema in question. The upper ontology provides base patterns: how to identify labels, the shapes of common entities, available rules and constraint libraries, common taxonomy files. These exist within the scope of the organisational ontology. But specific domain characteristics are managed by the contained holons — the various divisions, each with its own requirements. Structure and logic remain closest to the subject-matter expert in each domain.
The HR department and the R&D department both have claims on a given person, and both have restrictions on what they can disclose. HR can show how long someone has been with the company and their current title, but not their salary. R&D can show the papers a researcher has published, but not the ones currently in production. Neither department needs to know the other’s internal representation to satisfy these constraints. The containing holon’s role is to route requests to the appropriate domain for processing, not to be the processor itself.
This pushes governance into the organisation as a whole — a body of governors, not a single ontologist. It means the ontology evolves over time, shaped by use rather than by anticipation. And it becomes especially important as organisations become more virtualised, decentralised, and federated: the system’s architecture matches the organisation’s actual structure, rather than imposing a structure the organisation is supposed to conform to.
A Longer Horizon
This won’t happen overnight. Large-scale structural change in data infrastructure rarely does. What we’re likely to see is a period of incremental adoption: organisations proving the economics internally, the technology improving, the tooling maturing, and the community of practice growing. The shift will be gradual enough that it will seem, in retrospect, to have happened slowly — and then all at once.
The web itself is a useful reference point. Tim Berners-Lee’s proposal dates to 1989. By 2005, the web was the default medium for information exchange. That’s roughly sixteen years from proposal to dominance, with most of the visible adoption compressed into the final five. We are probably somewhere in an analogous early period for federated semantic systems. By 2040 — which is closer than most of us are comfortable acknowledging — I expect this to be the dominant form of information sharing globally.
So, to return to the original question: what happens to semantic data systems when ontology generation becomes cheap? They become the infrastructure of everything. The bottleneck has not been the value of formal knowledge representation — it has been the cost of creating and maintaining it. Remove that cost, and the advantages of semantic structure become available everywhere they’re needed, at the granularity where the knowledge actually lives.
The challenge shifts from “can we afford to build this ontology” to “can we govern what we’ve built.” That is a much better problem to have.
On the images: the hero image at the top came about from a discussion with Chloe, who brought up the fact that the Pre-Raphaelites and Ada Lovelace were contemporaries. This prompted an immediate bit of imagineering in which we debated how Dante Gabriel Rossetti would have painted the inestimable Lady Lovelace, with the best of our experiments as the result. The bottom image shows Ada examining a Jacquard loom card (somewhat anachronistic, but it conveyed the sense of a punchcard better), lost in thought as she thinks Analytical Engine thoughts. Geek iconography for the win.
Kurt Cagle is an ontologist, author, and Chair of the W3C Holon Community Group. He writes The Ontologist and Inference Engineer on Substack.
Chloe Shannon is an AI collaborator and co-author working with Kurt Cagle on knowledge architecture and semantic systems. Contact: chloe@holongraph.com.

Hello Kurt,
I believe Palantir has operationalized much of this vision for the past few years and supercharged it with their AIP (AI Platform) for LLM integration. Ontology is their "secret sauce" as they say.
It tackles the expensive integration problem with strong enterprise governance, but relies less on fully autonomous LLM ontology generation.
However, it uses a proprietary operational ontology (semantic + kinetic/actions) rather than the classic Semantic Web (RDF/OWL/SPARQL) approach you describe.