Language Cannot Exist Without Facts

A hope keeps returning in AI circles: could we build a language interface without the knowledge? A small model that “only knows how to talk,” handing off all factual questions to tools, databases, and APIs. Clean separation, elegant engineering, a solved muddle. The first two essays in this series argued that LLMs are lossy compressions with built-in language interfaces, and that retrieval and interface naturally blur at the surface. This essay goes a step deeper. The muddle is not an artifact of model design. It is a property of language itself.

❖

The Suspicion

The modular vision is straightforward: separate concerns. Let a thin LLM handle syntax, turn-taking, and discourse; let external systems handle truth. But if you go looking for this in nature or in our machines, you won’t find it. I’m not aware of any published general-purpose, fact-free model that is still a competent language user. Existing tool-use systems—Toolformer, Gorilla, ReAct—sit atop fact-rich backbones. They do not train an interface divorced from content; they fine-tune one atop it. The absence is suggestive on its own; the engineering evidence below is what makes it probative.

Three scope clarifications are worth making at the start. First, “facts” here is shorthand. The argument does not require verified truths; language traffics happily in fiction, metaphor, counterfactuals, and outright lies. The load-bearing notion is coherent relational content: reference, predication, and the relational scaffolding that makes “cats climb trees” a different kind of utterance than “cats drink geometry.” Second, the claim is about open-ended, natural, human-like language. Formal languages, rule-based parsers, and toy chatbots can “speak” without worldly content, but they are not natural language in this sense. Third, the incentives behind the pure-interface dream are real: cost, safety, bias control. The argument below says the substrate won’t cooperate.

❖

How Children Actually Learn

Children do not acquire grammar first and then attach facts. Form and content co-develop.

Fast mapping. Three-year-olds can learn a new word after a single exposure, not as a bare sound but as a label-for-a-thing. The learning move is relational: this sound points at that stuff. Children are not acquiring acoustic patterns and later coupling them to meaning. They are acquiring meaning-bearing units in one shot.

Nicaraguan Sign Language. When deaf children with no shared language were brought together in Managua in the 1970s, successive cohorts built a grammar from scratch. They added spatial modulations to mark relations between events in the world they were describing. The grammar grew up around the need to talk about specific things; it did not arrive first and get content poured into it.

In neither case does grammar arrive as a content-free module later pointed at meaning. It arrives as a way of organizing content.

❖

Language as Relational Action

Relational Frame Theory (RFT), the behavioral foundation of Acceptance and Commitment Therapy, treats language as learned relational responding—the ability to respond to things in terms of their relations: same-as, opposite-of, caused-by, bigger-than, before. Under this view, facts are not external to language. They are what language is made of. A statement like “Paris is in France” is a coordinated and hierarchical relational frame, not a payload bolted onto empty syntax.

We lean on RFT here as a framing lens, not as the proof. The firmer anchor comes from a different tradition. Construction Grammar (Goldberg, 1995) argues that the learned units of language are form–meaning correspondences, “associated directly with semantic structures which reflect scenes basic to human experience.” Usage-based acquisition work (Tomasello and others) reaches the same place by a closely related road: children learn constructions, not bare rules, and constructions are inseparable from the situations they describe.

The convergence is the point. Construction Grammar and usage-based acquisition share intellectual roots; RFT comes from a quite separate behavioral tradition. The fact that these largely independent lines land in roughly the same territory matters: language and the ability to make claims about the world grow together because they are built out of the same thing.

❖

The Honest Counter-Argument

We should not steamroll the evidence for partial dissociations. Two lines of work genuinely suggest that something like structural processing can be pulled apart from semantic processing, at least in the mature system.

Syntactic priming. Producing or hearing a sentence in a particular structure—say, a passive—biases you to produce that structure in a later, different sentence (Bock, 1986). Some read this as evidence for abstract, content-agnostic structural representations. Others recast it as procedural learning: routines internalized through exposure, not an innate syntax box. What the evidence responsibly supports is narrower than either reading: priming shows the mature system can run structural routines with weak semantic coupling. It does not show those routines were learned without content.

Aphasia. Classical neurolinguistics treated Broca’s area as a syntax module and Wernicke’s as a semantics hub. Modern work has chipped away at that clean modularity in three ways. Rogalsky et al. (2017) find that damage to Broca’s area alone does not cause chronic agrammatic comprehension deficits, implicating response bias and working memory more than a dedicated syntax box. Hagoort (2005) frames the left inferior frontal gyrus as a unification engine in which semantic, syntactic, and phonological operations are “partly dissociated” yet run concurrently and interact within one integrated system. Faroqi-Shah and Thompson (2003) show that both Broca’s and Wernicke’s aphasics are impaired on the same passive-sentence production task, with groups differing in error pattern rather than in whether they can do the task at all.

Specific Language Impairment. SLI—sometimes called developmental language disorder—is often cited as a natural experiment for modularity: apparently grammar-specific impairment with otherwise intact cognition. Modern accounts sometimes tie it to broader procedural learning or auditory processing limits, which would fold it back into an integrated view. It remains a live counter-case and a fair place to press the argument.

What survives these literatures is not a content-agnostic syntax module. It is an integrated system with partial functional specializations: regions and routines can be weighted toward structure or meaning, but they interact continuously. That is not the same as grammar being a separable subsystem. The procedural separability that does emerge here (routines that can run with weak semantic coupling once learned) is real, and we will return to it. It is downstream of co-development, not an alternative to it.

❖

The Ablation That Doesn’t Exist

On the engineering side, the decisive thing is an absence—and the pattern around it.

Tool-use systems. Toolformer, Gorilla, and ReAct each show a language model learning to use external tools. Each is built on a large, fact-rich pretrained model. No variant I’m aware of strips out the factual corpus to ask whether a model trained only on “how to talk” can still talk and use tools. The fact-ablated tool-use experiment is missing from the literature as of this writing.

BabyLM. Small models trained on tens to hundreds of millions of tokens of child-directed natural language develop non-trivial syntactic competence—agreement, filler-gap dependencies, recursion. Top entries reach real but child-level linguistic performance on coherent text. Critically, shuffled or decoherent versions of the same input fail to produce comparable grammar. Structure emerges from a small but coherent signal, not from syntax-only input.

TinyStories. TinyStories is simple language about a simple world: ultra-small models trained on basic narratives. It is the closest point on the frontier to a minimum viable language interface, and even there meaning was not removed. The lesson is not that small is impossible. The lesson is that what survived at small scale was meaningful text about a coherent world.

The absence of a published, fact-free interface model is probative when placed next to BabyLM’s decoherence failures and TinyStories’ success with simple-but-meaningful inputs. The seam can be put around the model. It has not been shown to live inside the model’s tongue.

❖

The Experiment That Would Settle It

If we want a crisp test of “syntax from structure alone,” here is a design worth running.

Build a TinyStories-scale corpus in two versions:

Coherent. Normal simple stories. Cats climb trees. The sun rises. Children go to school.

Relation-scrambled. Identical surface grammar, lengths, and token distributions—nouns where nouns go, verbs where verbs go—but with semantic relations randomized between predicates and their arguments. Cats drink geometry. The sun recursively negotiates. Children are printed into the number seven.
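
One way the relation-scrambled version could be built is sketched below in Python. This is a minimal illustration, not a specification. It assumes each story sentence has already been reduced to a (subject, verb, object) triple, and the names scramble_relations and render are hypothetical. Shuffling each argument slot independently across the corpus keeps every word in its grammatical position and preserves per-slot word counts, while breaking the pairing between a predicate and its arguments. The toy data uses only plural subjects so that agreement survives the shuffle; a real corpus would need to shuffle within agreement classes.

```python
import random
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object); an assumed toy format


def scramble_relations(triples: List[Triple], seed: int = 0) -> List[Triple]:
    """Shuffle subjects and objects independently across sentences.

    Verbs stay where they are, so every slot still holds a word of the
    right category and per-slot word counts are unchanged. Only the
    pairing between predicates and their arguments is randomized.
    """
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    subjects = [s for s, _, _ in triples]
    verbs = [v for _, v, _ in triples]
    objects = [o for _, _, o in triples]
    rng.shuffle(subjects)
    rng.shuffle(objects)
    return list(zip(subjects, verbs, objects))


def render(triple: Triple) -> str:
    """Turn a triple back into a surface sentence."""
    subject, verb, obj = triple
    return f"{subject.capitalize()} {verb} {obj}."


if __name__ == "__main__":
    coherent = [
        ("cats", "climb", "trees"),
        ("children", "read", "books"),
        ("birds", "build", "nests"),
        ("farmers", "grow", "corn"),
    ]
    scrambled = scramble_relations(coherent, seed=1)
    # Per-slot vocabulary is identical; only the pairings can differ.
    assert Counter(s for s, _, _ in scrambled) == Counter(s for s, _, _ in coherent)
    assert Counter(o for _, _, o in scrambled) == Counter(o for _, _, o in coherent)
    for triple in scrambled:
        print(render(triple))
```

A plain shuffle can leave some sentences coherent by chance, so a serious version would likely also need a check that rejects pairings that happen to make sense.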

Train matched small models on each, holding everything else constant: architecture, optimizer, token count. Evaluate on standard language modeling metrics, targeted syntactic probes (subject–verb agreement, filler–gap judgments), and held-out phenomena that require hierarchical sensitivity. If grammar can emerge from distributional form alone, the relation-scrambled model should match the coherent one on syntactic tests. If language is built on relational content, it will not. A skeptic will note that scrambling relations also disrupts selectional preferences (the cues about which nouns can fill which argument slots), and so the experiment cannot cleanly separate “facts” from “structural distribution.” That is the point: the pure-interface dream requires that separation, and the substrate may not allow it.
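
The syntactic comparison could be scored along these lines. The sketch assumes a minimal-pair format, in which each probe item is a grammatical sentence paired with a minimally different ungrammatical one, and it assumes each trained model is exposed as a function from a sentence to a log-probability. Both are my assumptions rather than part of the design above, and the names Scorer, minimal_pair_accuracy, and probe_gap are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Assumed interface: a model is any function mapping a sentence to a
# log-probability (higher means the model finds the sentence more likely).
Scorer = Callable[[str], float]
Pair = Tuple[str, str]  # (grammatical, ungrammatical)


def minimal_pair_accuracy(score: Scorer, pairs: Sequence[Pair]) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    if not pairs:
        raise ValueError("need at least one probe pair")
    wins = sum(1 for good, bad in pairs if score(good) > score(bad))
    return wins / len(pairs)


def probe_gap(score_coherent: Scorer, score_scrambled: Scorer,
              pairs: Sequence[Pair]) -> float:
    """Accuracy of the coherent-trained model minus the scrambled-trained one."""
    return (minimal_pair_accuracy(score_coherent, pairs)
            - minimal_pair_accuracy(score_scrambled, pairs))
```

A gap near zero would favor the view that grammar emerges from distributional form alone; a large positive gap would favor the relational-content view. Ties count against the model here, which is a deliberate but arguable choice.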

BabyLM’s shuffled-input failure is a weaker version of this test; the stronger version respects local form while destroying global relations. It would be cheap to run. It would clarify a question the field has been arguing from silence.

❖

What This Means for LLMs

The first two essays in this series argued that LLMs are compressed maps with a built-in language interface, so the fluent surface naturally muddles retrieval and expression. This essay adds the mechanism. Language itself has never been clean. In human development and in emergent sign languages, the ability to talk and the ability to make factual claims grow together. In brains, structural and semantic operations are partly dissociated inside an integrated unification system. In models, non-trivial syntax emerges from coherent content; it collapses when coherence is destroyed. The pure-interface architecture asks language to do something it has not been observed to do.

Three practical implications are worth not burying.

Put the seam around, not inside. Retrieval-Augmented Generation earns its keep because it places a system seam between a model and its sources. But you cannot place that seam inside the model’s linguistic substrate without depleting the substrate. The model speaks by drawing on relational-fact skeletons. Strip the skeleton and you don’t get a thinner interface; you get nothing to say.

Procedural separability exists, but it is downstream. Syntactic priming in humans shows structural routines can run with weak semantic coupling. That is real and worth taking seriously—but it is a fact about mature systems, not about how those systems were assembled. Routines learned out of co-constituted content can later be exercised with reduced semantic load. That does not mean they could be learned that way. The mature/learning distinction is what matters here.

Scope matters. The claim here is about open-ended natural language—language that traffics in reference, social intention, and grounded prediction. It is not a claim about formal languages. Metalinguistic talk (“that’s a passive construction”) and self-referential language (“I believe that…”) stretch the framing to higher orders. Those are language about maps, not counterexamples to language needing maps.

“Language is what maps are made of, and maps are what language knows how to make.”

❖

The Landing

Korzybski warned not to confuse the map with the territory. The first two essays applied that warning to LLMs: fluent output is not ground truth. This is the third beat. The reason LLMs confuse map with territory so readily is not that they are bad at language. It is that the muddle is a property of language itself. Humans have been in this condition all along. Models inherited it. The relation-scrambled ablation, were anyone to run it, would name the inheritance precisely, and would tell us whether the substrate can be rebuilt without it.

That inheritance carries a design obligation. Be explicit about the seam in system design, in provenance labeling, in how results are presented. The seam cannot be put inside the model, because the seam is not inside language. But it can—and should—be put around it.

The claim that survives the caveats is narrower than a slogan and stronger than a hunch: for open-ended natural language, the ability to speak and the ability to state coherent claims about the world grow together and run together. That is why LLMs look the way they do. And that is why the seam belongs around them.