Engineering Philosophy: A Synthesis

Marcus Persson Rydberg · March 2026

We used to live in a world where code was expensive. You needed time, patience, and a person willing to sit there and type the thing. Now you can get a thousand lines before your coffee cools. The bottleneck moved.

The scarce thing is no longer code. The scarce thing is judgment.

That shift changes the whole job. Engineering leadership is no longer about squeezing more output out of the keyboard side of the system. It is about building technical and human systems that can turn abundant generation into trustworthy software. The question is not "how do we get more code written?" It is "how do we know what should exist, what should not, and whether any of this is actually helping?" In the AI era, correctness, alignment, and interpretation are the hard parts.

I think this is good news, actually. It means the heart of engineering is becoming visible again. The work was never really typing. It was always deciding, testing, separating, verifying, and learning. AI just removed our ability to pretend otherwise.

Software Is Discovered, Not Specified

Most software work is not the execution of a perfect blueprint. It is discovery under uncertainty. A feature request is a guess dressed up as a sentence. A design doc is a hypothesis with bullet points. A sprint commitment is a polite public estimate made before reality has had a chance to object.

So I start from a simple rule: every design decision is a hypothesis, every code change is an experiment, and every test is an instrument. Not a safety net. Not a bureaucratic checkbox. An instrument. It tells you something about reality.

On a Tuesday morning, someone says, "we just need an export button." Fine. Two hours later you discover "export" means CSV, but only for approved records, but it must preserve the current filter, but it must not include personally sensitive fields unless the user has a particular permission, and yes, finance will open it in Excel on Windows and complain if the commas are wrong. Nothing about this means the team failed to gather requirements. This is what requirements look like in the real world. They unfold.

That is why I care so much about tests and types. They are not there because I worship process. They are there because they let me learn cheaply. A failing test tells me my hypothesis was wrong while the change is still small. A type error tells me my mental model does not match the system. That is good news. It is much cheaper for the compiler to embarrass me than for production to do it in front of customers.
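To make "a test is an instrument" concrete, here is a minimal sketch of the export-button discoveries written down as executable hypotheses. Everything named here is an assumption for illustration: the Record fields, the SENSITIVE_FIELDS set, the "export:sensitive" permission string, and the region filter are all hypothetical, not taken from a real system.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical names throughout: Record, SENSITIVE_FIELDS and the
# "export:sensitive" permission are illustrative only.
SENSITIVE_FIELDS = {"national_id"}


@dataclass(frozen=True)
class Record:
    name: str
    national_id: str
    region: str
    approved: bool


def export_csv(records, permissions, region=None):
    """Render approved records that match the current filter as CSV text."""
    columns = ["name", "national_id", "region"]
    if "export:sensitive" not in permissions:
        columns = [c for c in columns if c not in SENSITIVE_FIELDS]

    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect quotes fields that contain commas
    writer.writerow(columns)
    for record in records:
        if not record.approved:
            continue  # hypothesis 1: only approved records are exported
        if region is not None and record.region != region:
            continue  # hypothesis 2: the current filter is preserved
        writer.writerow([getattr(record, c) for c in columns])
    return buffer.getvalue()


def parse(text):
    """Read CSV text back into rows so tests can compare values, not strings."""
    return list(csv.reader(io.StringIO(text)))


RECORDS = [
    Record("Lind, Anna", "111", "north", True),
    Record("Omar Haddad", "222", "south", True),
    Record("Ida Berg", "333", "north", False),
]


def test_only_approved_records_in_current_filter():
    rows = parse(export_csv(RECORDS, set(), region="north"))
    # The comma inside the name must survive the round trip intact.
    assert rows == [["name", "region"], ["Lind, Anna", "north"]]


def test_sensitive_fields_require_permission():
    without = parse(export_csv(RECORDS, set()))
    with_permission = parse(export_csv(RECORDS, {"export:sensitive"}))
    assert "national_id" not in without[0]
    assert with_permission[0] == ["name", "national_id", "region"]
```

Each test pins one requirement that surfaced after the original sentence was spoken. When the next one unfolds, it becomes one more small test rather than a surprise in production.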

This matters even more when code generation is abundant. If a model can produce ten possible implementations in the time it used to take me to write one, generation has ceased to be the scarce step. The scarce steps are now: what exactly do I need, and how will I know if I got it? The best teams are not the ones who produce the most code. They are the ones with the cheapest, fastest path from guess to evidence.

Dave Farley has been insisting on this for years, and he is right: if you cannot test your ideas quickly, you are not engineering so much as hoping with syntax. Hope is not a method.

The Enemy Is Complexity, Not Difficulty

The enemy is not difficulty. Hard problems are fine. Hard problems are why we get paid. The enemy is complexity: the unnecessary entanglement of concerns that should have remained separate.

A handler that validates input, checks permissions, loads three services, mutates state, writes to the database, emits analytics, formats an email, and decides retry policy is not impressive. It is a hostage situation. You cannot change any part without negotiating with the whole.

Rich Hickey gave this a perfect word: complecting. We braid together things that should stay separate, then act surprised when the braid becomes hard to reason about. Under normal conditions this is bad. Under AI conditions it is lethal, because code volume can now scale faster than human attention.

Last quarter a team I know inherited a "platform" service that had grown to forty thousand lines of beautifully AI-generated code. It did everything: authentication, business rules, caching, background jobs, analytics, three different notification strategies, all living inside eight-hundred-line god functions. The AI had been extremely productive. The humans were drowning. They spent three weeks extracting concerns, not because they hated the code, but because they needed to reclaim the ability to make a change without holding the entire universe in working memory. The new version has fewer features and more tests. It also ships faster, breaks less, and doesn't require a shaman to understand.

That is the practical problem of the era. We are entering a high-throughput, low-attention environment. You cannot read everything. You cannot manually reason through every generated path. So the design goal is not "make it possible to understand the whole system in detail every time." The design goal is "make it unnecessary."

Separation of concerns, clear interfaces, cohesive modules, information hiding, narrow dependencies, obvious data flow: these are not aesthetic preferences. They are compression strategies for human judgment. A good boundary saves attention. A bad boundary consumes it forever.

This changes what review is for. If the compiler and test suite can give you strong evidence that a change behaves correctly, the review should focus less on line-by-line execution and more on design. Is this concern in the right place? Is the dependency necessary? Is the abstraction buying us something real, or are we making a speculative down payment on a future that may never arrive?

The systems that survive the AI era will not be the ones that generate the most code. They will be the ones that can afford to distrust most of it.

Short Loops Or Nothing

Feedback delay is where good intentions go to die.

If I make a change and learn whether it worked three minutes later, I can still remember what I was thinking. If I learn three days later, I have already context-switched twelve times, half the team has moved on, and the bug report now feels like archaeology. Long loops do not just slow learning. They distort it.

On a Tuesday at 2:40 p.m., someone changes a billing rule. In one organization, that change waits for a manual test cycle, gets bundled into a Friday release train, and shows up in customer support on Monday. In another, the change is behind a flag, the tests run in minutes, it ships to a thin slice of traffic, the dashboard lights up, and by 3:10 p.m. the team knows whether the hypothesis was sound. Those are not two versions of the same discipline. They are different physics.
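The "thin slice of traffic" in the second organization can be sketched in a few lines. This is one common approach, assuming a deterministic hash of the user ID decides who sees the change; the flag name, the billing_rule function, and the five percent discount are hypothetical stand-ins, not a description of any particular flag service.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """Return True if this user falls inside the first `percent` of buckets."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # stable bucket, 0..9999
    return bucket < percent * 100


def billing_rule(user_id: str, amount_cents: int, rollout_percent: float) -> int:
    """Apply the (hypothetical) new rule to the slice, the old rule to everyone else."""
    if in_rollout("new-billing-rule", user_id, rollout_percent):
        return amount_cents * 95 // 100  # new behaviour, illustrative only
    return amount_cents  # existing behaviour
```

Because the bucket depends only on the flag name and the user ID, the same user gets the same answer on every request, and raising the percentage only ever adds users to the slice. Setting it to zero is the rollback path.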

This is why continuous delivery matters. Not because deployment frequency is a trendy badge. Short loops are how systems stay stable. If a process hurts, the right answer is usually to do it more often until the pain becomes impossible to ignore and therefore impossible not to fix. Rare deployments stay painful because everyone can keep pretending the pain is normal. Frequent deployments force honesty.

The same rule applies everywhere. A forty-minute test suite teaches people not to wait for the result. A quarterly incident review severs behavior from consequence. A six-month performance cycle makes course correction mostly decorative. Delay produces drift, overshoot, and ritualized confusion.

So I want loops to be short by default: fast tests, small batches, continuous integration, deployable main, visible observability, immediate rollback paths, post-mortems while memory is still warm. I do not want courage-based delivery, where the release depends on a senior engineer breathing into a paper bag at 5 p.m. on Friday.

The slogan version is simple: if reality is late, reality loses.

You Bleed When It Breaks

The people who make decisions should feel the consequences of those decisions. Not in a punitive way. In a structural way.

If the team that designs a service never has to run it, the system will drift toward elegant diagrams and ugly nights. If the team that ships the feature also gets paged when it misbehaves, the learning loop closes. Bad observability becomes painful. Missing runbooks become painful. Risky migrations become painful. Pain is information, and operational ownership routes that information to the people most capable of acting on it.

On a Tuesday evening, an engineer ships a perfectly reasonable retry policy. At 3:17 a.m. it turns a transient partner outage into a self-inflicted denial of service. The next day, that engineer does not need a lecture on resilience. The system has already provided one. The learning is immediate, embodied, and hard to ignore.
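The story does not say what that retry policy looked like, so what follows is an assumption: a common way retries amplify an outage is by being immediate and unbounded. A minimal sketch of the usual mitigations, capped attempts plus exponential backoff with random jitter, might look like this. TransientError and call_with_retries are hypothetical names, and the defaults are placeholders rather than recommendations.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(operation, max_attempts=4, base_delay=0.2, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call `operation`, retrying transient failures a bounded number of times."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up so the caller sees the outage instead of hiding it
            # Exponential backoff, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random wait up to the ceiling, so clients do not retry in lockstep.
            sleep(rng() * ceiling)
```

The sleep and rng parameters exist so a test can run the policy without waiting and without randomness, which keeps the feedback loop on the retry logic itself short.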

Taleb's language for this is still the cleanest: skin in the game. When decision and consequence stay connected, systems correct themselves. When they separate, fragility grows in the gap.

This is also why small, frequent deploys are safer than rare, dramatic ones. A system that is exercised every day adapts to stress. A system protected from stress becomes brittle. The release process itself is a muscle. Use it or watch it atrophy.

There is a second rule hiding inside this one: subtraction is usually more reliable than addition. Every dependency is another failure mode. Every abstraction is a bet on a future shape of the world. Every new moving part is one more thing that can wake somebody up at night. When a team is debating how to solve an edge case, one path usually involves adding a worker, a cache, and a dead-letter workflow. The other path is deleting a clever optimization nobody trusts and making the operation idempotent. I almost always want the deletion path first. Remove, simplify, collapse, narrow, inline, decouple. Addition feels productive. Subtraction is often wiser.
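"Making the operation idempotent" can be smaller than it sounds. Here is a minimal sketch, assuming the caller supplies a unique key per logical request. PaymentLedger and apply_charge are hypothetical names, and the in-memory dictionary stands in for whatever durable store would enforce key uniqueness in a real system.

```python
class PaymentLedger:
    """In-memory stand-in for a store that remembers which requests it has applied."""

    def __init__(self):
        self._results = {}  # idempotency key -> balance returned for that request
        self.balance_cents = 0

    def apply_charge(self, idempotency_key: str, amount_cents: int) -> int:
        """Apply a charge once per key; replays return the original result."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # duplicate delivery: no new effect
        self.balance_cents += amount_cents
        self._results[idempotency_key] = self.balance_cents
        return self.balance_cents


ledger = PaymentLedger()
ledger.apply_charge("order-42", 500)
ledger.apply_charge("order-42", 500)  # a retry or duplicate message changes nothing
```

Once a repeat is harmless, a duplicate delivery or a blind retry stops being an edge case that needs its own machinery.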

The Metric Always Lies

The moment a metric becomes high-stakes, it begins its journey from measure to target to parody.

Velocity targets create story-point theater. Coverage targets create tests that assert the constructor can be constructed. "Response time" targets create shallow ticket closures and deep customer resentment. "Always-on" cultures emerge because one ambitious person's midnight message quietly becomes everyone else's implied expectation.

I do not think this happens because people are evil. It happens because systems optimize what they are asked to optimize, and humans are adaptive. If the dashboard decides status, status will be optimized for the dashboard. On a Tuesday in planning, one team estimates honestly and another estimates strategically. By Thursday, the honest team looks slow. By next Tuesday, they have learned the lesson the system actually teaches. Not "be truthful." Survive.

This is why leadership cannot be outsourced to metrics. Metrics are useful. I like dashboards. I enjoy a tidy graph as much as the next nervous technologist. But a metric can only tell me what was easy to count. It cannot tell me what counted.

Culture Is Just Structured Gossip

Culture does not spread by policy memo. It spreads by imitation. New engineers do not learn the team from the values slide. They learn it by watching what gets praised, what gets ignored, what gets forgiven, what gets escalated, who asks good questions in review, who writes the incident note, who admits uncertainty without being punished.

Pairing works because people absorb judgment in motion. Retros work when they become a place where the real story can be told. Standups work when they synchronize dependencies rather than egos. One-on-ones work when they are a safe route for weak signals and inconvenient truths. Even gossip has a function: it is the distributed reputation system of the organization, carrying coordination data the org chart cannot.

If you want norms to improve, make good behavior visible and imitable. Stop trying to manage culture by slogan and metric alone. People believe the system you reward, not the system you describe.

The Interpretation Layer Is The Job

Every engineering organization has two layers.

The first is the optimization layer: code, KPIs, test counts, model outputs, dashboards, deployment stats, latency graphs. This layer produces measurable outputs. Necessary — without it, you are flying blind.

The second is the interpretation layer: human judgment, conversation, narrative, code review, retrospectives, one-on-ones, architecture debate, incident analysis. All the places where people reconstruct what the numbers mean. Also necessary — without it, you are flying instrument-only through a storm with half the gauges mislabeled.

Healthy systems have both. Unhealthy ones overdevelop the first and starve the second.

On a Tuesday morning, the dashboard says incident volume is down and deployment frequency is up. Wonderful. Then a senior engineer mentions in a one-on-one that the team has quietly stopped touching the payment service because nobody trusts the release path anymore. The metrics describe motion. The conversation reveals fear. The dashboard was not wrong. It was incomplete.

This matters more when AI enters the room. Model outputs look authoritative. They are legible, confident, and cheap. That makes interpretation more important, not less. A plausible answer is not a verified answer. A generated diff is not a design decision. Ten agreeing benchmark numbers do not explain whether the system is now easier to change or merely harder to question.

The leader's job is to keep the interpretation layer alive. Code reviews should ask whether the design is sound, not just whether the syntax is legal. Retros should update mental models, not assign ceremonial blame. Post-mortems should produce understanding, not theater. One-on-ones should surface what metrics systematically miss.

If you optimize without interpreting, you do not get a high-performance organization. You get a very efficient drift into nonsense.

Management Is Leverage, Not Theater

A manager's output is the output of the team. Andy Grove said it plainly, and there is still no better definition.

That means management is not a side activity you perform once the "real work" is done. Done well, it is leverage. Hire well and the effect compounds for years. Train people and capacity grows without you. Remove a blocker and five people move. Clarify a goal and a week of confused effort disappears. Protect deep work and hard problems actually get solved.

On a Tuesday, a CTO can spend an hour in a status meeting, or they can spend that hour unblocking access to a staging environment that has been burning engineering time for a month. One of those actions produces the pleasant feeling of managerial activity. The other produces leverage.

Methodologies are tools, not religions. Scrum, Kanban, XP, continuous delivery, whatever else we like to capitalize: all useful when they solve a real problem, all ridiculous when they survive mainly as ceremony. A standup that surfaces dependencies is helpful. A standup that exists to prove aliveness is office-themed cardio. A retro that changes behavior is valuable. A retro that generates a shared sense of virtue and no action is scented-candle management.

The highest-leverage changes are usually not parameter tweaks. Changing the KPI rarely fixes the underlying system. Changing the headcount plan rarely fixes a broken mental model. Real change happens when the team starts seeing the work differently. If a team believes tests slow them down, they will find a hundred ways around your quality process. If they understand that tests are what let them move quickly without rediscovering the same bugs every week, the system changes from the inside.

That is leadership in the AI era. Not command and control. Not output worship. Not ritual management. The job is to shape the goals, the loops, the norms, and the mental models so the organization can make good judgments at scale.

The Post-It Version

Code is cheap now. Judgment is not.

So I want software built like science: form a hypothesis, run the experiment, read the instrument, update the model. I want systems designed to conserve attention: separate concerns, keep boundaries clear, delete more than you add. I want loops so short that reality can still be heard before the org invents a story about it. I want ownership tied to consequence, because that is how systems learn. I want leaders who maintain the interpretation layer, because metrics alone will always drift toward theater. I want management treated as leverage, because that is what it is.

The old bottleneck was getting code written. The new bottleneck is deciding what deserves to exist and proving that it behaves. That is not a small adjustment. That is the whole game.

Build systems that help humans notice truth early. Everything else is decoration.

Want to discuss any of this? hello@artilect.us