Why do AI systems often fail over time instead of immediately?

Because many systems produce enough correct answers to feel useful while still carrying architectural weaknesses. Over time those weaknesses show up as incomplete retrieval, inconsistent data handling, hidden workflow decisions, or brittle behavior that erodes trust.

When is retrieval-augmented generation not enough for an AI system?

Basic retrieval is not enough when the task requires multi-step reasoning, comparing versions, combining multiple sources, or applying context over time. In those cases the system needs explicit decomposition or workflow logic rather than more chunks of context alone.

What causes AI hallucination in enterprise systems besides the model itself?

A common cause is inconsistent or poorly modeled input data. If sources conflict, authority is unclear, or freshness and ownership are not encoded, the model often produces smooth language from incoherent evidence rather than inventing facts from nowhere.

When should teams use agents versus deterministic workflows?

Agents fit problems where the path is genuinely unknown and exploration is required. Deterministic workflows are a better fit when the path is known and the real need is control, observability, and explicit decision points.

Can fine-tuning fix a broken AI architecture?

No. Fine-tuning can improve formatting, tone, and response behavior, but it does not fix structural issues like bad retrieval, conflicting data, or incorrect execution flow. Those problems need to be solved in the system design itself.

Your AI Architecture Isn’t Broken. It’s Just Put Together Wrong.

Where engineering leaders get their systems wrong

I’ve spent a fair amount of time over the past year looking at how teams are actually building with AI.

Not demos. Not prototypes. Real systems people are expected to rely on.

A pattern shows up pretty quickly.

These systems don’t break immediately. They break over time. They give just enough correct answers to feel useful, and just enough wrong ones to make people hesitate. You can watch the shift. At first people trust it. Then they double check. Eventually they work around it.

When that happens, the model gets blamed.

In most of the cases I’ve seen, it wasn’t the model. It was the design.

A few architectural patterns show up over and over. None of them are wrong. The problem is how they’re applied.

Part of why this keeps happening is subtle. These decisions come from momentum, not carelessness. Teams reach for the pattern they saw most recently, the one they read about or watched in a YouTube video. I’ve seen “enterprise-grade” systems assembled from ideas someone picked up in a video a few weeks ago. The system gets built before the problem is shaped.

Retrieval-augmented systems

This is where most teams start.

Take a set of documents, chunk them, embed them, retrieve the most relevant pieces at query time, and pass that into the model.

On paper, it makes sense. The model stays grounded. No retraining. It works for simple questions.
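To make that pipeline concrete, here is a minimal sketch of it in Python. It is an illustration under loud assumptions, not anyone's production stack: `embed` is a toy word-count stand-in for a real embedding model, and `chunk`, `retrieve`, and `build_prompt` are hypothetical helper names invented for this example.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query. Similarity is all it knows."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Paste the top-k chunks into a prompt for the model."""
    context = "\n---\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Nothing in this sketch knows about versions, authority, or order of operations. It ranks by similarity and stops there, which is fine for as long as a single chunk holds the answer.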

The boundary shows up when the questions stop being simple.

If a question can be answered by a single paragraph in your data, basic retrieval will work. If it requires combining sources, comparing versions, or applying context over time, it won’t.

Most retrieval systems aren’t built for that. They’re built for similarity.

So you get a familiar pattern:

- a few chunks come back that look relevant
- the model assembles something coherent
- the answer sounds reasonable
- but it’s incomplete, or slightly wrong

I’ve seen teams respond to this by pulling in more context. It feels like progress. It usually makes things worse.

At some point the model isn’t reasoning over context. It’s just summarizing noise.

The issue isn’t retrieval itself, but treating retrieval as if it solves reasoning.

If the problem requires multi-step understanding, the system has to reflect that.

[Diagram: retrieval-augmented systems]

Most teams skip the decomposition step.
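One way to picture a decomposition step, sketched in Python for a version-comparison question. Everything here is an assumption made for illustration: `Step`, `decompose_comparison`, and `gather_evidence` are hypothetical names, `retrieve` is whatever retrieval function the system already has, and a real decomposer would likely be more general than a string template.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    question: str        # one narrow sub-question
    evidence: list[str]  # chunks retrieved for that sub-question only


def decompose_comparison(topic: str, versions: list[str]) -> list[str]:
    """Turn one comparison question into one lookup per version (hypothetical rule)."""
    return [f"{topic} in {version}" for version in versions]


def gather_evidence(
    topic: str,
    versions: list[str],
    retrieve: Callable[[str], list[str]],
) -> list[Step]:
    """Retrieve separately for each sub-question and refuse to continue on gaps."""
    steps = [Step(q, retrieve(q)) for q in decompose_comparison(topic, versions)]
    missing = [s.question for s in steps if not s.evidence]
    if missing:
        # Surface the gap instead of letting the model write around it.
        raise LookupError(f"no evidence for: {missing}")
    return steps
```

The design choice that matters is that each version gets its own retrieval call and its own evidence. The model is then asked to compare two known things rather than to guess which chunks belong to which version.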

What they have isn’t a knowledge system. It’s a retrieval shortcut.

Data-fed systems without a data model

This one looks fine early on.

Documents get ingested. Embeddings get created. The pipeline runs. You can query it.

Underneath, the data is a mess.

The same concept exists in multiple places. Some versions are outdated. Some contradict each other. There’s no clear sense of which source is authoritative.

The model doesn’t know that.

It sees all of it and tries to produce a consistent answer.

That’s when you get responses that read well but don’t hold up. The language is clean. The reasoning is plausible. The facts don’t quite line up.

People call this hallucination. In most of the systems I’ve heard of, it wasn’t. It was inconsistent input.

A good data model does something simple but critical. It tells the system which version of a fact wins. It encodes authority, freshness, and ownership so retrieval returns a coherent view instead of a blend.

In practice, that usually means:

- attaching metadata to every source (timestamp, owner, audience)
- defining a precedence rule (which source wins when they conflict)
- filtering retrieval based on permissions and context
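Those three steps can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed schema: the `Source` fields, the integer `authority` rank, and the rule "higher authority wins, then fresher wins" are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Source:
    text: str
    topic: str
    owner: str                # who is accountable for this content
    updated: date             # freshness
    audience: frozenset[str]  # roles allowed to see it
    authority: int            # assumed ranking: higher wins on conflict


def visible(sources: list[Source], role: str) -> list[Source]:
    """Permission filter: drop anything this role should not see."""
    return [s for s in sources if role in s.audience]


def resolve(sources: list[Source], topic: str, role: str) -> Optional[Source]:
    """Precedence rule: highest authority wins, freshness breaks ties."""
    candidates = [s for s in visible(sources, role) if s.topic == topic]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (s.authority, s.updated))
```

With a rule like this, an older policy document can still beat a newer wiki page, because authority is checked before freshness. Whether that ordering is right is a decision for whoever owns the data, which is the point: someone has to make it explicitly.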

[Diagram: data-fed systems without a data model]

In a system built this way, there’s no notion of source of truth, freshness, ownership, or permissions.

The model doesn’t resolve that. It smooths it over.

Teams treat this as a pipeline problem. In practice, it’s a modeling problem.

Until the system knows what is true, it will keep producing answers that are merely consistent.

Agent-based systems for structured problems

At some point, teams decide to make it an agent.

The appeal is obvious. Instead of a fixed flow, the system can decide what to do. It can call tools, plan steps, and adapt.

Agents are useful in the right context, but they get applied in many contexts where they don’t fit.

I was looking at a system not long ago where an agent had been added on top of a retrieval layer that wasn’t returning the right information. The agent would retry queries, call different tools, and eventually land on something usable. It looked intelligent. What it was actually doing was compensating for a broken retrieval step.

You end up with a loop that hides problems instead of surfacing them.

[Diagram: agent loop masking retrieval failures]

When it works, it looks impressive. When it fails, it’s hard to diagnose. The same input can take a different path each time, and the reasoning is buried in intermediate steps.

For structured problems, the failure is predictable. You’ve replaced explicit decisions with implicit ones.

[Diagram: deterministic workflow versus agent loop]

This isn’t flexibility versus rigidity, but visibility. In a workflow you can see where decisions are made and change them. In an agent loop you infer them after the fact.
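Here is a small Python sketch of what "you can see where decisions are made" might look like. The ticket-routing scenario, the `Trace` class, the `handle_ticket` function, and the 500 threshold are all hypothetical, invented only to show explicit decision points.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Ordered record of every decision the workflow made."""
    decisions: list[tuple[str, str]] = field(default_factory=list)

    def record(self, step: str, outcome: str) -> None:
        self.decisions.append((step, outcome))


def handle_ticket(ticket: dict, trace: Trace) -> str:
    # Decision 1: classification is an explicit rule you can read and change.
    category = "billing" if "invoice" in ticket["text"].lower() else "general"
    trace.record("classify", category)

    # Decision 2: the escalation threshold is a visible, assumed constant.
    if category == "billing" and ticket.get("amount", 0) > 500:
        trace.record("route", "human_review")
        return "human_review"

    trace.record("route", "auto_reply")
    return "auto_reply"
```

The same input always takes the same path, and the trace says why. If the routing is wrong, there is one line to change rather than a transcript of intermediate steps to interpret.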

Agents make sense when the path is unknown and the system needs to explore. They’re a poor fit when the path is known but hasn’t been made explicit.

If an agent is consistently “recovering” from earlier steps, that’s usually a sign those steps need to be fixed, not wrapped.
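One rough way to notice that pattern is to count how often each step needs more than one attempt. The Python sketch below is an assumption-heavy illustration: `RecoveryLog` is a hypothetical helper, and the 0.3 threshold is an arbitrary number picked for the example, not a recommended value.

```python
from collections import Counter


class RecoveryLog:
    """Tracks how often each step only succeeded after retries."""

    def __init__(self) -> None:
        self.calls: Counter = Counter()
        self.retried: Counter = Counter()

    def record(self, step: str, attempts: int) -> None:
        """Log one run of a step and how many attempts it took."""
        self.calls[step] += 1
        if attempts > 1:
            self.retried[step] += 1

    def recovery_rate(self, step: str) -> float:
        """Share of runs of this step that needed at least one retry."""
        total = self.calls[step]
        return self.retried[step] / total if total else 0.0

    def steps_to_fix(self, threshold: float = 0.3) -> list[str]:
        """Steps whose recovery rate exceeds the (assumed) threshold."""
        return sorted(s for s in self.calls if self.recovery_rate(s) > threshold)
```

If the retrieval step shows up in `steps_to_fix()` week after week, the agent is not being clever. It is paying a recurring tax for a step that should be repaired.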

Fine-tuned systems solving the wrong problem

This shows up later, once everything else is in place.

The system is close. Outputs are decent, but inconsistent. The format drifts. The tone isn’t quite right.

So the next step is to fine-tune.

That works when the problem is behavior, but not when the problem is structure.

If your system retrieves the wrong data, combines conflicting sources, or executes the wrong flow, training the model harder won’t fix it.

What it does instead is make the system more rigid.

Now you’ve encoded assumptions into the model. Updating behavior requires retraining. Debugging gets harder because the logic isn’t entirely in the system anymore.

You end up with something that looks more polished but is still wrong in the same places.

Fine-tuning is good at shaping how the model responds, but it doesn’t change what the system feeds into it.

A quick way to spot the mismatch

If your problem is…	Don’t reach for…	Instead, try…
Multi-step logic	More context/chunks	Query decomposition / chains
Conflicting data	A smarter model	A semantic data model / authority rules
Unreliable flows	An autonomous agent	A deterministic workflow
Bad formatting	RAG / prompting	Fine-tuning

The question that gets skipped

Before choosing a model, a tool, or a pattern, there’s a simpler question that rarely gets asked.

What kind of system is this?

That question isn’t about labels. It determines how the system behaves under pressure.

If it’s a lookup system, completeness matters less than precision. If it’s a reasoning system, retrieval has to be structured, not just relevant. If it’s a workflow, control and observability matter more than flexibility. If it’s exploratory, you accept variability and design around it.

Most of the failures above come from crossing those boundaries without noticing.

A lookup system asked to reason. A reasoning system built on inconsistent data. A workflow hidden inside an agent loop. A structural problem pushed into fine-tuning.

These aren’t dramatic mistakes. They’re small mismatches that compound as the system grows.

The model gets blamed because it’s the most visible part. The design set the constraints.

By the time you notice it, the system already has shape. And now you’re not fixing a model.

You’re unwinding a set of decisions you made before you understood the problem.

