Your AI Agent Is Only as Good as Its Context

When an AI agent gives a wrong answer, most teams blame the model. They switch from GPT-4o to Claude. They try a different temperature setting. They add “think step by step” to the prompt.

The model is rarely the problem.

In the vast majority of cases, the agent failed because it did not have the right information at the right time, in the right form. It is a context problem, not a model problem.

This matters because model selection is something vendors compete on and users debate endlessly. Context design is something practitioners rarely talk about — and it is where most of the real leverage is.

What Context Actually Means

Context is everything the model knows at the moment it is asked to act.

In a simple chatbot, context is the conversation history plus whatever you put in the system prompt. In an agentic system — one that takes actions, uses tools, and operates across multiple steps — context is substantially more complex.

There are four types of context that agents need:

1. Working memory. The current task, the current state, and what has happened so far in this run. This is the conversation thread, the agent’s accumulated observations, and the outputs from previous steps. It lives in the context window.

2. Episodic memory. What has happened in past runs. What did this agent do last time it handled a similar situation? What has changed since then? Without episodic memory, your agent starts fresh every time it runs — which is fine for stateless tasks and fatal for ongoing workflows.

3. Semantic memory. The knowledge base the agent draws on to reason. Your company’s data definitions. Your measurement methodology. Your product catalog. The rules that govern how decisions get made. This is usually retrieved at runtime rather than embedded in the prompt.

4. Procedural memory. How to do things. Which tools to use, in what order, with what parameters. This is often baked into the system prompt or encoded in the orchestration layer as explicit steps.

Most teams only think about working memory — what’s in the prompt right now. The agents that work reliably in production have all four layers designed intentionally.

The Most Common Context Mistakes

Dumping everything into the system prompt. A 10,000-token system prompt with your entire knowledge base, all the rules, all the edge cases, and detailed instructions for every possible situation is not context design. It is a failure to design. Models do not process long system prompts the way humans read documentation. Relevant information needs to be retrieved at the moment it is needed, not preloaded in full.

No retrieval strategy. Retrieval-augmented generation (RAG) is the mechanism for giving agents access to information that does not fit in the context window. Most teams implement it once and assume it works. A retrieval that returns the top-5 most semantically similar chunks regardless of recency, structure, or relevance to the actual task is almost always worse than it could be.

State that doesn’t persist. An agent that runs every morning should know what it found yesterday. An agent that is handling a multi-step research task should know what it has already investigated. If you are not explicitly designing how state persists between runs, it does not persist.

Context contamination. Every tool result, intermediate output, and observation that gets appended to the context window reduces the model’s effective attention on what matters. Long agent threads with many accumulated observations often degrade in quality because the relevant early context gets crowded out.

Designing Context for Marketing and Data Workflows

In marketing and data contexts, context design has a specific shape.

Data freshness matters. An agent reasoning about campaign performance needs to know when the data it is looking at was last updated. Stale data without timestamps leads to confident wrong conclusions.

Definitions are contested. “Conversion,” “new customer,” “incremental revenue” — these terms mean different things in different systems, for different teams, in different platforms. The agent needs access to the canonical definitions used by your organization, not its training data’s interpretation of these words.

The right grain of information is not always the document. RAG systems that retrieve entire reports or long documents often give the model too much noise and not enough signal. For analytical tasks, structured data — a table, a set of metrics, a specific time range — often outperforms a semantic search over narrative text.

Provenance should be in context. When an agent is drawing on data to make a recommendation, the model should know where that data came from and how reliable it is. “Meta ROAS from the platform API” and “Meta ROAS from the modeled attribution layer” are different numbers that imply different levels of trust.

A Framework for Context Design

Before building an agent, answer these questions:

What does this agent need to know to complete its task? List every piece of information required. Be specific.
Where does each piece of information live? Source system, database, API, document store, or generated from a previous step.
When should it be retrieved? At the start of each run? On demand when the agent needs it? Once and cached? The answer depends on how frequently it changes and how critical freshness is.
How should it be formatted? Structured JSON, a markdown table, a numbered list, or prose? For reasoning tasks, structured formats often outperform unstructured ones. Test this explicitly.
What should not be in context? Every piece of information in the context window costs tokens and attention. Be ruthless about what is necessary versus what is noise.

The Practical Implication

The best model with bad context will produce worse outputs than a good model with excellent context. This is consistently true in practice.

Before you upgrade your model, improve your context. Before you add more instructions to the system prompt, ask whether the instructions are compensating for information the agent simply does not have. Before you assume the model is hallucinating, check whether the information it hallucinated was something it should have been given.