Knowing is not shipping
Everyone has read the same papers, bookmarked the same blog posts, and attended the same talks. But knowing about RAG, agents, and memory systems is very different from having a pipeline that works reliably in production. This issue is about closing that gap: the patterns that are actually showing up in production, the protocols engineers keep getting confused about, and the failure modes nobody puts in the headline.
The Problem
Why standard RAG keeps letting teams down

There are three failure modes most teams discover the hard way. The first is what researchers call the needle-in-a-haystack problem. As context windows grow, models become surprisingly unreliable at surfacing a specific fact buried deep in retrieved chunks. More context does not automatically produce better answers; if anything, it adds noise.
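(An aside on mitigation, since the fix here is mechanical: one common pattern, not specific to any team mentioned in this issue, is to retrieve a wide candidate set, re-rank it, and pass only the strongest few chunks to the model. A minimal sketch using sentence-transformers, with the cross-encoder model name as an illustrative choice.)

```python
# Minimal sketch: cap what reaches the model instead of stuffing the window.
# Assumes sentence-transformers is installed; the model name is one common choice.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def select_context(query: str, candidates: list[str], keep: int = 5) -> list[str]:
    # Score every (query, chunk) pair, then keep only the top-scoring few.
    scores = reranker.predict([(query, chunk) for chunk in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:keep]]
```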
The second is a security misconception that is more widespread than it should be. Vector embeddings are not encrypted and are not a safe privacy layer. Embedding inversion is on the OWASP Top 10 for LLMs, and peer-reviewed research has shown it is possible to recover up to 92% of input text from embeddings alone using adversarial techniques. Teams treating their vector database as a privacy boundary are taking on real risk without knowing it.
The third failure is more structural. Static retrieval has no working memory. It retrieves, answers, and forgets — which makes it unsuitable for any business logic that requires tracking state across multiple steps or turns.
What Actually Works
Three patterns engineers are quietly shipping right now
GraphRAG — relationships, not just similarity
Originally developed by Microsoft Research, GraphRAG uses an LLM to extract named entities from a corpus and map the connections between them into a knowledge graph. The result is that the system understands how concepts relate rather than just how similar they sound. Standard vector search would struggle to link an immunologist to a research programme through several hops. GraphRAG handles it because the connections are explicit. Microsoft's benchmarks showed meaningful improvements in both comprehensiveness and diversity of answers over baseline RAG on the same data.
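A toy version of the idea, with invented entities and the LLM extraction step replaced by hand-written triples, just to show why explicit edges make multi-hop questions cheap:

```python
# Toy GraphRAG sketch: hand-written triples stand in for LLM entity extraction.
import networkx as nx

triples = [
    ("Dr. Reed", "works_at", "Immunology Lab"),
    ("Immunology Lab", "part_of", "University Hospital"),
    ("University Hospital", "runs", "Vaccine Programme"),
]

graph = nx.DiGraph()
for subject, relation, obj in triples:
    graph.add_edge(subject, obj, relation=relation)

# Vector similarity rarely links "Dr. Reed" to "Vaccine Programme" directly,
# but an explicit graph answers the multi-hop question with a path query.
path = nx.shortest_path(graph, source="Dr. Reed", target="Vaccine Programme")
print(" -> ".join(path))
# Dr. Reed -> Immunology Lab -> University Hospital -> Vaccine Programme
```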
MCP and A2A — two protocols, two different jobs
These get conflated constantly, so it is worth being precise. The Model Context Protocol (MCP), built by Anthropic, standardises how a single agent connects to external tools, APIs, and data sources: the vertical layer. The Agent2Agent (A2A) protocol, developed by Google and now backed by over 50 organisations, handles how agents discover each other, delegate tasks, and hand work off: the horizontal layer. They are complementary; Google has said publicly that A2A is designed to build on top of MCP, not replace it.
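To make the vertical/horizontal split concrete, here is a minimal sketch: an MCP tool server built with the FastMCP helper from the official Python SDK, alongside the kind of agent card A2A publishes for discovery. The tool body and the card fields are illustrative rather than a complete schema:

```python
# Vertical layer (MCP): expose a tool to a single agent.
# Uses FastMCP from the official MCP Python SDK; the tool itself is a stub.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-docs")

@mcp.tool()
def search_docs(query: str) -> str:
    """Search internal documentation (stubbed for the example)."""
    return f"Top result for: {query}"

# Horizontal layer (A2A): describe this agent to other agents via an
# agent card served at a well-known URL. Fields shown are illustrative.
AGENT_CARD = {
    "name": "docs-agent",
    "description": "Answers questions about internal documentation",
    "url": "https://agents.example.com/docs-agent",
    "skills": [{"id": "doc-search", "description": "Retrieve and summarise internal docs"}],
}

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```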
Persistent context via CLAUDE.md
A plain text file that holds architecture decisions, naming conventions, and project context, which the agent reads at the start of every session. Simple in concept, but it solves one of the most frustrating recurring problems in agentic workflows: having to re-explain the same setup every single time. Teams using this pattern report faster iteration and more consistent outputs across longer projects.
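An illustrative skeleton, with every detail invented for the example; the useful part is the categories, not the specifics:

```markdown
# Project context

## Architecture
- Monorepo: services under /services, shared libraries under /packages
- Postgres for persistence, Redis for queues; do not add new datastores

## Conventions
- Python 3.12, ruff for linting, pytest for tests
- Branch names: feature/<ticket-id>-short-description

## Current focus
- Migrating retrieval from fixed-size chunking to heading-aware chunking
```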
The Blueprint
What a well-engineered agent actually does with a request
Strip away the buzzwords and a solid agentic pipeline follows a clear sequence. This is what it looks like in practice, with a sketch of the full loop after the five steps.
Decompose first. Before touching any data, the agent breaks the request into a structured task graph. Fuzzy inputs become specific, sequenced subtasks with clear dependencies.
Route intelligently. An LLM-based classifier decides whether the query needs internal documentation, a broader knowledge base, or a GraphRAG index. Not every question benefits from the same retrieval strategy.
Challenge the context before using it. Before generating an answer, the agent reviews its own retrieved context for weak assumptions and gaps. Most teams skip this and pay for it with confident-sounding wrong answers.
Execute with the right protocol. External tools via MCP. Subtasks needing specialist capability via A2A. Knowing which to reach for keeps the architecture clean.
Write back what you learned. Output goes out with citations attached, and new context gets written back to shared state or the CLAUDE.md file so the next session starts smarter.
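Here is the whole loop as a skeleton. Every llm() call is a stub for whatever model client you use; the routing labels, subtasks, and file handling are invented for the example, and the MCP/A2A execution step is reduced to a single placeholder:

```python
# Skeleton of the five steps above. All names, labels, and subtasks are
# illustrative; llm() is a stub for your model client of choice.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

@dataclass
class Task:
    description: str
    depends_on: list[str] = field(default_factory=list)

def decompose(request: str) -> dict[str, Task]:
    # Step 1: fuzzy request -> sequenced subtasks with explicit dependencies.
    # Hand-built here; in practice the LLM emits this structure as JSON.
    return {
        "gather": Task("Collect the relevant policy documents"),
        "compare": Task("Compare policies against the new regulation", depends_on=["gather"]),
        "draft": Task("Draft the summary with citations", depends_on=["compare"]),
    }

def route(task: Task) -> str:
    # Step 2: pick a retrieval strategy per subtask, not one for everything.
    return llm(f"Classify the best index (internal_docs | knowledge_base | graphrag):\n{task.description}")

def retrieve(index: str, task: Task) -> str:
    # Stub: swap in your vector store, knowledge base, or GraphRAG query.
    return f"[context from {index} for: {task.description}]"

def critique(context: str, task: Task) -> str:
    # Step 3: challenge the retrieved context before generating with it.
    return llm(f"List weak assumptions or gaps in this context for '{task.description}':\n{context}")

def execute(task: Task, context: str) -> str:
    # Step 4: in a real system, tool calls go through MCP and specialist
    # subtasks are delegated over A2A; both are folded into one stub here.
    return llm(f"Complete the task using only this context:\n{context}\n\nTask: {task.description}")

def write_back(note: str, path: str = "CLAUDE.md") -> None:
    # Step 5: persist what was learned so the next session starts smarter.
    with open(path, "a") as f:
        f.write(f"\n{note}\n")

def handle(request: str) -> None:
    plan = decompose(request)
    for name, task in plan.items():  # insertion order respects dependencies here
        index = route(task)
        context = retrieve(index, task)
        gaps = critique(context, task)
        answer = execute(task, f"{context}\n\nKnown gaps:\n{gaps}")
        write_back(f"- {name}: {answer[:200]}")
```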
Before You Ship
Three things that will catch you if you are not paying attention
Embeddings are not a privacy solution
Sending raw embeddings of confidential data to a public LLM provider carries none of the legal protection of an enterprise no-data-retention agreement. This is not a theoretical concern: embedding inversion made the OWASP Top 10 for LLMs and has been demonstrated in peer-reviewed research. Treat your vector data with the same care you would give the underlying text it came from.
Chunk size is an engineering decision not a default setting
Too small and chunks lose semantic meaning. Too large and you dilute relevance. There is no universal answer: it depends on your content type, your retrieval strategy, and how the model handles long context. Test it against your actual data; do not leave it at whatever the tutorial used.
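What "test it against your actual data" can look like in practice: chunk the corpus at several candidate sizes, embed, and measure how often the chunk containing a known answer lands in the top k for labeled queries. A sketch, with the embedding model, corpus path, and eval pairs all as placeholders:

```python
# Chunk-size bake-off sketch. Assumes sentence-transformers; the model name,
# corpus path, and eval pairs are placeholders for your own data.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int, overlap: int = 50) -> list[str]:
    # Naive character-window chunking, just for the comparison.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def hit_rate(corpus: str, eval_set: list[tuple[str, str]], size: int, k: int = 3) -> float:
    # Fraction of queries whose known answer span appears in a top-k chunk.
    chunks = chunk(corpus, size)
    chunk_emb = model.encode(chunks, convert_to_tensor=True)
    hits = 0
    for query, answer_span in eval_set:
        query_emb = model.encode(query, convert_to_tensor=True)
        top = util.cos_sim(query_emb, chunk_emb)[0].topk(min(k, len(chunks))).indices
        hits += any(answer_span in chunks[int(i)] for i in top)
    return hits / len(eval_set)

if __name__ == "__main__":
    corpus = open("docs/combined.txt").read()               # your documents
    eval_set = [("What is the refund window?", "30 days")]  # labeled pairs
    for size in (256, 512, 1024, 2048):
        print(size, round(hit_rate(corpus, eval_set, size), 2))
```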
Multi-turn performance degrades and the research backs this up
A 2025 study out of Microsoft Research and Salesforce, covering 15 LLMs across more than 200,000 simulated conversations, found an average 39% drop in task accuracy when information is spread across turns rather than given upfront. The core finding: when a model makes a wrong assumption early in a conversation, it rarely recovers on its own. Building in explicit context checkpoints, or summarising prior turns into a fresh prompt, is not optional on long tasks; it is load-bearing architecture.
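A minimal sketch of that checkpoint idea. The turn threshold and the prompt wording are illustrative, and summarise() stands in for whatever model client you use:

```python
# Context-checkpoint sketch: past a threshold, collapse the transcript into
# one summary turn instead of letting early wrong assumptions compound.
def summarise(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def checkpoint(turns: list[dict], every: int = 6) -> list[dict]:
    # Below the threshold, leave the conversation alone.
    if len(turns) < every:
        return turns
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
    summary = summarise(
        "Summarise this conversation into requirements, decisions made, and "
        f"open questions. Be explicit about constraints:\n{transcript}"
    )
    # Restart from a single, clean context turn.
    return [{"role": "user", "content": f"Context so far:\n{summary}"}]
```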
The gap between teams that talk about AI engineering and teams that actually ship it comes down to discipline in the unglamorous parts. Chunk tuning, protocol selection, context checkpoints. None of it is exciting to write about but all of it is what keeps a system running three months after the demo.
If this was useful, forward it to someone who is in the middle of building something. They probably need it more than you do.