Gartner's 2025 forecast landed with little fanfare in most engineering Slack channels, but it deserves a second look: roughly 40% of agentic AI projects initiated between now and 2027 will fail to reach production or will be retired within twelve months of launch. That figure is not a referendum on large language models. It is a diagnosis of organizational infrastructure and the gulf between where AI agents want to live and where enterprise data actually resides.
What Gartner's Prediction Actually Means
When analysts talk about project failure in this context, they are not describing dramatic system crashes or hallucinating agents that propose catastrophically wrong actions. The more common failure mode is slower and quieter: a promising proof-of-concept that works beautifully in a sandboxed environment, then stalls indefinitely when engineering teams try to connect it to the systems where actual work happens.
The pattern is consistent across industries. A bank's intelligent document processing agent performs flawlessly against a curated test dataset of PDFs. When it encounters the real loan origination system—a COBOL-era core banking platform last meaningfully updated in 2003—it has no reliable path to read, write, or reason about the data it needs. The project gets deprioritized. A new initiative launches six months later.
The Three Layers of the Legacy Integration Problem
| Layer | Problem | What Agents Expect | What Legacy Systems Offer |
|---|---|---|---|
| Data format | Format incompatibility | Clean JSON interfaces | WSDL, flat files, XML with 2001-era namespaces |
| Auth & authorization | Access control mismatch | Dynamic, fine-grained permissions | Fixed role-based access for human operators |
| Reliability & latency | Timing assumptions | Async calls returning in reasonable timeframes | Timeouts, stale caches, unpredictable load |
Layer 1: Data format incompatibility
Enterprise systems of record were designed to store and retrieve structured data in ways that made sense for human operators working through defined workflows. They were not designed to expose semantically rich context to a reasoning model. Proprietary binary formats, undocumented schema variations between deployment versions, and data that only carries meaning in the context of adjacent fields—these are not edge cases. They are the norm in systems that have accumulated decades of customization.
Agents built on modern tool-calling frameworks expect clean JSON interfaces. The real world offers WSDL services, flat file exports, and RPC calls that return XML with namespace prefixes from 2001.
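Bridging that gap is mostly adapter work. The sketch below, which assumes a hypothetical 2001-era XML loan schema and namespace, shows what such an adapter tool might look like: it takes a legacy XML payload and normalizes it into the flat JSON dict a tool-calling agent expects. Field names (`LoanID`, `Prncpl`, `Stat`) are illustrative; real systems each need their own mapping.

```python
# A minimal format-adapter sketch: accept a SOAP-era XML payload
# (hypothetical schema/namespace) and normalize it for the agent.
import json
import xml.etree.ElementTree as ET

# Hypothetical 2001-era namespace; every real system needs its own mapping.
NS = {"loan": "http://example.com/schemas/2001/loan"}

def normalize_loan_record(legacy_xml: str) -> dict:
    """Translate a legacy XML loan record into a clean JSON-friendly dict."""
    root = ET.fromstring(legacy_xml)
    return {
        "loan_id": root.findtext("loan:LoanID", namespaces=NS),
        "principal": float(root.findtext("loan:Prncpl", default="0", namespaces=NS)),
        "status": root.findtext("loan:Stat", default="", namespaces=NS).strip().upper(),
    }

legacy_payload = """\
<loan:Record xmlns:loan="http://example.com/schemas/2001/loan">
  <loan:LoanID>L-4821</loan:LoanID>
  <loan:Prncpl>250000.00</loan:Prncpl>
  <loan:Stat> active </loan:Stat>
</loan:Record>"""

print(json.dumps(normalize_loan_record(legacy_payload)))
```

The value of isolating this translation in one place is that the agent's tool definitions never need to know the legacy schema exists.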
Layer 2: Authentication and authorization surface area
Modern agentic systems need to act, not just read. An agent that can retrieve information but cannot execute actions has limited utility for the autonomous workflows that justify the investment. But enterprise systems enforce access control in ways that were designed for humans with fixed job roles—not for dynamic agents that may need to operate across a dozen systems in a single task execution.
Service account proliferation, OAuth scope limitations on older platforms, and the absence of fine-grained audit trails for agent actions all create friction that security teams are rightly reluctant to wave away. Many projects fail here: not because the agent cannot do the work, but because the organization cannot safely grant it permission to try.
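One way to make that permission grant safe enough to attempt is to gate every agent action through an explicit scope check with an audit record on both the allow and deny paths. The sketch below is illustrative, not drawn from any particular platform; the scope names and audit format are assumptions.

```python
# Sketch of an action gate for agent tool calls: each action is checked
# against an explicitly granted scope set and audited whether or not it runs.
import datetime

AUDIT_LOG: list[dict] = []

def gated_call(agent_id: str, granted_scopes: set[str], action: str, fn, *args):
    """Run fn only if the agent holds the scope for this action; audit either way."""
    allowed = action in granted_scopes
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent": agent_id,
        "action": action,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{agent_id} lacks scope {action!r}")
    return fn(*args)

# The agent was granted read access but never write access.
scopes = {"accounts:read"}
balance = gated_call("doc-agent-1", scopes, "accounts:read", lambda: 1042.50)
try:
    gated_call("doc-agent-1", scopes, "accounts:write", lambda: None)
except PermissionError:
    pass  # denied and audited, not silently executed
```

A gate like this also produces exactly the fine-grained audit trail that security reviews look for and legacy platforms lack.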
Layer 3: Reliability and latency assumptions
Modern agent frameworks—including the Claude Agent SDK that powers platforms like Neumar—are built around async execution patterns that expect tool calls to return within reasonable timeframes. Legacy systems, especially those running on on-premises infrastructure with unpredictable load profiles, do not always cooperate. A tool call that times out after thirty seconds, then succeeds on retry, then returns a stale cached result, creates reasoning errors that compound across multi-step agent workflows in ways that are difficult to diagnose.
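The defensive pattern this implies can be sketched as a tool-call wrapper with bounded retries, a per-call timeout, and a freshness check that treats a stale cached response as a failure rather than feeding it into the agent's reasoning. All names and thresholds below are illustrative assumptions.

```python
# Defensive wrapper for a slow legacy backend: bounded retries, a per-call
# timeout, and a staleness check so old cached data never reaches the agent.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as CallTimeout

MAX_ATTEMPTS = 3
CALL_TIMEOUT_S = 2.0
MAX_STALENESS_S = 60.0

def call_legacy_tool(fetch):
    """fetch() must return (payload, produced_at_unix_ts)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _attempt in range(MAX_ATTEMPTS):
            try:
                payload, produced_at = pool.submit(fetch).result(timeout=CALL_TIMEOUT_S)
            except (CallTimeout, ConnectionError):
                continue  # timeout or transient failure: retry
            if time.time() - produced_at > MAX_STALENESS_S:
                continue  # stale cache hit: retry rather than reason on old data
            return payload
    raise RuntimeError("legacy system unavailable; escalate to human review")

# Simulated backend: fails once with a transient error, then returns fresh data.
attempts = {"n": 0}
def flaky_fetch():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("transient outage")
    return {"status": "APPROVED"}, time.time()

result = call_legacy_tool(flaky_fetch)
```

The point is not the specific thresholds but that timeout, retry, and staleness policy live in one wrapper instead of being rediscovered inside each multi-step workflow.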
Why Teams Consistently Underestimate This
There are three structural reasons why the legacy integration gap repeatedly catches teams off guard.
The demo environment is always cleaner than production. Proof-of-concept work happens against sanitized datasets, mock APIs, or the newest version of a platform's REST interface. The agent looks remarkable. The gap to production is invisible until someone tries to cross it.
Integration complexity is not a model problem, so it falls outside the AI team's mental model. AI engineers are rightly focused on prompt design, tool definition, and evaluation. The integration layer feels like plumbing—someone else's problem. But the "someone else" is often a platform team that is already backlogged, does not own the agent initiative, and has competing priorities.
ROI calculations ignore integration cost. When a vendor demonstrates a 70% reduction in processing time, that figure is measured from the point where the agent receives clean, accessible data. The cost of getting to that point—middleware development, API wrappers, data normalization pipelines—frequently exceeds the original project budget and rarely appears in the initial business case.
What Succeeds: The Integration-First Approach
The projects that do make it to production share a common trait: they treat integration as the primary engineering challenge, not as a follow-on task. Concretely, this means:
Starting with a systems audit rather than a capability demonstration. Before writing a single line of agent code, map every system the agent will need to touch, document its current interface (including its current deficiencies), and get explicit alignment from the teams who own those systems on what is and is not possible.
Designing for graceful degradation. Agents that fail loudly when a legacy system is unavailable are more dangerous than agents that acknowledge their limitations and escalate to human review. The tool layer should be designed with explicit fallback behaviors, not optimistic assumptions about system availability.
Using a platform with workspace isolation built in. One reason Neumar's architecture invests in workspace isolation and OS-level sandboxing is that enterprise deployments genuinely need agents that cannot inadvertently reach systems they were not explicitly configured to access. This is not a nice-to-have—it is a prerequisite for the security reviews that precede any production deployment in a regulated industry.
Budgeting integration as a first-class deliverable. A realistic enterprise agent deployment budget allocates roughly equal resources to agent capability development and integration infrastructure. Teams that front-load capability work and defer integration to the end of the project consistently run out of runway before they cross the production threshold.
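The graceful-degradation point above can be made concrete with a small sketch. Instead of an optimistic call that throws when the legacy system is down, the tool returns a structured outcome that is either data or an explicit escalation the agent must surface to a human. The tool and backend names are hypothetical.

```python
# Fail-soft tool result: either data, or an explicit escalation for humans,
# never an unhandled exception mid-workflow. Names are illustrative.
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ToolOutcome:
    ok: bool
    data: Any = None
    escalation: Optional[str] = None  # human-readable reason for reviewers

def lookup_customer(customer_id: str, backend) -> ToolOutcome:
    try:
        return ToolOutcome(ok=True, data=backend(customer_id))
    except (ConnectionError, TimeoutError) as exc:
        return ToolOutcome(
            ok=False,
            escalation=f"CRM unreachable for {customer_id}: {exc}; route to human review",
        )

def down_backend(_customer_id):
    raise ConnectionError("core banking platform offline")

outcome = lookup_customer("C-100", down_backend)
```

Because the failure is an ordinary return value, the agent can reason about it, report it, and hand off, rather than crashing or hallucinating around a missing result.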
The Architectural Prescription
The cleanest solution to the legacy system gap is a dedicated integration layer that translates between modern tool-calling interfaces and the systems of record beneath them. This is not a new idea—enterprise service buses and middleware platforms have existed for decades—but the agent era requires this layer to be semantically aware, not merely a structural translator.
MCP (Model Context Protocol) represents a meaningful step in this direction. By defining a standard interface through which agents can discover and invoke capabilities, it creates a translation surface that integration teams can own independently of the agent development teams. Neumar's support for 10,000+ MCP skills reflects a bet that the integration layer, properly standardized, can be built once and reused across many agent applications—reducing the per-project cost of crossing the legacy gap.
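Concretely, a capability advertised through MCP's `tools/list` response carries three things the protocol defines: a name, a description the model reasons over, and a JSON Schema for inputs. The specific tool below (`lookup_policy`, wrapping a legacy underwriting system) is hypothetical; the shape is the protocol's.

```python
# Sketch of one tool as it appears in an MCP tools/list response.
# The tool itself is hypothetical; name/description/inputSchema is the
# field shape the protocol defines.
import json

legacy_wrapper_tool = {
    "name": "lookup_policy",
    "description": "Fetch a policy record from the legacy underwriting "
                   "system, normalized to JSON.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "policy_id": {"type": "string", "description": "Legacy policy key"},
        },
        "required": ["policy_id"],
    },
}

# Integration teams own this definition; agent teams discover and call it.
print(json.dumps(legacy_wrapper_tool, indent=2))
```

This is the division of labor the article argues for: the schema is the contract, and everything behind it (WSDL, flat files, retries) stays the integration team's concern.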
The 40% failure rate Gartner forecasts is not inevitable. But avoiding it requires treating integration as the hard problem it is, rather than the logistics problem it appears to be.
The organizations that build durable agent infrastructure in 2026 will be the ones that started with the uncomfortable question: what do our systems of record actually look like from the outside, and what would it take to make them agent-accessible? The answer is rarely flattering. It is almost always worth knowing.
