AI is transforming IT operations, but it amplifies what it understands—and nothing more. For enterprises running mission-critical hybrid applications, giving AI incomplete context isn’t just unhelpful. It’s dangerous. Here’s why the observability reckoning is here, and what your team needs to do about it.
There’s a familiar warning that comes up every time someone talks about working with AI tools: they can be confidently wrong. Give a large language model the wrong context, and it will hallucinate an answer with total conviction. Most people accept that as a known quirk of consumer AI.
What fewer people are talking about is how that same dynamic plays out inside your production infrastructure—at the scale of millions of dollars of revenue per hour.
AI is increasingly embedded in how modern enterprises detect, diagnose, and remediate infrastructure issues. That’s the good news. The risk is that AI, when deployed without sufficient context, doesn’t just fail quietly. It acts—confidently and incorrectly. And in environments where an e-commerce outage can cost $100,000 per hour, or a financial services failure runs $10,000 a minute, the stakes of getting it wrong are not theoretical.
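The two example rates above are quoted in different units. A quick conversion puts them on the same scale. It assumes a flat per-minute cost, which is a simplification: real outage costs are rarely linear.

```python
# Back-of-the-envelope downtime arithmetic using the two example rates above.
# Assumes cost accrues at a constant rate per minute (a simplification).
ECOMMERCE_COST_PER_HOUR = 100_000  # USD per hour, e-commerce example
FINSERV_COST_PER_MINUTE = 10_000   # USD per minute, financial services example


def outage_cost(minutes: float, cost_per_minute: float) -> float:
    """Cost of an outage lasting `minutes` at a flat per-minute rate."""
    return minutes * cost_per_minute


ecommerce_per_minute = ECOMMERCE_COST_PER_HOUR / 60
finserv_per_hour = FINSERV_COST_PER_MINUTE * 60

print(f"E-commerce: ${ecommerce_per_minute:,.0f} per minute")
print(f"Financial services: ${finserv_per_hour:,.0f} per hour")

# A 5-minute fix versus a 4-hour firefight at the financial services rate.
print(f"5 minutes: ${outage_cost(5, FINSERV_COST_PER_MINUTE):,.0f}")
print(f"4 hours:   ${outage_cost(240, FINSERV_COST_PER_MINUTE):,.0f}")
```

At $10,000 a minute, the financial services figure works out to $600,000 an hour, six times the e-commerce example, and a 4-hour incident at that rate reaches $2.4 million.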
This is the observability AI reckoning. Let’s unpack what it means, why traditional approaches fall short, and what a modern solution looks like.
AI Amplifies Whatever It Understands—For Better or Worse
Modern applications generate a staggering volume of telemetry. Metrics, events, traces, logs, configurations—the data streams are enormous and growing. As organizations have moved to microservices architectures, the number of components contributing to application performance has multiplied. AI is genuinely useful here: it’s good at analyzing data in context, identifying patterns across large datasets, and surfacing insights no human team could find manually.
But the operative phrase is “in context.” AI doesn’t just need data. It needs the right data—the full picture. When you’re troubleshooting an e-commerce checkout slowdown, you need more than frontend traces. You need to know whether your underlying infrastructure is constrained, whether a third-party payment provider like PayPal is performing correctly, whether a recent deployment introduced a regression. Without that complete view, AI produces confident answers based on incomplete information. That’s worse than no answer at all, because it takes your team down the wrong path while revenue bleeds.
When AI is applied to production systems, the margin for error compresses to near zero. Minutes matter. And a wrong AI-generated remediation applied to a production application isn’t a failed experiment—it’s an outage.
The Application No Longer Has a Fixed Edge
There’s a mental model many teams still carry from a decade ago: the application is the code. Monitor the code, and you’ve monitored the application. That model hasn’t been accurate for years, and it’s now actively harmful.
Consider a real pattern we see consistently in the market. A mid-to-large enterprise runs an application with containerized workloads in Google Cloud, virtual machines in Azure, and sensitive customer data that never leaves the on-premises data center due to compliance requirements. The application sprawls across all three. Its performance depends on the health of all three.
Add an AI chatbot to that checkout experience—something that answers product questions or handles returns—and you’ve added another layer: GPU infrastructure, token usage, inference latency, AI guardrails. Now you’re not just monitoring microservices across clouds. You’re monitoring an AI workload that itself depends on the same hybrid infrastructure your other services do.
The industries where this matters most aren’t just e-commerce. We work with customers whose applications power E911 services, hospital emergency alert systems, and even agricultural equipment where on-board AI makes split-second decisions in the field and syncs results to the cloud. In every case, the application doesn’t live in one place. It lives everywhere—and so does the risk.
Why You Can’t Separate Application Performance from Infrastructure Anymore
Here’s a scenario that plays out constantly in enterprise environments: a checkout service that normally responds in under a second starts taking a full minute. Your APM tool shows the slowdown at the application layer. But why is it slow?
It could be a code change from yesterday’s deployment. It could be resource contention on a shared host. It could be a third-party dependency—a payment processor, an identity service—having a bad day. I’ve lived this firsthand: early in my career, I spent two hours troubleshooting what I was convinced was a configuration change I’d made to our CRM, only to discover the vendor itself was having a rare outage. We didn’t check the status page until the two-hour mark.
Traditional APM tools show you where time is being spent in your application. They don’t explain why systems slow down. That distinction is the whole ballgame. Virtana’s approach is to trace all the way through the stack—not just to the VM, but through the VM to the ESXi host beneath it, where resource contention might be the actual culprit. Going three layers deep is often the difference between a 5-minute resolution and a 4-hour firefight.
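One way to picture tracing through the stack is as a walk down a chain of runs-on relationships: from the service, to its VM, to the ESXi host beneath it. The sketch below is a hypothetical, simplified model, not Virtana’s implementation. Each layer carries a single normalized contention metric and an assumed saturation threshold, and the walk reports the lowest layer that is saturated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """One layer of the stack (service, VM, or host) with one contention metric."""
    name: str
    kind: str
    contention: float                  # normalized 0.0-1.0; hypothetical metric
    threshold: float = 0.85            # hypothetical saturation threshold
    runs_on: Optional["Layer"] = None  # the layer beneath this one, if any

    @property
    def saturated(self) -> bool:
        return self.contention >= self.threshold


def deepest_saturated(top: Layer) -> Optional[Layer]:
    """Walk from the top layer down through each layer it runs on and
    return the lowest saturated layer, or None if every layer is healthy."""
    culprit = None
    current: Optional[Layer] = top
    while current is not None:
        if current.saturated:
            culprit = current
        current = current.runs_on
    return culprit


# Hypothetical stack: checkout service -> VM -> ESXi host.
host = Layer("esxi-07", "ESXi host", contention=0.97)
vm = Layer("vm-checkout-2", "VM", contention=0.60, runs_on=host)
app = Layer("checkout", "service", contention=0.40, runs_on=vm)

culprit = deepest_saturated(app)
print(culprit.name if culprit else "no saturated layer")
```

In this example the VM looks healthy at 60%, so an investigation that stops at the VM would miss the contended host one layer further down.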
This matters even more for seasonal workloads. E-commerce teams preparing for Black Friday institute code freezes a full quarter in advance and spend weeks on load testing, because they know that when 60% of their annual revenue arrives in a single day, they cannot afford to discover the hard way that their infrastructure wasn’t sized correctly.
The Tool Sprawl Problem (and Why Consolidation Isn’t Just Tidiness)
According to Gartner, tool consolidation is one of the top priorities for enterprise IT leaders—and for good reason. Most large organizations are running 10 or more monitoring tools. How does that happen? It’s rarely a deliberate choice. It’s accumulation: a hardware vendor bundles its own monitoring with a purchase, so you start using it. AWS provides solid out-of-the-box monitoring for AWS resources, so you layer that in. Two acquisitions later, you have two different APM tools and no clean way to reconcile what they’re each telling you.
The consequence isn’t just operational inconvenience. One of our architects recently described troubleshooting a production issue with an AI agent: infrastructure hosted on AWS, inferencing through OpenAI, third-party connectors in between. He had seven browser tabs open, one for each tool’s view of its own piece of the application. Seven tabs to understand one problem. That’s not a workflow. That’s triage on top of triage.
The goal isn’t to rip everything out and start over. Some tools serve specific purposes that aren’t going away—log management for SOC 2 compliance, for example, or a specialized RUM tool your team has built workflows around. The goal is a unified view that can bring those data sources together, so that when you’re troubleshooting a performance issue, you’re not switching tabs. You’re looking at one screen with all the context you need.
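At its simplest, a unified view means merging each tool’s events into one chronological timeline. The sketch below assumes each source already emits events sorted by time; the `Event` record, source labels, and messages are hypothetical, and a real platform would also need to normalize clocks and entity names across tools.

```python
import heapq
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str   # which tool reported it (hypothetical labels)
    message: str


def unified_timeline(*streams: Iterable[Event]) -> List[Event]:
    """Merge per-tool event streams, each already sorted by timestamp,
    into a single chronological timeline."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 11, 29, hour, minute)  # arbitrary example date


ci = [Event(at(9, 40), "ci", "deployed checkout-service build 412")]
logs = [Event(at(9, 52), "logs", "payment provider: connection timeout"),
        Event(at(9, 58), "logs", "payment provider: connection timeout")]
apm = [Event(at(9, 55), "apm", "checkout p95 latency 58s")]

for event in unified_timeline(ci, logs, apm):
    print(event.timestamp.strftime("%H:%M"), event.source, event.message)
```

Because `heapq.merge` consumes the inputs lazily, the same approach works for long streams without sorting everything in memory first.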
Agentic AI Is Different—and That Requires a Different Foundation
Not all AI in observability is the same. There’s traditional ML that learns the normal behavior of your infrastructure over time—understanding where CPU utilization should sit day over day, week over week, flagging anomalies when it deviates. That capability is still valuable, and it’s not going away.
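A minimal sketch of that kind of learned baseline follows. It assumes a simple per-hour-of-week mean and standard deviation with a z-score threshold; production systems typically use richer seasonal models, and the threshold values here are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def build_baseline(samples):
    """samples: iterable of (datetime, cpu_percent).
    Returns {(weekday, hour): (mean, std_dev)} learned from history."""
    buckets = defaultdict(list)
    for ts, cpu in samples:
        buckets[(ts.weekday(), ts.hour)].append(cpu)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in buckets.items()}


def is_anomalous(baseline, ts, cpu, z_threshold=3.0, min_std=1.0):
    """True if `cpu` sits more than `z_threshold` standard deviations from
    the learned mean for the same weekday and hour."""
    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return False  # no history for this slot yet, so no judgment
    mu, sigma = baseline[slot]
    sigma = max(sigma, min_std)  # guard against near-zero spread in flat history
    return abs(cpu - mu) / sigma > z_threshold


# Four Mondays of 09:00 CPU readings (2024-01-01 was a Monday).
monday_9am = datetime(2024, 1, 1, 9)
history = [(monday_9am + timedelta(weeks=i), cpu)
           for i, cpu in enumerate([40, 42, 38, 41])]
baseline = build_baseline(history)

next_monday = monday_9am + timedelta(weeks=4)
print(is_anomalous(baseline, next_monday, 95))  # True
print(is_anomalous(baseline, next_monday, 43))  # False
```

Bucketing by weekday and hour is what lets the same 80% CPU reading be normal on a Monday morning and anomalous at 3 a.m. on a Sunday.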
Agentic AI is something different. It can take a dense, technical root cause analysis and produce a plain-English explanation your team can act on—and share with leadership without translation. It can query documentation, correlate a performance degradation with recent deployments, open a Jira ticket with a proposed fix, and route it through your approval workflow before anything touches production. What required YAML expertise and hours of manual effort five years ago can now be described in natural language and executed with appropriate human oversight.
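The step of correlating a performance degradation with recent deployments can start as a plain time-window filter that an agent then reasons over. This sketch assumes a hypothetical `Deployment` record and a 24-hour lookback; both are illustrative choices, not a specific product’s behavior.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Optional, Set


class Deployment(NamedTuple):
    service: str
    version: str
    deployed_at: datetime


def suspect_deployments(
    degradation_start: datetime,
    deployments: Iterable[Deployment],
    lookback: timedelta = timedelta(hours=24),
    services: Optional[Set[str]] = None,
) -> List[Deployment]:
    """Return deployments made within `lookback` before the degradation began,
    optionally limited to services on the affected request path, newest first."""
    window_start = degradation_start - lookback
    hits = [
        d for d in deployments
        if window_start <= d.deployed_at <= degradation_start
        and (services is None or d.service in services)
    ]
    return sorted(hits, key=lambda d: d.deployed_at, reverse=True)


slowdown_began = datetime(2024, 11, 20, 14, 0)
history = [
    Deployment("checkout", "v2.3.1", datetime(2024, 11, 19, 16, 0)),
    Deployment("search", "v5.0.0", datetime(2024, 11, 20, 9, 0)),
    Deployment("payments", "v1.8.0", datetime(2024, 11, 18, 10, 0)),
]

for d in suspect_deployments(slowdown_began, history, services={"checkout", "payments"}):
    print(d.service, d.version, d.deployed_at)
```

Restricting the search to services on the affected path is where topology context matters: without it, the unrelated `search` deployment would show up as an equally plausible suspect.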
But agentic AI is only as good as the context it has. This is why Virtana’s emphasis on breadth and depth—the most comprehensive view of your hybrid infrastructure—isn’t marketing language. It’s the technical prerequisite for AI that actually works. An agent reasoning across traces, topology, infrastructure metrics, and logs from your full hybrid stack can drive remediations that are accurate and fast. An agent working with a partial view will be confident and wrong.
The Human Stays in the Loop—By Design
When we were building out Virtana’s automated remediation capabilities, we went out and talked to 20 different operators about how they wanted AI-driven remediation to work. The answer was unanimous: every single one said they needed a human in the loop to approve changes before they touched production.
This isn’t resistance to automation. It’s appropriate accountability. When an application is responsible for millions of dollars of revenue, the people accountable for it need to know what’s changing and why. AI can detect the problem, analyze the cause, propose the fix, and route the ticket. But a human should be the one to press the button.
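The approval step can be expressed as a simple gate in whatever automation applies the fix. This is a hypothetical sketch, not Virtana’s remediation API: a proposed `Remediation` carries the AI’s rationale for the reviewer, and `apply_to_production` refuses to run until a named person has approved it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


@dataclass
class Remediation:
    summary: str
    rationale: str                    # the AI's reasoning, shown to the approver
    status: Status = Status.PROPOSED
    decided_by: Optional[str] = None  # the human who approved or rejected it

    def _decide(self, approver: str, outcome: Status) -> None:
        if self.status is not Status.PROPOSED:
            raise ValueError(f"already {self.status.value}")
        self.status = outcome
        self.decided_by = approver

    def approve(self, approver: str) -> None:
        self._decide(approver, Status.APPROVED)

    def reject(self, approver: str) -> None:
        self._decide(approver, Status.REJECTED)


def apply_to_production(fix: Remediation, executor: Callable[[Remediation], None]) -> None:
    """Execute a fix only after a human has explicitly approved it."""
    if fix.status is not Status.APPROVED:
        raise PermissionError("remediation has not been approved by a human")
    executor(fix)
    fix.status = Status.APPLIED


fix = Remediation(
    summary="Scale checkout from 4 to 6 replicas",
    rationale="p95 latency tracks CPU saturation on checkout pods",
)

try:
    apply_to_production(fix, executor=lambda f: None)
    blocked = False
except PermissionError:
    blocked = True  # the AI alone cannot push the change

fix.approve("on-call SRE")
applied = []
apply_to_production(fix, executor=applied.append)
print(blocked, fix.status.value, fix.decided_by)
```

Recording who approved each change, alongside the rationale they saw, is also what makes the audit trail useful when building trust in the system over time.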
That’s a feature, not a limitation. The path to self-healing infrastructure runs through human trust in the system. You build that trust incrementally—by demonstrating accuracy, by making the AI’s reasoning transparent, by keeping humans informed at each step. The goal isn’t to remove people from the process. It’s to make their decisions faster and better-informed.
Conclusion: AI Won’t Take Your Job. But Someone Using AI Will.
There’s a version of the AI conversation that’s mostly fear—job displacement, loss of control, technology running ahead of human judgment. I don’t think that’s the right frame. AI is a force multiplier. Teams that embrace it will be able to operate faster, resolve incidents in minutes instead of hours, and do more with the same resources. Teams that don’t will find themselves competing against those who do.
The practical advice is straightforward: identify the most manual, repetitive parts of your operations workflow and start there. What takes the most time when an alert fires? What requires the most expertise to interpret? What gets escalated unnecessarily because nobody wants to guess wrong? Those are the use cases where AI can start delivering value immediately.
But to get there, the foundation has to be right. You need observability that gives AI the full picture—across your on-premises infrastructure, your cloud workloads, your containers, and your AI workloads themselves. Context is everything. Without it, AI is just confident noise.
With it, it’s the closest thing to a self-healing infrastructure we’ve ever had.
See how Virtana’s unified hybrid observability platform gives AI the context it needs to actually work. Request a demo at virtana.com.
David McNerney
David McNerney is Director of Product Management at Virtana, leading Application Observability, Container Observability, and Service Observability. He focuses on building the cloud and hybrid monitoring capabilities that enable Global 2000 enterprises to resolve incidents faster and optimize infrastructure costs.