Virtana Unveils the First Full-Stack AI Factory Observability Platform

Unique new capabilities help enterprises tame AI infrastructure complexity, boost resource efficiency, and bring predictability to industrial-scale AI operations.

PALO ALTO, CA—May 20, 2025—Virtana, the leader in hybrid infrastructure observability, today announced the launch of Virtana AI Factory Observability (AIFO), a powerful new capability that extends Virtana’s full-stack observability platform to the unique demands of AI infrastructure. With deep, real-time insights into everything from GPU utilization and training bottlenecks to power consumption and cost drivers, AIFO enables enterprises to turn complex, compute-intensive AI environments into scalable, efficient, and accountable operations. This launch strengthens Virtana’s position as the industry’s broadest and deepest observability platform, spanning AI, infrastructure, and applications across hybrid and multi-cloud environments.

“AI has the potential to be as transformative as the steam engine or the printing press—but only if enterprises can operationalize it at scale,” said Paul Appleby, CEO of Virtana. “Right now, too many teams are flying blind when it comes to AI infrastructure. Virtana AIFO gives them the visibility and control they need to treat AI not as an experiment, but as a core, strategic part of the business.”

Following a surge in enterprise investment and industry focus on scalable AI Factory infrastructure from ecosystem leaders like NVIDIA, Virtana is the first to deliver a full-stack observability solution purpose-built for AI Factory operations. As organizations move from AI pilots to production, demand is growing rapidly for platforms that go beyond surface-level monitoring to deliver deep, correlated insights across infrastructure, models, and cost drivers.

Industry analysts have identified this shift as a key trend. AI is no longer a research initiative; it is becoming an operational foundation for business. Virtana’s AI Factory Observability (AIFO) directly addresses this evolution, helping enterprises treat AI infrastructure with the same level of visibility, discipline, and accountability as traditional IT.

As an official NVIDIA partner, Virtana integrates natively with NVIDIA GPU platforms to deliver in-depth telemetry, including memory utilization, thermal behavior, and power metrics, providing precise, vendor-validated insight into the most performance-critical components of the AI Factory. This deep integration delivers accurate, actionable intelligence at enterprise scale.

“AI workloads introduce an entirely different set of infrastructure challenges—from GPU saturation and training bottlenecks to unpredictable cost spikes,” said Amitkumar Rathi, Senior Vice President of Engineering, Product, and Support at Virtana. “We designed AIFO to address these realities head-on. It gives teams deep, correlated visibility across the full AI stack, enabling them to optimize performance, reduce waste, and scale AI with confidence.”

With this launch, Virtana directly addresses the growing infrastructure challenges that stand in the way of scalable AI success. As enterprises accelerate investments in AI, many are encountering hidden inefficiencies: idle GPUs that inflate costs, training jobs that fail without explanation, and inference pipelines that stall due to underlying storage or network issues. AIFO is purpose-built to solve these problems, delivering real-time visibility and correlated insights across every layer of the AI infrastructure stack. The result is greater control over performance, spend, and scale—turning AI from a high-risk initiative into a high-impact capability.

Purpose-Built Observability for AI Infrastructure

Unlike traditional monitoring tools built for general IT workloads, Virtana AI Factory Observability (AIFO) is purpose-built to meet the demands of AI operations. It continuously collects telemetry across GPUs, CPUs, memory, network, and storage and then correlates that data with training and inference pipelines to provide clear and actionable insights.

Core capabilities include:

GPU Performance Monitoring – Tracks per-GPU metrics such as memory, utilization, thermal load, and power draw across multiple vendors.
Distributed Training Visibility – Identifies bottlenecks, synchronization issues, and stragglers across multi-node jobs.
Infrastructure-to-AI Mapping – Correlates model-level performance directly to hardware-level behavior, including network and storage dependencies.
Power and Cost Analytics – Exposes inefficiencies such as thermal throttling, idle GPU time, and overprovisioned resources.
Root Cause Analysis – Diagnoses training failures and inference slowdowns faster by pinpointing the most likely infrastructure causes.

All capabilities are accessible via Virtana’s Global View dashboard, which unifies telemetry across hybrid and containerized AI environments—on-premises, cloud, or both.
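
As a rough illustration of two of the capabilities listed above (idle GPU time and straggler detection), the following Python sketch shows how such signals could be derived from raw telemetry. It is not Virtana's implementation and does not use any Virtana API. The `GpuSample` record, the 5% idle threshold, and the 1.5x median tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class GpuSample:
    """One hypothetical telemetry reading for a single GPU."""
    gpu_id: str
    utilization_pct: float  # 0-100, as reported by the device


def idle_fraction(samples, idle_threshold_pct=5.0):
    """Return, per GPU, the fraction of samples below the idle threshold.

    The threshold is an assumed value, not a vendor definition of "idle".
    """
    totals, idle = {}, {}
    for s in samples:
        totals[s.gpu_id] = totals.get(s.gpu_id, 0) + 1
        if s.utilization_pct < idle_threshold_pct:
            idle[s.gpu_id] = idle.get(s.gpu_id, 0) + 1
    return {gpu: idle.get(gpu, 0) / count for gpu, count in totals.items()}


def find_stragglers(step_seconds, tolerance=1.5):
    """Return workers whose step time exceeds tolerance x the median step time."""
    typical = median(step_seconds.values())
    return sorted(w for w, t in step_seconds.items() if t > tolerance * typical)


# Made-up readings for two GPUs and three training workers.
samples = [
    GpuSample("gpu0", 0.0),
    GpuSample("gpu0", 90.0),
    GpuSample("gpu0", 95.0),
    GpuSample("gpu0", 2.0),
    GpuSample("gpu1", 80.0),
    GpuSample("gpu1", 85.0),
]
idle_by_gpu = idle_fraction(samples)
slow_workers = find_stragglers({"node-a": 1.0, "node-b": 1.1, "node-c": 2.4})
```

With these made-up readings, `gpu0` is idle in half of its samples and `node-c` is flagged as a straggler. A production system would work from much denser telemetry and would correlate these signals with storage and network metrics, as the capability list describes.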

Proven Results from Enterprise Deployments

AIFO is already delivering measurable results in production AI environments across multiple industries. Operational outcomes include:

40% reduction in idle GPU time, improving resource utilization and reducing infrastructure costs.
60% faster mean time to resolution (MTTR) for AI-related incidents.
50% decrease in false alerts, reducing operational noise and accelerating response.
15% improvement in power efficiency, supporting sustainability goals.

Available Now, Built for What’s Next

Virtana AI Factory Observability (AIFO) is now generally available as a fully integrated capability within the Virtana Platform. Purpose-built for the demands of modern AI infrastructure, AIFO scales effortlessly from early-stage test environments to enterprise-grade AI factories. This launch, together with Virtana’s recent acquisition of Zenoss, further extends the company’s leadership in delivering the deepest and broadest observability platform across applications, infrastructure, and AI workloads in hybrid and multi-cloud environments.

The Zenoss acquisition also expands the platform’s event intelligence and service-centric observability capabilities, allowing customers to correlate AI model performance with broader application behavior and infrastructure health. Together, these advancements deepen Virtana’s ability to help enterprises manage the full complexity of AI operations in the most demanding environments.

This launch coincides with Virtana’s presence at Dell Technologies World 2025, where the company is showcasing AIFO in booth #262 and offering live demonstrations of its observability capabilities for GPU-intensive environments.

To learn more or request a personalized demo, visit virtana.com.

About Virtana

Virtana is the leader in observability for hybrid infrastructure. The AI-powered Virtana Platform delivers a unified view across applications, services, and underlying infrastructure, correlating user impact, service dependencies, performance bottlenecks, and cost drivers in real time. Trusted by Global 2000 enterprises, Virtana helps IT, operations, and platform teams improve efficiency, reduce risk, and make faster, AI-driven decisions across complex, dynamic environments. Learn more at virtana.com.
