Fail-over Node

What is a fail-over node?

A fail-over node is a computing node or host that sits idle and is used only when the primary node fails; it is part of a fail-over cluster.

High-availability clusters (also known as HA clusters, fail-over clusters, or active/active metroclusters) are groups of computers that support server applications that can be used reliably with minimal downtime. They use high-availability software to harness redundant computers in groups, or clusters, that continue to provide service when system components fail.

Without clustering, if a server running a particular application crashes, the application is unavailable until the crashed server is fixed. HA clustering remedies this by detecting hardware or software faults and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, the clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may need to be configured, and some supporting applications may need to be running as well.

HA clusters are often used for critical databases, network file sharing, business applications, and customer services such as e-commerce websites.
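The failover sequence described above (detect a fault, prepare a standby node, then start the application on it) can be sketched as a small Python simulation. This is a minimal illustration under stated assumptions, not how any particular clustering product works: faults are detected by missed heartbeats, the 10-second timeout is an arbitrary example value, and `Node`, `is_failed`, `prepare_node`, and `failover` are hypothetical names. The preparation steps are only recorded, not executed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster member; last_heartbeat is the time (seconds) of its last heartbeat."""
    name: str
    last_heartbeat: float
    running_app: bool = False
    log: list[str] = field(default_factory=list)


def is_failed(node: Node, now: float, timeout: float) -> bool:
    # Assumption: a node counts as failed once no heartbeat arrived within `timeout` seconds.
    return now - node.last_heartbeat > timeout


def prepare_node(node: Node) -> None:
    # Illustrative preparation steps taken from the examples in the text;
    # in this sketch they are only logged, not performed.
    for step in ("import and mount file systems",
                 "configure network hardware",
                 "start supporting applications"):
        node.log.append(step)


def failover(primary: Node, standby: Node, now: float, timeout: float = 10.0) -> Node:
    """Return the node that should be running the application at time `now`."""
    if not is_failed(primary, now, timeout):
        return primary  # primary is healthy, so the standby stays idle
    if is_failed(standby, now, timeout):
        raise RuntimeError("no healthy standby node available")
    primary.running_app = False
    prepare_node(standby)       # prepare the node before starting the app
    standby.running_app = True  # no administrator involved: the switch is automatic
    standby.log.append("start application")
    return standby


if __name__ == "__main__":
    primary = Node("node-a", last_heartbeat=100.0, running_app=True)
    standby = Node("node-b", last_heartbeat=114.0)
    print(failover(primary, standby, now=103.0).name)  # node-a
    print(failover(primary, standby, now=115.0).name)  # node-b
```

The standby is checked for liveness before it is promoted, since failing over to a node that is itself silent would not restore service.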

Types of failover:

Graceful failover is the proactive ability to remove a data service node from the cluster in an orderly and controlled fashion. It is an online operation with zero downtime, achieved by promoting the replica virtual buckets on the remaining cluster nodes to active and demoting the active virtual buckets on the affected node to dead. This type of failover is primarily used for planned maintenance of the cluster.
Hard failover is the ability to drop a node quickly from the cluster when it has become unavailable or unstable. Dropping a node is achieved by promoting replica virtual buckets on the remaining cluster nodes to active. Hard failover is primarily used when there is an unplanned outage of a node.
Automatic failover is the built-in ability of the Cluster Manager to detect when a node is unavailable and then initiate a hard failover.
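The three failover types differ mainly in what happens to the affected node's virtual buckets. The Python sketch below is a toy, in-memory model of that description, not the API of any real cluster manager: each (vBucket id, node) pair carries a state of "active", "replica", or "dead", and the function names, the heartbeat map, and the timeout parameter are hypothetical. Data movement, replication lag, and re-creating lost replicas are not modeled.

```python
# (vBucket id, node name) -> "active" | "replica" | "dead"
States = dict[tuple[int, str], str]


def _promote_replicas(states: States, node: str) -> list[int]:
    """Promote, on a remaining node, a replica of every vBucket that is active on `node`."""
    to_promote: dict[int, str] = {}
    for (vb, n), state in states.items():
        if n == node and state == "active":
            replica = next((m for (v, m), s in states.items()
                            if v == vb and m != node and s == "replica"), None)
            if replica is None:
                raise RuntimeError(f"vBucket {vb} has no replica on a remaining node")
            to_promote[vb] = replica
    # Apply changes only after every vBucket was checked, so a failure leaves states unchanged.
    for vb, replica in to_promote.items():
        states[(vb, replica)] = "active"
    return list(to_promote)


def graceful_failover(states: States, node: str) -> list[int]:
    """Planned removal: promote replicas elsewhere, then mark the node's active vBuckets dead."""
    promoted = _promote_replicas(states, node)
    for vb in promoted:
        states[(vb, node)] = "dead"
    return promoted


def hard_failover(states: States, node: str) -> list[int]:
    """Unplanned outage: promote replicas elsewhere and drop the unreachable node from the map."""
    promoted = _promote_replicas(states, node)
    for key in [k for k in states if k[1] == node]:
        del states[key]
    return promoted


def auto_failover(states: States, last_heartbeat: dict[str, float],
                  now: float, timeout: float) -> list[str]:
    """Hypothetical manager check: hard-fail-over any node silent for longer than `timeout`."""
    failed = [n for n, t in last_heartbeat.items() if now - t > timeout]
    for node in failed:
        hard_failover(states, node)
    return failed


if __name__ == "__main__":
    states = {(0, "a"): "active", (0, "b"): "replica",
              (1, "b"): "active", (1, "a"): "replica"}
    print(graceful_failover(states, "a"))      # [0]
    print(states[(0, "b")], states[(0, "a")])  # active dead
```

Following the text, graceful failover here only changes the node's active vBuckets to dead, while hard failover removes every entry for the node because it can no longer be coordinated with. Automatic failover is modeled simply as a timeout check that triggers a hard failover.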

