What is a fail-over node

A computing node or host that is idle and only used when the primary node fails; part of a fail-over cluster.  High-availability clusters (also known as HA clusters, fail-over clusters or Metroclusters Active/Active) are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well. HA clusters are often used for critical databases, file sharing on a network, business applications, and customer services such as electronic commerce websites.

 

Types of failover nodes:

  • Graceful failover is the proactive ability to remove a data service node from the cluster in an orderly and controlled fashion. It is an online operation with zero downtime that is achieved by promoting replica virtual buckets on the remaining cluster nodes to active and the active virtual buckets on the affected node to dead. This type of failover is primarily used for planned maintenance of the cluster.
  • Hard failover is the ability to drop a node quickly from the cluster when it has become unavailable or unstable. Dropping a node is achieved by promoting replica virtual buckets on the remaining cluster nodes to active. Hard failover is primarily used when there is an unplanned outage of a node.
  • Automatic failover is the built-in ability to have the Cluster Manager detect and determine when a node is unavailable and then initiate a hard failover.

 

Related questions