What happens if a node goes down in a load-balanced Galera cluster?

cymito

In a 3-node Galera cluster with a load balancer in front that sends writes to a single node and keeps the remaining nodes as read-only backups, what happens if the main node goes down?

Marco_Weiss

In a 3-node Galera Cluster behind a load balancer that designates one writer node and treats the other two as read-only backups, what happens when the writer goes down depends on two things: whether the surviving nodes keep quorum, and how the load balancer is configured to fail over.

Normal Operation (All Nodes Up):
- The load balancer directs all write requests to the designated writer node.
- Read requests can be load-balanced among all three nodes, but the load balancer might prefer the two non-writer nodes so that reads do not compete with client writes on the writer node.
Writer Node Failure:
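- If the designated writer node goes down, the remaining two nodes still hold a majority (2 of 3), so they keep quorum and continue to operate as the cluster's Primary Component.
- Galera itself does not elect a new writer. It is a multi-primary system, so every synced node in the Primary Component can accept writes. The "single writer" role exists only in the load balancer, which must detect the failure through its health check and start routing writes to one of the surviving nodes.
- The cluster continues to accept writes and reads, though writes can be unavailable for a short period while the failure is detected, both by Galera (a few seconds by default, governed by settings such as evs.suspect_timeout) and by the load balancer's health check.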
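
The failover logic above can be sketched in a few lines. This is a minimal model, not how any particular load balancer implements it: the node names, the priority list, and the function names are hypothetical, and equal node weights are assumed for the quorum check. The three status variables are the ones Galera exposes through SHOW STATUS; here they are passed in as plain dictionaries rather than queried from a live server.

```python
from typing import Dict, List, Optional

def has_quorum(reachable: int, previous_size: int) -> bool:
    """Strict majority of the previous membership (equal weights assumed)."""
    return 2 * reachable > previous_size

def is_writable(status: Dict[str, str]) -> bool:
    """A node should only take writes if it is synced and in the Primary Component."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_ready") == "ON"
        and status.get("wsrep_local_state_comment") == "Synced"
    )

def pick_writer(priority: List[str],
                statuses: Dict[str, Optional[Dict[str, str]]]) -> Optional[str]:
    """Return the first node in priority order that passes the health check.

    A status of None models a node that did not answer the health check.
    """
    for node in priority:
        status = statuses.get(node)
        if status is not None and is_writable(status):
            return node
    return None

# Hypothetical snapshot: node1 (the old writer) is down, the others are healthy.
SYNCED = {
    "wsrep_cluster_status": "Primary",
    "wsrep_ready": "ON",
    "wsrep_local_state_comment": "Synced",
}
statuses = {"node1": None, "node2": dict(SYNCED), "node3": dict(SYNCED)}

print(has_quorum(reachable=2, previous_size=3))            # True
print(pick_writer(["node1", "node2", "node3"], statuses))  # node2
```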
Writer Node Recovery:
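- When the original writer node recovers and rejoins the cluster, it catches up through a state transfer from a donor node (not necessarily the current writer): an incremental state transfer (IST) if the write-sets it missed are still in the donor's write-set cache (gcache), otherwise a full state snapshot transfer (SST).
- Once synchronization is complete, it is a full cluster member again. Whether it stays read-only or takes the writer role back is decided by the load balancer configuration, not by Galera.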
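
As a rough illustration of how a rejoining node ends up with an incremental or a full transfer, the decision can be modeled as below. This is a deliberately simplified sketch: the function and parameter names are hypothetical, and the real donor-side logic has more cases. It assumes the joiner's saved state is a (cluster UUID, seqno) pair, with seqno -1 meaning the position is unknown, for example after an unclean shutdown.

```python
def choose_state_transfer(joiner_uuid: str,
                          joiner_seqno: int,
                          cluster_uuid: str,
                          cluster_seqno: int,
                          gcache_first_seqno: int) -> str:
    """Return "none", "IST", or "SST" for a rejoining node (simplified model)."""
    # A different cluster history or an unknown position forces a full copy.
    if joiner_uuid != cluster_uuid or joiner_seqno < 0:
        return "SST"
    # Nothing was missed while the node was away.
    if joiner_seqno == cluster_seqno:
        return "none"
    # IST needs every missed write-set, starting at joiner_seqno + 1,
    # to still be present in the donor's gcache.
    if joiner_seqno + 1 >= gcache_first_seqno:
        return "IST"
    return "SST"

# Hypothetical values: the node missed write-sets 1001..1500,
# and the donor's gcache still holds everything from 900 onward.
print(choose_state_transfer("abc", 1000, "abc", 1500, 900))  # IST
```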
Reader Node Failure:
- If one of the read-only nodes goes down, the writer and the other reader still form a majority, so the cluster keeps quorum and continues to serve reads and writes.
- The load balancer can simply redirect read requests to the other available nodes.
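
Redirecting reads around a failed node amounts to a round-robin that skips unhealthy members. The sketch below shows the idea only; the ReadPool class and the node names are hypothetical, and a real load balancer would get the healthy set from its own health checks.

```python
from typing import List, Optional, Set

class ReadPool:
    """Round-robin over a fixed node list, skipping nodes that are not healthy."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self._next = 0  # index where the next search starts

    def pick(self, healthy: Set[str]) -> Optional[str]:
        """Return the next healthy node, or None if no node is healthy."""
        for offset in range(len(self.nodes)):
            idx = (self._next + offset) % len(self.nodes)
            node = self.nodes[idx]
            if node in healthy:
                self._next = (idx + 1) % len(self.nodes)
                return node
        return None

pool = ReadPool(["node1", "node2", "node3"])
# node2 has failed, so reads alternate between the two remaining nodes.
healthy = {"node1", "node3"}
print([pool.pick(healthy) for _ in range(4)])
# ['node1', 'node3', 'node1', 'node3']
```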

Galera Cluster handles membership changes and quorum automatically, but moving the writer role is the load balancer's job, so the exact behavior and timing of a failover depend on both: network latency, Galera's failure-detection timeouts and quorum settings, and the load balancer's health-check interval and failover rules. Test failover scenarios thoroughly in a controlled environment to confirm that your setup behaves as expected and meets your high availability requirements.