Understanding and configuring Windows Server Failover Cluster Quorum for SQL Server Always On


What is quorum?

SQL Server availability solutions such as Always On Availability Groups (AGs) and Failover Cluster Instances (FCIs) depend on Windows Server Failover Clustering (WSFC) for high availability and failover operations. WSFC failover works based on quorum. In general, a quorum is the minimum number of members that must be present for a group to operate. In a Windows cluster, quorum decides the minimum number of nodes required for the cluster to be up and running.

In a Windows Server cluster, more than half of the voting nodes must be available for the cluster to be up and running. For example, if you have a 4-node cluster, you need (4 / 2) + 1 = 3 nodes for the cluster to work. This requirement is called quorum in WSFC.
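
If you want to see how your cluster currently counts votes, you can use the Failover Cluster PowerShell cmdlets. The sketch below assumes the FailoverClusters module is available and is run on a cluster node; the DynamicWeight column requires Windows Server 2012 R2 or later:

    # Load the Failover Clustering cmdlets
    Import-Module FailoverClusters

    # Show the current quorum type and witness resource (if any)
    Get-ClusterQuorum

    # Show each node's assigned vote (NodeWeight) and current effective vote (DynamicWeight)
    Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight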
Quorum and Roles:

Windows Server is designed so that quorum understands which node is active and which nodes are on standby. Also, at a given point in time, there cannot be more than one active node for a resource group in a Windows Server Failover Cluster. The quorum is aware of the active node, and in case of a failover, it decides the next active node and the standby nodes. In AGs and FCIs, active means the node that owns the Resource Group at a given point in time.

Quorum and Partitioned Clusters:

In a Windows Server Failover Cluster, each node communicates with the other nodes in the cluster through a dedicated network connection. When there is a communication failure between nodes, the nodes become partitioned: each partition thinks the other nodes are down and tries to host the Resource Group (become active) to keep the system up and running. In a WSFC, however, more than one node cannot be active at a given point in time. This conflict is called a split-brain scenario.

Quorum is designed to be aware of this communication, and if there are partitions in the cluster, due to network failure or some other issue, it intervenes to prevent the split-brain scenario. The partition with the quorum majority owns the resource group, the cluster force-stops the cluster service on the nodes of the other partitions, and those nodes are removed from the WSFC.

WSFC is designed so that when communication with the removed nodes is re-established, either manually or automatically, and the nodes can reach the current cluster nodes, they automatically rejoin the cluster and start their cluster service.

How Windows Server Failover Cluster Quorum works:

Let’s assume the cluster is partitioned due to a network failure; each partition of nodes will try to own the resource group. In order to achieve this, each subset of nodes has to prove quorum majority.

For example, if you have a 5-node cluster, after a partition each subset will try to prove quorum majority, and the subset with 3 nodes will own the resources. Here, the cluster quorum is used to avoid the split-brain situation.

This works well when you have an odd number of nodes, but what happens when you have an even number of nodes? For example, if you have a cluster with 4 nodes and a network failure occurs, you can end up with a partitioned cluster of 2 subsets with 2 nodes each. That is a 50/50 split: each subset considers itself a candidate for the majority, both fail to prove it, and the cluster goes down.

How does the cluster manage quorum when you have an even number of nodes? There are 2 options:

Add a cluster witness and increase the vote count by one, or decrease the vote of one node to zero, so that the total number of votes becomes an odd number.
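
The second option can also be applied by hand. As a minimal sketch, assuming a node named "Node4" (a placeholder), the NodeWeight property controls a node's assigned vote:

    # Remove the vote of one node so the total number of votes becomes odd
    (Get-ClusterNode -Name "Node4").NodeWeight = 0

    # Confirm the new vote assignment
    Get-ClusterNode | Format-Table Name, NodeWeight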

Starting from Windows Server 2012, quorum is configured automatically using Dynamic Quorum. However, a cluster witness must still be added to the cluster manually.

Dynamic Quorum:

After the quorum majority is verified, the new majority definition is updated among the remaining cluster nodes. This concept of recalculating the total number of votes after each successive failure is known as Dynamic Quorum. It allows the cluster to lose one node, then another, then another, down to the last standing member, with the majority definition updated dynamically on each node loss.

For example, if you have a 3-node cluster and there is a failure, the subset of 2 nodes survives and the other node is removed from the cluster. The quorum needed for the cluster to stay up is more than half of the votes, but with only 2 nodes left, any further split would be 50/50. The cluster therefore automatically zeroes the vote of one of the nodes, and the other node is assigned as the majority node with a total of 1 vote out of 1. This is called dynamic quorum.
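
You can check whether dynamic quorum is enabled and watch the vote recalculation happen. A short sketch, assuming Windows Server 2012 or later:

    # 1 means dynamic quorum is enabled (the default from Windows Server 2012)
    (Get-Cluster).DynamicQuorum

    # After a node failure, DynamicWeight shows which nodes still hold an effective vote
    Get-ClusterNode | Format-Table Name, State, DynamicWeight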

Dynamic Witness:

Starting from Windows Server 2012 R2, the vote of the cluster witness is calculated dynamically: when you have an odd number of nodes, the witness does not have a vote, and when there is an even number of nodes, the witness gets a vote to make the total number of votes odd. This process of dynamically deciding the witness vote is called Dynamic Witness.

For example, say you have a 5-node cluster with a cluster witness. By default, each node and the witness have 1 vote each. The node vote count of this cluster is 5, already an odd number, so in this case the witness vote is set to 0. Now assume there is a failure and one node is removed from the cluster: the total node vote count becomes 4, so the cluster automatically assigns the witness a vote of 1 to make the total number of votes odd again. This process is called dynamic witness.
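
The current witness vote can be inspected the same way. A small sketch, assuming Windows Server 2012 R2 or later:

    # 1 means the witness currently has a vote, 0 means it does not
    (Get-Cluster).WitnessDynamicWeight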

Quorum Witness Types:

Windows Server Failover Cluster (WSFC) supports the following three types of quorum witnesses:

Disk Witness:

A small clustered storage disk (typically 1 GB) attached to the cluster. The quorum disk must be accessible from all nodes in the cluster and highly available. A disk witness contains a copy of the cluster database.
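
As a sketch, a disk witness can be configured with the quorum cmdlet; the resource name "Cluster Disk 2" below is a placeholder for your own witness disk:

    # Configure a node and disk majority quorum using an available clustered disk
    Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 2"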

File Share Witness:

A shared folder on an external server that is accessible from all the nodes in the cluster, usually a Windows file share or a folder on a Domain Controller. It should be reliable, and the DBA should be aware of any changes to the file share, whether access related or maintenance related. A file share witness maintains the clustering information in a witness.log file but does not store a copy of the cluster database.
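
A file share witness is configured the same way; the UNC path below is a placeholder for your own share:

    # Configure a node and file share majority quorum
    Set-ClusterQuorum -NodeAndFileShareMajority "\\FileServer1\ClusterWitness"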
  
Cloud Witness:

Introduced in Windows Server 2016, a cloud witness is a blob in an Azure Storage account that is accessible to all the nodes in the cluster. A cloud witness maintains the clustering information in a witness.log file but does not store a copy of the cluster database.
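
A cloud witness is configured with the name and access key of the Azure Storage account; both values below are placeholders:

    # Configure a cloud witness (Windows Server 2016 or later)
    Set-ClusterQuorum -CloudWitness -AccountName "mystorageaccount" -AccessKey "<storage-account-access-key>"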

Quorum Models:

Choosing the correct quorum model is a critical decision for your availability solution to work. As we discussed earlier, Windows Server provides multiple combinations of quorum models. Let’s look at what works best for your availability solution:
  • Node Majority
  • Node & Disk Majority
  • Node & File Share Majority
  • Node & Cloud Storage Majority
  • Disk-Only - No Majority
Let’s look at each model in detail.

Node Majority:

Quorum majority is calculated based on the active number of nodes in the cluster. By default, each node has 1 vote, and no witness is configured. This is the recommended quorum mode for Availability Groups (AGs) and Failover Cluster Instances (FCIs) when there is an odd number of nodes.
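
As a sketch, node majority is configured by setting the quorum without any witness:

    # Configure node majority (no witness)
    Set-ClusterQuorum -NodeMajority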

Node & Disk Majority:

Quorum majority is calculated from the number of active nodes in the cluster plus a shared disk cluster resource. Connectivity from any node to the disk resource counts as an affirmative vote for the disk. The disk should be at least 512 MB in size and should be excluded from anti-virus scanning. The disk resource should be able to fail over as a stand-alone resource. This model is recommended for Failover Cluster Instances (FCIs) but not for Availability Groups (AGs).

Node & File Share Majority:

In this model, voting is based on the active number of nodes in the cluster plus a file share resource. By default, each node has a vote and the file share has 1 vote. As a best practice, the file share resource should not be physically located on any node of the cluster, because the loss of that node would then result in the loss of 2 votes. This model is recommended for Availability Groups and Failover Cluster Instances when there is an even number of nodes in the cluster.

Node & Cloud Storage Majority:

In this model, voting is calculated based on the active number of nodes in the cluster plus Azure Blob Storage. By default, each node has 1 vote and the Azure Blob Storage witness has a vote. The cloud witness is available from Windows Server 2016 onwards; previous versions can continue to use the other models. The cloud storage witness acts like a third data centre and provides reliable voting. This model is recommended for Always On Availability Groups and Failover Cluster Instances when there is an even number of nodes.
  
Disk-Only - No Majority:

In this model, no quorum majority is calculated; the active node is determined by the shared disk cluster resource. The nodes in the cluster have no votes; only the single vote of the disk resource is required to be online. All nodes in the cluster must have connectivity to the disk to gain an affirmative vote and come online. This model is not recommended for Always On Availability Groups or Failover Cluster Instances; it exists only for backward compatibility.
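
If you ever need it for a legacy system, a sketch of the disk-only configuration (the disk resource name is a placeholder):

    # Configure the legacy disk-only quorum model
    Set-ClusterQuorum -DiskOnly "Cluster Disk 2"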

Example Scenarios:

Scenario 1: Two nodes without witness:

The cluster dynamically decides the quorum: the vote of one of the two nodes is zeroed, and the other node is assigned a total of 1 vote out of 1, so it has quorum majority and owns the resources.

If the node that currently holds the vote fails unexpectedly, the cluster goes down; it is a single point of failure. If it is a graceful shutdown, however, the vote is shifted to the other node and the cluster stays up. There is therefore a fifty percent chance the cluster survives one failure. This is the reason we need a cluster witness.

Scenario 2: Two nodes with witness:

In this case, the quorum has a total of 3 votes. If one node or the witness goes down, you still have a node and the witness, or 2 nodes, with quorum majority, and the cluster stays up and running. This cluster will survive a maximum of one failure.

Scenario 3: Three nodes without witness:

The total vote count is 3. If any one of the nodes fails, the vote becomes 2 out of 3 and the cluster survives; at this point the dynamic quorum is updated and the cluster behaves like scenario 1. This cluster can survive one failure, with a fifty percent chance of surviving the next failure.

Scenario 4: Three nodes with witness:

In this case, the witness does not have a vote (dynamic witness), so the total number of votes is the odd number 3. If any one of the nodes fails, the cluster becomes scenario 2. This cluster can survive 2 subsequent node failures.

Scenario 5: Four nodes without witness:

In this scenario, the cluster automatically zeroes one node’s vote to make the quorum majority an odd number. If a node fails, the cluster becomes scenario 3. This cluster can survive 2 subsequent node failures, with a fifty percent chance of surviving the next one.

Scenario 6: Four nodes with witness:

All four nodes and the witness have a vote, making the quorum majority an odd number. In case of a failure, the cluster becomes scenario 4. This cluster will survive 3 node failures.

Configuring the failover cluster quorum:

There are two ways to configure the cluster quorum: Failover Cluster Manager and the Failover Cluster Windows PowerShell cmdlets. In this article, we are going to configure node and file share majority for a cluster. The Failover Cluster Manager wizard is self-explanatory, and an understanding of the quorum concepts will help you configure the other quorum models, as the process is very similar.


Go to Failover Cluster Manager and connect to the cluster. Right-click on the cluster name, go to More Actions, and choose Configure Cluster Quorum Settings.

Failover Cluster Manager - Quorum Configuration
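
If you prefer the PowerShell route mentioned above, the same node and file share majority configuration can be applied and verified without the wizard; the share path below is a placeholder:

    # Script the same change instead of using the wizard
    Set-ClusterQuorum -NodeAndFileShareMajority "\\FileServer1\ClusterWitness"

    # Verify the resulting quorum configuration
    Get-ClusterQuorum | Format-List *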

How to configure Read-Only routing on SQL Server 2016 Always On Availability Group (AG)?

The great advantage of Always On Availability Groups over other availability solutions is their ability to scale out read operations (SELECT queries). Read-only routing is a feature of Always On Availability Groups that redirects connection requests from applications to a readable secondary. In a SQL Server 2016 Always On Availability Group, you can configure up to 8 readable secondary replicas.

The connections are redirected based on routing rules. Always On Availability Groups provide the following options to define the rules:
  • Read-Only Routing URL
  • Read-Only Routing List
Before defining the routing rules, we must understand the following conditions: