Always On Availability Groups: Clusters and Quorum Configuration
Before we set up Always On availability groups, we need our Windows servers clustered and configured according to the design of the cluster and the number of nodes.
When a cluster is created you must assign it a quorum configuration mode. The modes are defined below; the recommended mode depends on factors such as the number of nodes, geographical location and whether shared storage is available:
| Cluster configuration | Recommended quorum mode |
| --- | --- |
| Odd number of nodes | Node Majority |
| Even number of nodes (but not a multi-site cluster) | Node and Disk Majority |
| Even number of nodes, multi-site cluster | Node and File Share Majority |
| Even number of nodes, no shared storage | Node and File Share Majority |
Our most basic setup for this will be two nodes, with one node in each site. Node and File Share Majority is therefore the recommended quorum mode.
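As a rough sketch of how this might be configured with the FailoverClusters PowerShell module (the witness share path below is a placeholder, not part of our design):

```powershell
# Requires the Failover Clustering feature / FailoverClusters module
Import-Module FailoverClusters

# Show the current quorum configuration
Get-ClusterQuorum

# Switch to Node and File Share Majority using a witness share
# ("\\FILESERVER\ClusterWitness" is a placeholder path)
Set-ClusterQuorum -NodeAndFileShareMajority "\\FILESERVER\ClusterWitness"
```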
The quorum requires a majority of votes for the cluster to stay alive. In this model, if we lose two of the three votes provided by Node 1, Node 2 and the file share, we lose quorum and the cluster becomes unavailable on both nodes. When quorum is lost the databases become unavailable and the cluster service stops. The servers themselves can still be running, and as a last resort the cluster can be forced up with one available node if required, but care must be taken when doing this.
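For that last-resort case, a forced start on a single surviving node could look roughly like the sketch below (the node name is a placeholder; treat this as an emergency measure only):

```powershell
# Force the cluster service to start on one surviving node without quorum.
# Last resort only: the cluster runs in a forced state until quorum is restored.
Start-ClusterNode -Name "NODE1" -FixQuorum

# Equivalent from an elevated command prompt:
# net start clussvc /forcequorum
```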
The table below shows the effect of losing Node 1, Node 2 and the file share in various combinations, assuming Node 1 hosts the primary availability group and Node 2 hosts the standby availability group. Automatic failover is off and asynchronous replication is being used, which means possible data loss when failing over a database.

| Node 1 | Node 2 | File share | Availability group | Cluster | Primary database (Node 1) | Standby database (Node 2) | Action |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Up | Down | Up | Up | Up | Synchronised | Recovery Pending | Bring back Node 2 |
| Up | Down | Down | Down | Down | Recovery Pending | Recovery Pending | |
| Down | Down | Up | Down | Down | Recovery Pending | Recovery Pending | |
| Down | Down | Down | Down | Down | Recovery Pending | Recovery Pending | |
| Up | Up | Down | Up | Up | Synchronised | Synchronizing | 1. Bring back the file share. 2. If 1 is not possible, remove the file share from the configuration to provide extra resilience to the database (can allow an extra node to fail). |
| Up | Up | Up | Up | Up | Synchronised | Synchronizing | |
| Down | Up | Down | Down | Down | Recovery Pending | Recovery Pending | |
| Down | Up | Up | Down | Up | Recovery Pending | Not Synchronized / Recovery Pending | |
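A quick way to see which of the table's scenarios you are in is to query the node states and the availability group's cluster role; a minimal sketch, assuming the availability group role is named "AG1" (a placeholder):

```powershell
# Which nodes the cluster currently considers up
Get-ClusterNode | Format-Table Name, State

# State and owner of the availability group's cluster role and its resources
Get-ClusterGroup -Name "AG1" | Format-Table Name, OwnerNode, State
Get-ClusterGroup -Name "AG1" | Get-ClusterResource | Format-Table Name, ResourceType, State
```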
Dynamic Weighting
Before Windows Server 2012, dynamic weighting did not exist: if you lost the majority of your nodes (e.g. 3 nodes in a 5-node cluster) the cluster would shut down. A minimum of two votes (2 of 3) is still required in a two-node plus file share witness cluster, as a majority must be formed. Windows Server 2012 introduced dynamic weighting, which means that in a cluster of 4 nodes plus a file share witness, if we lose a node or the witness the total number of votes is dynamically reduced. For example, if you lose 1 node and the file share witness, the total number of votes becomes 3. If another node is then lost we would normally lose quorum (2 out of 5 votes is not a majority), but because dynamic weighting reduced the total to 3 before that node was lost, the cluster can still run with a 2 out of 3 majority.
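A minimal sketch for observing this behaviour (DynamicQuorum and NodeWeight are standard cluster properties; the DynamicWeight column is exposed from Windows Server 2012 R2 onwards):

```powershell
# Dynamic quorum is enabled by default on Windows Server 2012+ (1 = enabled)
(Get-Cluster).DynamicQuorum

# Compare each node's configured vote (NodeWeight) with the vote
# currently assigned by dynamic weighting (DynamicWeight)
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight
```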
We can also run a 2-node cluster without a file share witness and assign the weighted vote to the primary server. This allows a clean shutdown of the standby or the primary server while keeping the cluster running on the other node. The disadvantage is that losing the primary through a network disconnect loses quorum on the cluster, meaning the cluster shuts down and needs user intervention to force start, which is not a clean method. This also rules out automatic failover, and if the service is running from the standby and the standby is not set as the weighted cluster node, losing the primary node will bring down the service running on the standby. The same applies if running 2 availability groups with a primary on each node.
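If this model is chosen, one way to pin the configured vote to the primary is to remove the standby's vote; a sketch with placeholder node names:

```powershell
# Remove the standby node's vote and keep the primary's vote
# (node names are placeholders for this example)
(Get-ClusterNode -Name "STANDBY-NODE").NodeWeight = 0
(Get-ClusterNode -Name "PRIMARY-NODE").NodeWeight = 1

# Confirm the vote assignment
Get-ClusterNode | Format-Table Name, NodeWeight, DynamicWeight
```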
The node that gives up its vote first under dynamic weighting (for example in a 50/50 split) can be chosen in PowerShell using the cluster's LowerQuorumPriorityNodeId property, i.e. (Get-Cluster).LowerQuorumPriorityNodeId.
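For example (node name is a placeholder), the tie-breaker can be pointed at the standby/DR node so the primary side keeps its vote in a 50/50 split:

```powershell
# Show which node currently gives up its vote first in a tie
(Get-Cluster).LowerQuorumPriorityNodeId

# Point the tie-breaker at the standby/DR node (placeholder name)
(Get-Cluster).LowerQuorumPriorityNodeId = (Get-ClusterNode -Name "DR-NODE").Id
```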
Can I have multiple clusters per node or multiple availability groups per cluster?
You can only set up a node to be part of a single Windows failover cluster.
You can set up multiple availability groups per cluster, so you can have one primary running on one node and a different primary running on the other node, replicated back the other way. The caveat is that if network connectivity is lost between the nodes (network partitioning), the node with the lowest ID will be shut down. The recommended setup is therefore a primary node and a standby DR node, without attempting to run availability groups as primary across both nodes.
For basic availability groups, multiple AGs can be set up, but with a limit of one database per AG. Think of this as a Standard Edition replacement for database mirroring.
Please see the following article for an explanation of the partitioning behaviour:
https://blogs.technet.microsoft.com/askcore/2011/08/08/partitioned-cluster-networks/
Recommended Adjustments to Quorum Voting
Taken from here:
https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/wsfc-quorum-modes-and-voting-configuration-sql-server?view=sql-server-2017#RecommendedAdjustmentstoQuorumVoting
When enabling or disabling a given WSFC node’s vote, follow these guidelines:
- No vote by default. Assume that each node should not vote without explicit justification.
- Include all primary replicas. Each WSFC node that hosts an availability group primary replica or is the preferred owner of an FCI should have a vote.
- Include possible automatic failover owners. Each node that could host a primary replica, as the result of an automatic availability group failover or FCI failover, should have a vote. If there is only one availability group in the WSFC cluster and availability replicas are hosted only by standalone instances, this rule includes only the secondary replica that is the automatic failover target.
- Exclude secondary site nodes. In general, do not give votes to WSFC nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.
- Odd number of votes. If necessary, add a witness file share, a witness node, or a witness disk to the cluster and adjust the quorum mode to prevent possible ties in the quorum vote.
- Re-assess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.
For example, in a two-site setup such as ours, this typically means (see the sketch after this list):
- 1 vote to each node in the primary data center
- 0 votes to each node in the disaster recovery data center
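A hedged sketch of applying that vote assignment with the FailoverClusters module; all node names are placeholders for a two-site layout:

```powershell
# 1 vote for each node in the primary data center (placeholder names)
(Get-ClusterNode -Name "PRIMARY-NODE1").NodeWeight = 1
(Get-ClusterNode -Name "PRIMARY-NODE2").NodeWeight = 1

# 0 votes for each node in the disaster recovery data center
(Get-ClusterNode -Name "DR-NODE1").NodeWeight = 0

# Review the resulting voting configuration
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight
```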