Combining AlwaysOn AG with Failover Cluster Instances

In this article we'll look at how AlwaysOn availability groups combine with Failover Cluster Instances. We'll look individually at Failover Cluster Instance composition, Windows Server Failover Clusters and AlwaysOn availability groups, what they are and how they interact. For the purposes of this article the following are used;
FCI     A failover cluster instance of SQL Server
AO     Always On availability groups, the new SQL Server 2012 feature
WSFC     Windows Server Failover Cluster, a feature of the Windows 2008 operating system
NIC     Network interface card
SAN     Storage area network
LUN     Logical unit number

 For many years now Windows Server Failover Clusters and Failover Cluster Instances have been a popular configuration for highly available SQL Servers. Later, with SQL Server 2005 came database mirroring. Although mirroring was limited, it offered an extra level of redundancy. Now with SQL Server 2012, we take a huge leap forward in highly available SQL Servers.

AlwaysOn is the new high availability feature in SQL Server 2012 and provides the ability to create multiple copies of a highly available database. In order for AlwaysOn to provide this level of availability it utilises the functionality within Windows Server Failover Clusters. It doesn't by any means replace Failover Cluster Instances of SQL Server, rather, it shares the use of the core technology presented by the Windows operating system. Just as you could with database mirroring, you have the option to combine an AlwaysOn availability group with a SQL Server Failover Cluster Instance, but it adds a great deal of complexity due to the strict AO implementation rules and the sharing of the Windows operating system failover cluster feature.

In this article we're going to look at 2 different scenarios, the first provides a basic AlwaysOn configuration, the second will show how an FCI can combine with AlwaysOn groups to provide an extremely resilient high availability option for your SQL Server system.

Before we start, and to help us on our way, let's look briefly and individually at each of these technologies to understand a little more about them.
What is a Windows Server Failover Cluster?

Windows Server Failover Clusters are a feature of the Windows server 2008 operating system. They provide the ability to combine multiple computer nodes (physical and\or virtual) to service a set of applications for high availability, in this case a SQL Server instance. The application is made highly available by presenting to clients a virtual access point comprising a unique IP address and a unique computer name or "virtual network name". These become resources in an application group and are passed between participating nodes like tokens.

The computer nodes use shared storage, which is presented to each node, usually from a SAN. The disks are added to clustered application\group and will failover between the nodes when the group is moved (either manually or through automatic failover). Throughout normal operation, only one node may obtain access to the storage disks at any one time (the node that owns the particular resources) to avoid volume corruption, this is also true for all of the remaining resources in the cluster group too. Disk access is controlled by the cluster service once the disks have been added as cluster resources.

The WSFC requires some form of mediation to control cluster resource ownership, and uses Quorum to maintain cluster stability. In Windows Server 2008 WSFC, this Quorum takes the form of a node voting system with the majority required to maintain Quorum. You may also use additional quorum resources in the form of a disk for localised clusters or a remote file share for multi site clusters.

A critical hardware failure of the active computer node would result in the loss of group service and would automatically start the group on an available partner node. At a high level, the client access point details are transferred along with any disk and service resources, etc to a failover partner node. A failover of the clustered instance causes disconnection of client connections, these may then reconnect once the service is available on a partner node.

Common failures usually are;

    Public NIC or network failure
    Power supply failure
    Motherboard failure
    CPU failure

What is a Failover Cluster Instance?

A Failover Cluster Instance is an instance of SQL Server, default or named, that has been installed onto a WSFC as a clustered application. The clustered application typically has the following resources as a minimum;

    IP address
    Network name
    Shared disk(s)
    SQL Server service
    SQL Server agent service

A clustered instance of SQL server will utilise any shared storage that has been presented to the Windows Server Failover Cluster nodes, usually this storage will take the form of LUNs presented from a SAN. A Failover Cluster Instance of SQL Server is deployed by launching the "New SQL Server Failover Cluster Installation" wizard on the first computer node that will participate in the FCI. Once this has been performed you would then launch the "Add Node to a SQL Server Failover Cluster" wizard on any computer node in the WSFC that you wish to participate in the new SQL Server FCI. For Standard edition of SQL Server you are limited to 2 nodes, Enterprise and upwards support the OS maximum (8 nodes in Windows 2003 and 16 nodes in Windows 2008).

Note: Standard edition limits the FCI to 2 nodes but this does not dictate how many nodes have membership of the Windows cluster, it is merely at the installer level.
What is an AlwaysOn Availability Group?

An AlwaysOn Availability Group, is a group of one or more databases configured on a primary replica (or SQL Server instance) for high availability. This will typically include one or more further replicas which will service a copy of the highly available database(s). Partner databases may be either readable or standby and may also use either asynchronous or synchronous commit modes.

In AlwaysOn a replica refers to an instance of SQL Server that is participating in the AlwaysOn availability group.

AlwaysOn relies on the WSFC core functionality to achieve the high availability that AO offers, but does not require any of the following shared resources associated with an FCI.

    Shared disks
    Shared IP address
    Shared network name
    Shared SQL Server and SQL Server agent resources

There is one exception to this rule, when creating an AlwaysOn group listener this will create a set of resources shared by the AO group replicas, but note that this application\cluster group has no relation to a Failover Cluster Instance.

If you are using a listener for your AO group you are able to provide a central client access point for entry into the availability group, this provides a great deal of resilience when used in your application connection strings.

AlwaysOn is a feature configured at the individual service level by using SQL Server Configuration Manager. A computer node MUST first be a member of a valid WSFC before you may enable the AlwaysOn service feature. Availability replicas MUST be situated on separate physical nodes. Given this, in an example, INST1 must reside on Node1 and INST2 must reside on Node2. You cannot install INST1 and INST2 on to Node1 and configure an AO group.

For a basic AlwaysOn configuration, each node has local disk storage and non clustered instances of SQL Server installed. Ultimately, AO removes the storage single point of failure that is common place with shared disks. The nodes are completely stand alone apart from their membership of a valid Windows Server Failover Cluster.

Now that we have a little background we may start to look at how these technologies interact.
How Do Failover Cluster Instances Integrate With AlwaysOn Groups?

We first need to understand how this affects the WSFCs.

All participating nodes in an AlwaysOn group need to be joined to the same WSFC. Yes, that means that the servers must be joined to the same Windows domain to participate in the same WSFC. WSFCs, Failover Cluster Instances and AlwaysOn groups do not span Windows domains.



In a basic example of an AlwaysOn configuration we have the following node and IP address configurations:

Computer Nodes

AONode1.GM.co.uk                      192.168.1.41

AONode2.GM.co.uk                      192.168.1.42



WSFC Joined

SQLCluster01.GM.co.uk                192.168.1.45


Non Clustered SQL Server Instances

AOINST1 on AONode1                     
AOINST2 on AONode2

 AlwaysOn Group

SQLAOGrp1


AlwaysOn Listener

SQLAOListener.GM.co.uk              192.168.1.46



How does this look?

The following diagram shows a graphical representation of how your system would look

For this scenario, each cluster node listed above has the following local drive storage and a single non clustered instance of SQL Server installed.

The instances installed are;

    AONode1\AOINST1
    AONode2\AOINST2

The locally attached disks in use on each node are;

    C: = 78GB
    D: = 300GB
    E: = 300GB
    F: = 300GB

The key here is that both nodes have separate storage resources to each other but are both joined to the Windows cluster shown below;

SQLCluster01.GM.co.uk

The AlwaysOn group has a listener using the virtual network name and IP detailed above. This would represent a standard AlwaysOn configuration and does not use a SQL Server FCI or shared storage resources.

Remember, the goal in AlwaysOn is to remove the storage single point of failure by utilising separate storage resources between participating computer nodes.

Now let's take our second scenario to include a corporate FCI into our AlwaysOn configuration.

The architect team have directed that the primary SQL server instance must be hardware protected on the primary site and there should be provision for read only replicas for reporting and maintenance. The following configuration incorporates an FCI to provide hardware protection on the primary site. These are the node and IP address configurations in use;



Computer Nodes

AONode1.GM.co.uk                                  192.168.1.41

AONode2.GM.co.uk                                  192.168.1.42

AONode3.GM.co.uk                                  192.168.1.43

AONode4.GM.co.uk                                  192.168.1.44



WSFC Joined

SQLCluster01.GM.co.uk                           192.168.1.50



Cluster SQL Server Instance

SQL-CL-INST\SiteLive                                         192.168.1.51



Non Clustered SQL Server Instances

    AONode1\DRRepl
    AONode2\CLMaint



AlwaysOn Group

    SQLAOGrp1



AlwaysOn Listener

    SQLAOListener.GM.co.uk                    192.168.1.46



How does this look?

The diagram below shows a graphical representation of the AlwaysOn availability incorporating a SQL Server Failover Cluster Instance.



Each server has the following locally attached drive storage.

    C: = 78GB
    D: = 300GB
    E: = 300GB
    F: = 300GB


In addition to the above locally attached disks, AONode3 and SAONode4 have the following shared LUNs attached from a SAN.

    G: = 500GB
    H: = 250GB
    L: = 300GB



We have our WSFC titled "SQLCluster01.GM.co.uk", this is the cluster entry point for all 4 nodes and each server is joined to this cluster.

A Failover Cluster Instance of SQL Server has been installed first to AONode3 and then AONode4 has been added as a failover partner node, the instance is titled

"SQL-CL-INST\SiteLive" as detailed above. The FCI is using the following shared drives

    G:
    H:
    L:



2 non clustered instances of SQL Server have been deployed as follows:

    AONode1 has an instance named "DRRepl"
    AONode2 has an instance named "CLMaint"



We have created an availability group titled SQLAOGrp1 on the clustered SQL Server instance called SQL-CL-INST\SiteLive and we include the following instances as replicas

    AONode1\DRRepl
    AONode2\CLMaint



This completes our AlwaysOn availability group incorporating FCI's.

Note here that AONode3 and AONode4 are part of the same Windows Server Failover Cluster as AONode1 and AONode2, only AONode3 and AONode4 are set up as failover partners for the clustered instance SQL-CL-INST\SiteLive (AONode1 and AONode2 have not had the SQL Server "Add node" wizard executed upon them).

As you can see, incorporating an FCI into an availability group requires an increased number of cluster nodes to ensure that the availability replicas meet the standard requirement whereby, each replica (SQL Server instance) must reside on a separate physical node. Understanding the different technologies allows you to pull them together to create one incredibly resilient SQL Server system.

Creating yourself a virtual test system will give you a better grasp on failover clusters and availability groups and as always have fun in your work. Post back if you're still unsure and I'll help all I can.

Comments

Popular posts from this blog

SQL SERVER – Event ID 107- Report Server Windows Service (MSSQLSERVER) cannot connect to the report server database.

SQL Server Builds Information

Using DBCA silent install and disabling automatic memory management