Clustered Database

What is a clustered database?

In Clustered Database, A cluster is a collection of commodity components to provide scalability and availability at a low cost. With this in mind, it is possible to create a database cluster for high-end enterprise applications by storing and processing information on commodity nodes. The architecture for a clustered database is distinguished by how data responsibilities are shared among compute nodes.
In a shared-nothing architecture, the partitioning is such that each compute nodeowns a subset of the data. That is, a node will operate exclusively on a particular subset. Scalability of course depends on wise partitioning. The physical partitioning may be that a compute node places its data portion on its local disk, and that all nodes are connected via a high-speed network (such as Myrinet or InfiniBand). Such a scheme—which resembles a storage area network—reduces fault tolerance as a node crash makes a portion of the database inaccessible. Therefore, the physical partitioning may actually share the disks so that one node can “take over” in the event that another node crashes.
This architecture is by far the most popular; IBM DB2, Microsoft SQL Server,MySQL Cluster, and Bizgres MPP (based on PostgreSQL) all follow this scheme. The supposed downside of this setup is the difficulty in installation and maintenance.
In a shared-everything (or shared-disk) architecture, any compute node can operate on any portion of the database. The physical layout usually involves network-attached storage, in which all nodes communicate with a separate, shared disk (often a RAID) via a high-speed interconnect (traditionally Fibre Channel). Again the logical partitioning is key for scalability. The obvious downside to a shared-everything architecture is that there may be enormous contention when numerous nodes attempt to communicate with the disk simultaneously.
As such, the only major database vendor that offers a shared-everything architecture is Oracle, whose RAC (Real Application Cluster) product builds a shared cache on each of the computing nodes through “Cache Fusion.” This technology uses the high-speed network to maintain cache coherency.
Both architectures may require a three different interconnects: Ethernet for management, InfiniBand or Myrinet for internode communication, and Fibre Channel for block storage communication. Given the trends in commoditization, it is possible that iWARP or Myri-10G will emerge as the single interconnect for a clustered database.

Q What is Clustered Database? If you allow the same database to be on 2 different servers how do they keep the data between synchronized. And how does this differ from load balancing from a database server perspective?

Clustered Database is a bit of an ambiguous term, some vendors consider a cluster having two or more servers share the same storage, some others call a cluster a set of replicated servers.
Replication defines the method by which a set of servers remain synchronized without having to share the storage being able to be geographically disperse, there are two main ways of going about it:
• master-master (or multi-master) replication: Any server can update the database. It is usually taken care of by a different module within the database (or a whole different software running on top of them in some cases).
Downside is that it is very hard to do well, and some systems lose ACID properties when in this mode of replication.
Upside is that it is flexible and you can support the failure of any server while still having the database updated.
• master-slave replication: There is only a single copy of authoritative data, which is the pushed to the slave servers.
Downside is that it is less fault tolerant, if the master dies, there are no further changes in the slaves.
Upside is that it is easier to do than multi-master and it usually preserve ACID properties.
Load balancing is a different concept, it consists distributing the queries sent to those servers so the load is as evenly distributed as possible. It is usually done at the application layer (or with a connection pool). The only direct relation between replication and load balancing is that you need some replication to be able to load balance, else you’d have a single server.

Answer2 From SQL Server point of view:
Clustering will give you an active – passive configuration. Meaning in a 2 node cluster, one of them will be the active (serving) and the other one will be passive (waiting to take over when the active node fails). It’s a high availability from hardware point of view.
You can have an active-active cluster, but it will require multiple instances of SQL Server running on each node. (i.e. Instance 1 on Node A failing over to Instance 2 on Node B, and instance 1 on Node B failing over to instance 2 on Node A).
Load balancing (at least from SQL Server point of view) does not exists (at least in the same sense of web server load balancing). You can’t balance load that way. However, you can split your application to run on some database on server 1 and also run on some database on server 2, etc. This is the primary mean of “load balancing” in SQL world.

Answer 3

Database Clustering is actually a mode of synchronous replication between two or possibly more nodes with an added functionality of fault tolerance added to your system, and that too in a shared nothing architecture. By shared nothing it means that the individual nodes actually don’t share any physical resources like disk or memory.
As far as keeping the data synchronized is concerned, there is a management server to which all the data nodes are connected along with the SQL node to achieve this(talking specifically about MySQL).
Now about the differences: load balancing is just one result that could be achieved through clustering, the others include high availability, scalability and fault tolerance.

Database Mirroring and SQL Server Failover Cluster Instances : –

A failover cluster is a combination of one or more physical disks in a Microsoft Cluster Service (MSCS) cluster group, known as a resource group, that are participating nodes of the cluster. The resource group is configured as a failover clustered instance that hosts an instance of SQL Server. A SQL Server failover clustered instance appears on the network as if it were a single computer, but has functionality that provides failover from one node to another if one node becomes unavailable. For more information, see AlwaysOn Failover Cluster Instances (SQL Server).

Failover clusters provide high-availability support for an entire Microsoft SQL Server instance, in contrast to database mirroring, which provides high-availability support for a single database. Database mirroring works between failover clusters and, also, between a failover cluster and a nonclustered host.
Mirroring and Clustering
Typically, when mirroring is used with clustering, the principal server and mirror server both reside on clusters, with the principal server running on the failover clustered instance of one cluster and the mirror server running on the failover clustered instance of a different cluster. You can establish a mirroring session in which one partner resides on the failover clustered instance of a cluster and the other partner resides on a separate, unclustered computer, however.
If a cluster failover makes a principal server temporarily unavailable, client connections are disconnected from the database. After the cluster failover completes, clients can reconnect to the principal server on the same cluster, or on a different cluster or an unclustered computer, depending on the operating mode. Therefore, when deciding how to configure database mirroring in a clustered environment, the operating mode you use for mirroring is significant.

High-Safety mode Session with Automatic Failover
If you intend to mirror a database in high-safety mode with automatic failover, a two-cluster configuration is recommended for the partners. This configuration provides maximum availability. The witness can reside either on a third cluster or on an unclustered computer.
If the node running the current principal server fails, automatic failover of the database begins within a few seconds, while the cluster is still failing over to another node. The database mirroring session fails over to the mirror server on the other cluster or unclustered computer, and the former mirror server becomes the principal server. The new principal server rolls forward its copy of the database as quickly as possible and brings it online as the principal database. After the cluster failover completes, which typically takes several minutes, the failover clustered instance that was formerly the principal server becomes the mirror server.
The following illustration shows an automatic failover between clusters in a mirroring session running in high-safety mode with a witness (which supports automatic failover).

Clustered Database

The three server instances in the mirroring session reside on three distinct clusters: Cluster_A, Cluster_B, andCluster_C. On each cluster, a default instance of SQL Server is running as a SQL Server failover clustered instance. When the mirroring session starts, the failover clustered instance on Cluster_A is the principal server, the failover clustered instance on Cluster_B is the mirror server, and the failover clustered instance onCluster_C is the witness in the mirroring session. Eventually, the active node on Cluster_A fails, which causes the principal server to become unavailable.
Before the cluster has time to fail over, the loss of the principal server is detected by the mirror server, with the help of the witness. The mirror server rolls forward its database and brings it online as the new principal database as quickly as possible. When Cluster_A finishes failing over, the former principal server is now the mirror server, and it synchronizes its database with the current principal database on Cluster_B.

High-Safety Mode Session Without Automatic Failover
If you are mirroring a database in high-safety mode without automatic failover, another node in the cluster will act as the principal server if the node running the current principal server fails. Note that while the cluster is unavailable, the database is unavailable.

High-Performance Mode Session
If you intend to mirror a database in high-performance mode, consider placing the principal server on the failover clustered instance of a cluster and placing the mirror server on an unclustered server in a remote location. If the cluster fails over to a different node, the failover clustered instance will continue as the principal server in the mirroring session. If the entire cluster has problems, you can force service onto the mirror server.

                         Interview Questions on Clustering

Here I am putting in some questions that might help you prepare yourself on Clustering. This is just to start and then you can follow it up with more in depth study and CBTs.

What is a Cluster?

A group of two or more servers together ensuring availability of a service or application even when one of its members goes down. Example of clusters are Microsoft Clustering Services (MSCS) and Microsoft NLB cluster.

Is cluster high available or fault tolerant solution?

When we talk about MSCS then we talk about a high available solution and when we talk about WLBS i.e. microsoft NLB we talk about fault tolerant solution.

How MSCS is different from NLB?

Most important thing to remember is, MSCS maintains the session state but NLB doesn’t. For example in case of two node cluster (MSCS with two nodes) if a SQL database is hosted on it and a transaction is going on, there will be no effect on the transaction even if one of the nodes goes down. While the fail over happens the transaction would definitely stop but resume as soon as the other node takes over. Where as in the case of NLB, if a node goes down,any session associated with it would end and the client has to reconnect and establish a new session. For example OWA, if the node to which my owa connections is established goes down, I need to reconnect as my session would time out after the node serving my session fails.

What is a quorum or What is a quorum resource or Quorum disk?

The quorum resource is a common resource in the cluster that is accessible by all of the cluster nodes. Normally a physical disk on the shared storage, the quorum resource maintains data integrity, cluster unity, and cluster operations—such as forming or joining a cluster—by performing the following tasks:

• Enables a single node to gain and defend its physical control of the quorum resource When the cluster is formed or when the cluster nodes fail to communicate, the quorum resource guarantees that only one set of active, communicating nodes is allowed to form a cluster.
• Maintains cluster unity The quorum resource allows cluster nodes that can communicate with the node containing the quorum resource to remain in the cluster. If a cluster node fails for any reason and the cluster node containing the quorum resource is unable to communicate with the remaining nodes in the cluster, MSCS automatically shuts down the node that does not control the quorum resource.
• Stores the most current version of the cluster configuration database and state data — If a cluster node fails, the configuration database helps the cluster recover a failed resource or recreate the cluster in its current configuration.
The only type of resource supported by MSCS that can act as a quorum resource is the physical disk resource. However, developers can create their own quorum disk types for any resources that meet the arbitration and storage requirements.

What’s the advantage/disadvantage of having 1 node cluster?

One node cluster is used for situations where in we just want the ability to get the stopped service restarted automatically. There are services which doesn’t have the capability to restart on its own, they are hosted on one node cluster as the cluster service would restart the failed service and we are good to go. However if the node itself fails, the service becomes unavailable.

How important is to have public and hearbeat network separate? Is cluster possible with just one NIC per node?

Its very important to have the public and private heart beat network separate as clubbing the two might induce the delay in hear beat packets reaching to the cluster service. This in turn would make the cluster fail over. Its not possible to have cluster with one single NIC but that is not a supported configuration. For more info please read support article.

What’s the port number for heartbeat communication between Cluster Nodes?

To ensure correct failover cluster functionality, add exceptions to firewall configuration settings for File and Printer Sharing (TCP 139/445 and UDP 137/1380.

Is it possible to have heartbeat and public NIC’s on the same subnet without causing any problems?

No its not possible to have heart beat and public NIC’s on the same subnet without causing any problems. Reason is, public NICs generate a lot of traffic and it might interfere with the traffic of heartbeat NICs in terms of inducing delaydue to congestion. This would cause the cluster to fail over and we don’t want this to happen.

Can 2 nodes belonging from multiple network subnet form a single Cluster?

While configuring a set of clustered nodes we need to have them on the same subnet.

What is the difference between multicast and unicast in a Cluster? Under which scenario unicast/multicast is more viable solution that multicast/unicast?

The scenario in which Unicast/Multicast is more viable is given below. Its picked from here:
As the number of nodes in a server cluster increases, the node-to-node communication rises significantly. In Windows Server 2003, Enterprise Edition or Windows Server 2003, Datacenter Edition, for server clusters with 3 or more nodes, multiple unicast messages for two classes of intracluster traffic are replaced by single multicast messages. This reduces the intracluster traffic, resulting in lower network bandwidth consumption and improved node performance. The configuration of the server cluster must meet the following conditions for heartbeat multicasting:

• The number of nodes that are members of the server cluster (rather than the number of nodes that are currently up and actively participating in the cluster) must be 3 or greater.
• All the nodes in the server cluster must be running Windows Server 2003, Enterprise Edition or Windows Server 2003, Datacenter Edition.


• Both conditions above refer to nodes configured in the cluster membership rather than nodes that are currently up and actively participating in the cluster.
• 2-node server clusters use unicast, not multicast, messaging for all intracluster traffic.
• If you operate a server cluster of 3 or more nodes as a mixed version cluster (that is, Windows Server 2003, Enterprise Edition or Windows Server 2003, Datacenter Edition is installed on some nodes, and Windows 2000 on others), then the cluster as a whole will send unicast, not multicast, intracluster messages.
The two kinds of intracluster traffic affected by heartbeat multicasting are:

• Heartbeat messages sent between nodes.
• Node-to-node communication to verify node failures during cluster configuration changes.

What kind of application is called cluster-aware?

An application is capable of being cluster-aware if it has the following characteristics:
1) It uses TCP/IP as a network protocol.
2) It maintains data in a configurable location.
3) It supports transaction processing.

Leave a Reply

Your email address will not be published. Required fields are marked *

Reload Image