EzDevInfo.com

failovercluster interview questions

Top failovercluster frequently asked interview questions

How to get the status of a JBoss Application Server

I am writing a high-availability agent for JBoss Application Server to run on Solaris Open HA Cluster. As I don't know much about JBoss AS, can someone please tell me how I can probe the status of the application server?

I want to know the health of the application server, for example whether it is currently running or not.
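A probe for an HA agent usually comes down to two checks: can a TCP connection be opened to the server's HTTP port, and does a known URL answer? The Python sketch below shows that two-stage check using only the standard library. The host, the default port 8080 and the URL path are assumptions (adjust them to your JBoss AS configuration), and the 0/1 exit-code convention is a placeholder: check which exit codes your Open HA Cluster agent framework expects from a probe.

```python
import http.client
import socket
import urllib.error
import urllib.request


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_http(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        return False


def health_exit_code(host: str = "127.0.0.1", port: int = 8080, path: str = "/") -> int:
    """Exit code for an HA probe: 0 = healthy, 1 = unhealthy (assumed convention).

    Port 8080 and path "/" are assumptions; point them at a page your
    JBoss AS deployment is known to serve.
    """
    if not probe_tcp(host, port):
        return 1  # process is down or not listening yet
    if not probe_http(f"http://{host}:{port}{path}"):
        return 1  # port is open but the server is not answering requests
    return 0
```

In the agent's probe script, the result would typically be passed to `sys.exit(health_exit_code(...))`. Checking the TCP port first separates "process not running" from "process running but hung", which the agent may want to handle differently.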


Source: (StackOverflow)

Redshift Node Failover

I have an Amazon Redshift cluster of 4 nodes.

  1. When one of the nodes goes down, will the entire cluster become unavailable?
  2. If so, for how long?
  3. When the cluster comes back, is it restored to exactly the same point as before the failure, or may the data be rolled back to an S3 snapshot taken a few hours earlier?
  4. How can I simulate this scenario to test it myself?

Thanks a lot!


Source: (StackOverflow)


Kubernetes master high availability or replication configuration

Hi all, we are looking for a practical, tested guide or reference for Kubernetes master high availability, or for another solution for master node failover.


Source: (StackOverflow)

can't access shared folder of Cluster Server

I have configured a two-node SQL Server 2014 failover cluster instance (ClusterDB) on Windows Server 2008 R2 SP1. It's my first cluster, so I'm unsure about a lot of things; please keep this in mind.

I created a shared folder for my backups on the cluster and accessed it successfully from my test server (Windows Server 2012, in the same domain). For the past few days, however, I can't access this folder: when I log in to my test server and open \\ClusterDB in Windows Explorer, it throws an error saying it cannot find the machine.

The cluster has an internal IP and an external one for the server, and both are online. If I enter \\InternalIP on my test server, it connects, but the shared folder is not listed (I suppose it should be there, but I'm not sure why it isn't).

From the cluster nodes (node 1 and node 2) I can connect to \\ClusterDB and see the shared folder.

I checked the permissions on this folder, and the main user (the user I log in with on my test server) has read permission on it.

Could anybody help me fix this?


Source: (StackOverflow)

.NET library for active/passive fail-over clustering

I want to develop an application that connects to some input sources and processes the messages it reads (think BizTalk in principle, but not as heavy). For performance and reliability I would like to enable horizontal scaling of the service, using shared storage (such as a database) as the message-queuing mechanism.

However, threads that access resources such as an email inbox or a disk folder cannot be scaled horizontally: only one instance may read from such an input source at any time. (The further message-processing business logic can of course run on multiple nodes.)

This is a perfect candidate for Active/Passive clustering. One node is considered "Active" and actively connects to the "single-instance" resources (such as email inbox), while others are "Passive". If the "Active" node dies, then the other "Passive" nodes elect a new "Active" node among themselves.

Now the question: is there a .NET library out there that helps implement the usual failover clustering logic (i.e., sending and detecting heartbeats, and electing the "active" node)? I don't want to reinvent the wheel.

What I can see from the research done already:

  • BizTalk Server supports this functionality natively, but I am not using BizTalk as it's too heavy and expensive (but I want to emulate this functionality of it)
  • Windows Server supports Failover Clustering (in certain high-end versions like Windows Server 2008 Enterprise or Datacenter), but again this is an expensive solution (as each node would need the expensive license)
  • There is a lot of information on how failover algorithm should work, but I cannot see an open source implementation anywhere ... (only in commercial products sold at a premium)

I understand that this might be considered advanced and desirable functionality, which is why commercial solutions for it are expensive. That is fine: if there is no open-source implementation or library out there, I will develop one myself. I just don't want to spend the effort if it already exists.

UPDATE 12/02/2011: Found SAForum (http://www.saforum.org/link/linkshow.asp?link_id=214720), a website that publishes open specifications for developing service-availability concepts. There is also OpenSAF (http://www.opensaf.org/Welcome-to-OpenSAF%E2%84%A2~151213~14944.htm), an open-source C++ implementation of the SAForum specifications. It looks comprehensive but is very heavy: it will take me a lot of time to wade through the specifications and documentation. It also covers much more than fail-over, specifying a full scalable distributed system (notifications, distributed events, locks, cluster management, etc.). Still no sign of a .NET implementation anywhere.
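The heartbeat-plus-election behaviour described above is often implemented as a lease kept in the shared storage the question already mentions: the active node keeps renewing a row that carries an expiry time, and a passive node takes over only once that lease has expired. The Python sketch below is a minimal, in-memory illustration of that lease technique (a swapped-in approach, not BizTalk's or Windows Failover Clustering's algorithm). `LeaseStore` is a hypothetical stand-in for a database row updated with a conditional UPDATE, and the clock is passed in explicitly so the behaviour is easy to test.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    expires_at: float = 0.0


class LeaseStore:
    """Hypothetical stand-in for one row in a shared database.

    try_acquire() mimics an atomic conditional update such as
    UPDATE lease SET holder=?, expires_at=? WHERE holder=? OR expires_at<?
    """

    def __init__(self) -> None:
        self._lease = Lease()
        self._lock = threading.Lock()

    def try_acquire(self, node: str, now: float, ttl: float) -> bool:
        with self._lock:
            lease = self._lease
            if lease.holder in (None, node) or lease.expires_at < now:
                # Free, expired, or already ours: take or renew the lease.
                self._lease = Lease(node, now + ttl)
                return True
            return False

    def holder(self, now: float) -> Optional[str]:
        with self._lock:
            lease = self._lease
            return lease.holder if lease.expires_at >= now else None


class Node:
    def __init__(self, name: str, store: LeaseStore, ttl: float = 10.0) -> None:
        self.name, self.store, self.ttl = name, store, ttl

    def heartbeat(self, now: float) -> bool:
        """Call periodically, well under ttl. Returns True while this node is active."""
        return self.store.try_acquire(self.name, now, self.ttl)
```

The ttl should exceed the heartbeat interval by a safe margin, and the active node should stop reading the single-instance resource as soon as `heartbeat()` returns False. When the lease lives in a real database, using the database server's clock for `now` avoids clock skew between nodes.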


Source: (StackOverflow)

How to ensure Java clients keep "working" when the whole Hazelcast cluster is down

We are preparing Hazelcast to go live in the coming weeks. One bigger issue remains that troubles our Ops department and could be a showstopper if we cannot fix it.

Since we maintain a high-availability payment application, we have to keep working even when the cluster is not available. Reasons could be:

  1. Someone messes up the Hazelcast configuration and a map on the cluster grows until we run out of memory (OOM); this happened on our test system.
  2. Some issue with the network cards/hardware temporarily breaks the connection to the cluster.
  3. The Ops team reconfigures the firewall and accidentally blocks some required ports.
  4. Anything else.

I spent some time looking for a good existing solution, but the only one I found so far was to increase the number of backup servers, which of course does not cover this case.

During my current tests the application stopped working completely, because after a certain number of retries the clients disconnect from the cluster and the Hibernate second-level cache stops working. Since we use Hazelcast throughout the whole ecosystem, this would take down 40 Java clients almost instantly.

So I wonder how we can keep the applications working, albeit more slowly, when the cluster is down. Our current approach is to switch over to a local Ehcache cache, but shouldn't there be a Hazelcast solution for this problem as well?
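One common way to get the "slower but still working" behaviour is to put a small fallback layer in front of the distributed cache: try the cluster first, and on any error serve from (and write to) a local in-process map for a cool-down period before trying the cluster again. The Python sketch below shows only that pattern; `remote_get`/`remote_put` are hypothetical callables standing in for the Hazelcast client calls, and the 30-second cool-down is an assumption.

```python
import time
from typing import Any, Callable, Dict, Optional


class FallbackCache:
    """Try the distributed cache first; degrade to a local map when it fails.

    remote_get/remote_put are hypothetical wrappers around your distributed
    cache client (e.g. calls on a Hazelcast map); any exception they raise is
    treated as "cluster unavailable".
    """

    def __init__(
        self,
        remote_get: Callable[[str], Optional[Any]],
        remote_put: Callable[[str, Any], None],
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._remote_get = remote_get
        self._remote_put = remote_put
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._local: Dict[str, Any] = {}  # unbounded here; bound it (LRU, TTL) in practice
        self._remote_down_until = 0.0

    def _remote_available(self) -> bool:
        return self._clock() >= self._remote_down_until

    def _mark_down(self) -> None:
        # Skip the cluster entirely until the cool-down expires, so callers
        # are not blocked by connection timeouts on every request.
        self._remote_down_until = self._clock() + self._cooldown_s

    def get(self, key: str) -> Optional[Any]:
        if self._remote_available():
            try:
                return self._remote_get(key)
            except Exception:
                self._mark_down()
        return self._local.get(key)

    def put(self, key: str, value: Any) -> None:
        self._local[key] = value  # local copy used while the cluster is down
        if self._remote_available():
            try:
                self._remote_put(key, value)
            except Exception:
                self._mark_down()
```

Local entries are not invalidated by other clients while the cluster is unreachable, so this pattern suits cached, re-readable data (such as a second-level cache in front of the database) better than shared mutable state.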


Source: (StackOverflow)

RabbitMQ Client connect to several hosts

The main goal is to have several RabbitMQ server hosts (clustering). Are there any best practices for working with several RabbitMQ hosts and reconnecting to the next one when the previous connection closes?

The tutorial says:

A client can connect as normal to any node within a cluster. If that node should fail, and the rest of the cluster survives, then the client should notice the closed connection, and should be able to reconnect to some surviving member of the cluster. Generally, it's not advisable to bake in node hostnames or IP addresses into client applications

How can this be implemented on the client side?
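On the client side this usually comes down to keeping a list of candidate endpoints (loaded from configuration or service discovery rather than hard-coded, in line with the tutorial's advice) and, when a connection closes, trying them in turn with a delay between full rounds. The Python sketch below shows that loop in a library-agnostic way; `connect` is a hypothetical callable wrapping whatever your client library uses to open a connection, and the round count and delay are assumptions. The RabbitMQ Java client's `ConnectionFactory` can also be given a list of addresses directly; check the documentation for your client version.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Conn = TypeVar("Conn")
Endpoint = Tuple[str, int]


def connect_with_failover(
    endpoints: Sequence[Endpoint],
    connect: Callable[[str, int], Conn],
    rounds: int = 3,
    delay_s: float = 2.0,
    shuffle: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Conn, Endpoint]:
    """Try each endpoint in turn; repeat up to `rounds` times before giving up.

    `connect` is a hypothetical wrapper around your client library's
    connection call; it should raise on failure.
    """
    last_error: Optional[Exception] = None
    for attempt in range(rounds):
        order = list(endpoints)
        if shuffle:
            random.shuffle(order)  # spread clients across cluster nodes
        for host, port in order:
            try:
                return connect(host, port), (host, port)
            except Exception as exc:  # connection refused, timeout, ...
                last_error = exc
        if attempt < rounds - 1:
            sleep(delay_s)  # back off before the next full round
    raise ConnectionError(f"no endpoint reachable after {rounds} rounds") from last_error
```

The function would be called again from the client's connection-closed handler; after reconnecting, channels, queues and consumers have to be declared again on the new connection.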


Source: (StackOverflow)

Book/Resource about setting up load balancing and failover for a Servlet-based Java web application [closed]

We're creating a web system using Java and Servlet technology (actually Wicket for the presentation layer), and we need our system to be available nearly all the time, as our customers will be quite dependent on it.

This has led us to look for a good book focusing on the subject, or another resource that explains how to set up a more redundant and fail-safe architecture for our system.

A non-exclusive list of questions we have at the moment:

  • How do you have one domain name (like http://www.google.com) that is actually served by several servers, with load balancing to distribute the users? Isn't there always a weaker point in such a solution (the DNS servers for google.com, in their case)?
  • It seems like a good idea to have several database servers for redundancy and load balancing. How is that set up?
  • If one of our web servers goes down, we would like some kind of failover that lets users use one that is still up. Among other things, the sessions have to be synchronized in some way. How is that set up?
  • Do we need some kind of synchronized transactions too?
  • Is Amazon Elastic Compute Cloud (EC2) a good option for us? How do we set it up there? Are there any cost-effective alternatives?
  • Do we need to run in a Java EE container like JBoss or Glassfish?
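For the first and third bullets, the core of what a load balancer does can be shown in a few lines: hand out backends in round-robin order and skip any backend that health checks have marked as down, so users are moved to a server that is still up. The Python sketch below is an illustration under those assumptions; the backend names and the health checker that calls `mark()` are hypothetical. Moving a user to another server only keeps them logged in if their session is replicated or stored outside the failed server.

```python
import itertools
import threading
from typing import Dict, List


class RoundRobinBalancer:
    """Round-robin over backends, skipping those marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        self._backends = list(backends)
        self._healthy: Dict[str, bool] = {b: True for b in backends}
        self._cycle = itertools.cycle(self._backends)
        self._lock = threading.Lock()

    def mark(self, backend: str, healthy: bool) -> None:
        """Called by a (hypothetical) health checker, e.g. a periodic HTTP probe."""
        with self._lock:
            self._healthy[backend] = healthy

    def next_backend(self) -> str:
        with self._lock:
            # At most one full pass over the ring before giving up.
            for _ in range(len(self._backends)):
                candidate = next(self._cycle)
                if self._healthy[candidate]:
                    return candidate
        raise RuntimeError("no healthy backend available")
```

The balancer itself then becomes the single point the first bullet worries about, which is why production setups usually run a redundant pair of balancers sharing a floating IP address.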

Source: (StackOverflow)

Why does my SQL Server cluster execute "SELECT @@SERVERNAME" every minute?

When I run a Profiler trace on our SQL Server cluster, I've noticed that it executes "SELECT @@SERVERNAME" every minute. I always figured the Failover Cluster service did something with it, or just used it to confirm that the network name and IP combinations were set up correctly. This doesn't happen every minute on my non-clustered instances, only on those that are part of a failover cluster.

What makes this even more curious is that I overrode the name of a cluster instance using sp_dropserver/sp_addserver, and there were no ill effects at all - even though the clustered instance name doesn't match the expected network/instance combination, both the cluster manager and SQL Server seem totally fine with this.

This raises the question all the more: why query it constantly if nothing is done with or about the result? Can anybody shed some light on the plumbing here?


Source: (StackOverflow)

Glassfish 3.1.1 cluster session replication

I've got problems with session replication on Glassfish 3.1.1 Open Source Edition. There are two physical servers in one cluster. The first one hosts the DAS and instance 1; the second physical server hosts instance 2. Both servers run Windows 7 x64. I am following this tutorial:

http://javadude.wordpress.com/2011/05/12/glassfish-3-1-%E2%80%93-clustering-tutorial-part2-sessions/

As far as I understand, when session replication works, I should get the same session when I visit the web app on both physical instances. Thus, when I log in on instance 1, I should automatically be logged in on instance 2 as well. Is this right?

Does anybody know how to solve this problem?

Thanks in advance.
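One condition behind the expectation in the question is easy to miss: replication copies session data between instances, but the second instance can only find that data if the browser sends it the same session ID. Cookies without a Domain attribute are scoped to the host name, so visiting two physical servers directly by their own addresses normally gives two separate sessions, which is why both instances usually sit behind a single load-balancer address. The Python sketch below models buddy-style replication in memory to make that dependency visible; it is a simplified illustration, not GlassFish's replication implementation.

```python
import uuid
from typing import Dict, Optional


class Instance:
    """Simplified model of one server instance with a single replica peer.

    Not GlassFish's implementation: it only shows that replicated data is
    looked up by session ID.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: Dict[str, Dict[str, str]] = {}
        self.replica: Optional["Instance"] = None

    def create_session(self) -> str:
        sid = uuid.uuid4().hex
        self.sessions[sid] = {}
        self._replicate(sid)
        return sid

    def set_attr(self, sid: str, key: str, value: str) -> None:
        self.sessions[sid][key] = value
        self._replicate(sid)  # e.g. at the end of each web request

    def get_session(self, sid: Optional[str]) -> Optional[Dict[str, str]]:
        # Without the session ID (cookie) there is nothing to look up.
        if sid is None:
            return None
        return self.sessions.get(sid)

    def _replicate(self, sid: str) -> None:
        if self.replica is not None:
            self.replica.sessions[sid] = dict(self.sessions[sid])
```

A request carrying `sid` that reaches instance 2 still finds the logged-in user; a browser that talks to instance 2 under a different host name never sends `sid`, which corresponds to `get_session(None)`.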


Source: (StackOverflow)

Active-Active high availability design for Windows messaging service?

I'm trying to figure out the best way to design an active-active cluster that uses a replicated database. For network load balancing and failover, I can use Windows NLB. For the database, I can use MySQL, which can do master-master replication out of the box. This is the simple part.

Now my problem is how to program the messaging service, which is connected to a replicated database. What is the best way to design it so that both services work with the same tables without conflict? On failure, the uncompleted transactions of the failed node must be taken over by the other node.

Here is how the messaging service works. Web clients will call the web service with a recipient and a message. The web service will insert the message into the database queue. When a specific condition is met, the message will be transmitted. This could happen within seconds or after a couple of days.

I've done extensive searches on the Internet but to no avail. Has anyone done something similar? Thanks.


Source: (StackOverflow)

Avoid losing Session and ViewState data with Glassfish clustering in a JSF application

We are using this configuration in our clustered application:

glassfish-web.xml:

<session-config>
    <session-manager persistence-type="replicated">
        <manager-properties>
            <property name="persistenceFrequency" value="web-method" />
            <property name="relaxCacheVersionSemantics" value="true" />
        </manager-properties>
        <store-properties>
            <property name="persistenceScope" value="session" />
        </store-properties>
    </session-manager>
    <session-properties />
    <cookie-properties />
</session-config>

Sessions are stored and replicated by Glassfish; the problem is that if something goes wrong with the cluster and it needs to be restarted, we lose all Session and ViewState data.

Is there a way to plug in external storage such as memcached or MySQL to store Session and ViewState information, to ensure we never lose our clients' data?

PS: We use @ManagedBean and @ViewScoped extensively, and we really want to keep the view state in a safe place.
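Whatever store gets plugged in, the underlying pattern is a write-through session store: after each request the serialized session state is written to storage that lives outside the application-server cluster, and on a local miss (for example after a full cluster restart) it is read back from there. The Python sketch below shows only that pattern; `ExternalBackend` is a hypothetical stand-in for memcached or a MySQL table. In the Java application, the corresponding requirement is that everything stored this way, including view-scoped beans, be Serializable.

```python
import pickle
from typing import Any, Dict, Optional


class ExternalBackend:
    """Hypothetical stand-in for memcached/MySQL: survives app-server restarts."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class WriteThroughSessionStore:
    """Local in-memory cache in front of an external backend."""

    def __init__(self, backend: ExternalBackend) -> None:
        self._backend = backend
        self._local: Dict[str, Dict[str, Any]] = {}

    def save(self, session_id: str, state: Dict[str, Any]) -> None:
        self._local[session_id] = state
        # Serialize and write through on every save (e.g. at the end of each request).
        self._backend.set("session:" + session_id, pickle.dumps(state))

    def load(self, session_id: str) -> Optional[Dict[str, Any]]:
        if session_id in self._local:
            return self._local[session_id]
        raw = self._backend.get("session:" + session_id)
        if raw is None:
            return None
        state = pickle.loads(raw)  # only deserialize data your own servers wrote
        self._local[session_id] = state
        return state
```

Writing on every request costs one network round trip per request; memcached is faster but may evict entries under memory pressure, while a database table is durable but slower, so the choice depends on how much session loss is acceptable.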


Source: (StackOverflow)

how can I determine if a Microsoft failover cluster has quorum (in PowerShell)

I'm trying to determine, in PowerShell, whether my Microsoft failover cluster has quorum. The Get-ClusterQuorum cmdlet gives me the quorum configuration, but I need its state.

Get-Cluster | fl * gives me a lot of cluster properties, but I cannot find the one I need among them. (DynamicQuorum is a configuration parameter. I would also be happy if someone could explain exactly what FixQuorum and PreventQuorum mean; they probably relate to the Start-ClusterNode -FixQuorum command.)

Since I have AlwaysOn high availability installed, I can run a query:

select cluster_name, quorum_type_desc, quorum_state_desc from sys.dm_hadr_cluster

and get something like: myclustername,NODE_MAJORITY,NORMAL_QUORUM

and that seems to be what I need, but how can I get this without SQL? Thanks a lot in advance.
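Whichever way the state is read, the rule behind NODE_MAJORITY is plain arithmetic: the cluster has quorum while the votes of the nodes that are up (plus a witness vote, if one is configured) form a strict majority of all configured votes. The Python sketch below only illustrates that rule so a script's result can be sanity-checked; the node names and vote values are hypothetical inputs that in practice would come from the cluster (for example from the output of `Get-ClusterNode`), and dynamic quorum, which adjusts votes as nodes leave, is ignored.

```python
from typing import Dict, Tuple

# Hypothetical input: node name -> (is_up, votes). A file share or disk
# witness can be modelled as one more entry with a single vote.
NodeState = Dict[str, Tuple[bool, int]]


def has_quorum(nodes: NodeState) -> bool:
    """Static majority rule; ignores dynamic quorum vote adjustment."""
    total = sum(votes for _, votes in nodes.values())
    up = sum(votes for is_up, votes in nodes.values() if is_up)
    # Quorum requires a strict majority of all configured votes.
    return total > 0 and 2 * up > total
```

For example, a four-node cluster with two nodes down has 2 of 4 votes and therefore no quorum, while adding a witness that is online gives 3 of 5 votes and restores it.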


Source: (StackOverflow)

tomcat webapp failover

I am working on the high-availability aspect of a webapp deployed in Tomcat. I need a failover mechanism that is not apparent to the webapp user, and I am looking at Tomcat clustering as a solution.

If I am looking only at failover and not at load balancing (not required at this point), how should I configure the Tomcat cluster?

EDIT

I am aware of the mechanism; I am asking about the configuration.
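Stripped of configuration details, failover without load balancing is a priority list: all requests go to the primary instance while it passes its health check, and to the standby only when it does not. The Python sketch below illustrates that routing decision, including the common choice of not failing back automatically; the instance names and the health-check function are hypothetical. For the switch to be invisible to users, the standby must also hold their session data, which is what Tomcat's session replication is meant to provide.

```python
from typing import Callable, List, Optional


class ActiveStandbyRouter:
    """Send all traffic to one instance; switch only when it fails.

    Instance names and the health-check callable are supplied by the caller
    (hypothetical here). With fail_back=False the router stays on the standby
    after a failover even when the primary recovers, which avoids flapping.
    """

    def __init__(self, instances: List[str], is_healthy: Callable[[str], bool],
                 fail_back: bool = False) -> None:
        self._instances = instances  # priority order: primary first
        self._is_healthy = is_healthy
        self._fail_back = fail_back
        self._current: Optional[str] = None

    def route(self) -> str:
        if self._fail_back and self._is_healthy(self._instances[0]):
            self._current = self._instances[0]
        elif self._current is None or not self._is_healthy(self._current):
            # Pick the first healthy instance in priority order.
            self._current = next((i for i in self._instances if self._is_healthy(i)), None)
        if self._current is None:
            raise RuntimeError("no healthy instance available")
        return self._current
```

Staying on the standby after the primary recovers means a flaky primary does not bounce users back and forth; failing back can then be done deliberately during a quiet period.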


Source: (StackOverflow)

What election algorithm does Microsoft Failover Clustering use?

I cannot find anything about the algorithm it uses to elect the primary node.

http://msdn.microsoft.com/en-us/library/aa373130%28v=vs.85%29.aspx

Is it the bully algorithm, a ring algorithm, or some other algorithm?
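For comparison with the options named in the question, the textbook bully algorithm works like this: a node that notices the coordinator is gone sends ELECTION messages to all higher-ID nodes; if none answers it becomes coordinator, otherwise a higher node that answered takes over the election, so the highest-ID live node wins and announces itself with COORDINATOR messages. The Python simulation below is a simplified, synchronous illustration (it follows only the highest responder instead of running every responder's election concurrently) and makes no claim about what Microsoft Failover Clustering actually uses.

```python
from typing import List, Set, Tuple

Message = Tuple[str, int, int]  # (kind, sender, receiver)


def bully_election(initiator: int, nodes: Set[int], alive: Set[int]) -> Tuple[int, List[Message]]:
    """Simulate a bully election; return (coordinator, messages sent).

    Simplification: only the highest responder's election is followed,
    since in the full algorithm that is the one that finishes.
    """
    messages: List[Message] = []
    current = initiator
    while True:
        responders = []
        for higher in sorted(n for n in nodes if n > current):
            messages.append(("ELECTION", current, higher))
            if higher in alive:
                messages.append(("OK", higher, current))
                responders.append(higher)
        if not responders:
            break  # nobody higher is alive: current wins
        current = max(responders)
    # The winner announces itself to every other live node.
    for n in sorted(alive - {current}):
        messages.append(("COORDINATOR", current, n))
    return current, messages
```

With nodes 1 to 5 and node 5 down, an election started by node 2 ends with node 4 as coordinator. A ring algorithm reaches the same winner by passing a single message around a logical ring, trading the bully algorithm's burst of messages for a full round trip.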


Source: (StackOverflow)