Monday, July 22, 2013

SQL1768N Unable to start HADR



Problem (Abstract)

"SQL1768N Unable to start HADR, Reason Code = 7" occurs while initializing high availability disaster recovery (HADR).

Symptom

This is a generic error message which occurs when the primary database fails to establish a connection to its standby database within the HADR timeout interval.

Cause

This error can have several causes. The most common are:
A.     Network issues between the primary and standby servers
B.     The standby database is not active
C.    The two servers are not at the same DB2 level (as reported by db2level)
D.    A firewall is blocking the HADR connection
E.     HADR_REMOTE_INST is mapped to the wrong instance name
F.     The HADR_TIMEOUT database configuration parameter is set to a very low value

Resolving the problem

A.     Verify that the /etc/hosts and /etc/services entries are correct on both machines, and confirm that each machine can ping the other.

ping <hostname>
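
As a rough aid for step A, the sketch below looks up the port assigned to a service name in an /etc/services-style file, so that the values on the two servers can be compared. The function name lookup_service_port and the service name DB2_HADR_1 in the usage comment are illustrative assumptions, not names taken from this technote.

```shell
#!/bin/sh
# Sketch only: print the port registered for a service name.
# Usage: lookup_service_port SERVICE_NAME [FILE]   (FILE defaults to /etc/services)
# Example (DB2_HADR_1 is a hypothetical service name):
#   lookup_service_port DB2_HADR_1
lookup_service_port() {
    awk -v svc="$1" '
        /^[ \t]*#/ { next }                                  # ignore comment lines
        $1 == svc  { split($2, a, "/"); print a[1]; exit }   # "port/protocol" -> port
    ' "${2:-/etc/services}"
}
```

Run it on both servers with the service names used for HADR. An empty result means the name is not defined on that machine.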
 
B.     Make sure that HADR is started on the standby server with the "standby" role. The db2pd command can be used to check whether the database is active.

db2pd -db <dbname> -hadr

Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 00:07:07 -- Date 01/20/2011 11:42:03

If the db2pd output shows that the database is not active, run the "start HADR" command on the standby server.

db2 start hadr on database <dbname> as standby
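
If this check needs to be scripted, the following sketch pulls the status word out of the db2pd header. It assumes the header has the same layout as the sample line above; other DB2 versions may print a different layout or status word. The function name db2pd_status is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the status field from the db2pd header line read on stdin.
# Header layout assumed from the sample output:
#   Database Partition 0 -- Database SAMPLE -- Active -- Up ...
# Example:
#   db2pd -db <dbname> -hadr | db2pd_status
db2pd_status() {
    # Fields are separated by " -- "; the third field is the status.
    awk -F' -- ' '/ -- Database / { print $3; exit }'
}
```

An empty result means no header line was found, which suggests the database is not active on that server.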
 
C.    Make sure that both servers are at the same DB2 level so that no mismatch occurs. Run the "db2level" command on both servers and confirm that they report the same DB2 version and Fix Pack.
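
One possible way to compare the two servers is to save the db2level output from each into a file and compare the version tokens. The sketch below assumes that the output contains a token of the form "DB2 vN.N.N.N" (version, release, modification, Fix Pack); confirm this against your own db2level output. The function names and file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: compare the DB2 version token found in two saved db2level outputs.
# Example:
#   db2level > primary.txt      (on the primary)
#   db2level > standby.txt      (on the standby, then copy the file over)
#   same_db2_level primary.txt standby.txt && echo "levels match"

# Print the first "DB2 vN.N.N.N" token found on stdin.
db2_version_token() {
    # Join wrapped lines first so that a token split across lines is still found.
    tr '\n' ' ' | grep -oE 'DB2 v[0-9]+(\.[0-9]+)*' | head -n 1
}

# Succeed only if both files contain a token and the tokens are identical.
same_db2_level() {
    a=$(db2_version_token < "$1")
    b=$(db2_version_token < "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```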
 
D.    Ensure that the firewall is configured to allow the HADR connection between the two servers.
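
A quick way to probe for a blocked port is to attempt a TCP connection from one server to the other. The sketch below assumes that the nc (netcat) utility is installed and supports the -z and -w options. The function name port_open is hypothetical, and the host and port must be replaced with your own HADR values.

```shell
#!/bin/sh
# Sketch only: succeed if a TCP connection to HOST PORT opens within 5 seconds.
# Assumes nc (netcat) is available; if it is missing, the check fails.
# Example:
#   port_open <standby_host> <hadr_port> || echo "port not reachable"
port_open() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}
```

A failure here does not prove that a firewall is at fault: the connection is also refused when nothing is listening on the port, for example when HADR has not yet been started on the other server.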
 
E.     If the HADR_REMOTE_INST database configuration parameter is set incorrectly, SQL1768N with reason code 7 is returned together with the ADM12504E error. The following message is reported in the db2diag.log of the primary server:

2011-01-20-17.14.12.771000-300        LEVEL: Error
PID     : 3448           TID  : 4488  PROC : db2syscs.exe
INSTANCE: DB2            NODE : 000
EDUID   : 4488           EDUNAME: db2hadrp (SAMPLE) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrVerifySystem, probe:15570
MESSAGE : ADM12504E  Unable to establish HADR primary-standby connection because the value of HADR_REMOTE_INST at one of the instances does not match the actual instance name of the other instance. This is a sanity check to ensure that only the intended database pairing occurs. If any of the HADR_REMOTE_INST configuration parameters or instance names is set incorrectly, you may correct it and try again to start HADR.

To set the correct instance name on both servers, use the following command:

db2 update db cfg for <dbname> using HADR_REMOTE_INST <instance_name>
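
To check the current setting before changing it, a small helper along these lines can read the value from the "db2 get db cfg" output. It assumes that the parameter is listed with its name in parentheses followed by "= value", as in standard db cfg output. The function name remote_inst_is and the instance name db2inst2 are hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed if HADR_REMOTE_INST in the db cfg output on stdin equals $1.
# Example (db2inst2 is a hypothetical instance name):
#   db2 get db cfg for <dbname> | remote_inst_is db2inst2
remote_inst_is() {
    actual=$(awk 'index($0, "(HADR_REMOTE_INST)") {
        sub(/^.*= */, "")       # drop everything up to and including "= "
        sub(/[ \t\r]+$/, "")    # drop trailing whitespace
        print; exit
    }')
    echo "HADR_REMOTE_INST is set to: ${actual:-<empty>}"
    [ "$actual" = "$1" ]        # exact, case-sensitive comparison
}
```

Run it on the primary with the standby's instance name, and on the standby with the primary's instance name.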
 
F.     Set the HADR_TIMEOUT database configuration parameter to at least 120 seconds (the default value).

Note: All HADR db cfg parameters can be changed without an instance stop/start. They do, however, require a database deactivate/activate (if already active).
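
Step F and the note can be combined into a short check, sketched below. It assumes the standard "db2 get db cfg" output, where the timeout appears on a line containing "(HADR_TIMEOUT) = value". The function name hadr_timeout_ok is hypothetical, and the commands in the comments should be adapted to your own database name.

```shell
#!/bin/sh
# Sketch only: succeed if HADR_TIMEOUT in the db cfg output on stdin is at
# least $1 seconds (default 120). Returns 2 if the value cannot be read.
# Example:
#   db2 get db cfg for <dbname> | hadr_timeout_ok
# If the check fails, the value can be raised and the database recycled:
#   db2 update db cfg for <dbname> using HADR_TIMEOUT 120
#   db2 deactivate database <dbname>
#   db2 activate database <dbname>
hadr_timeout_ok() {
    t=$(awk 'index($0, "(HADR_TIMEOUT)") { print $NF; exit }')
    case $t in
        ''|*[!0-9]*) return 2 ;;    # parameter missing or not a number
    esac
    [ "$t" -ge "${1:-120}" ]
}
```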
