The sender and receiver channels that I have configured between two queue managers (WebSphere MQ v7.1 running on Red Hat Linux) are going down pretty frequently. Any idea why? How can I debug this? Thanks.
Channels are expected to go down. The idea is that they stay active as long as there is traffic and then time out. Assuming they’ve been configured to trigger, the presence of a message on the XMitQ causes the channel to start up again.
The reason for this is that a triggered channel will generally restart if interrupted by a network failure or other adverse event. However, if a channel is configured to stay running 24×7, then the only way it stops is through one of these adverse events, and that increases the likelihood that human intervention will be required to restart it. On the other hand, a channel that times out can survive all sorts of nasty network events that occur while it is inactive. Allowing it to time out when not in use thus improves the overall reliability of the channel.
So how do you cause a channel to trigger? Make sure the transmission queue has the `TRIGGER`, `TRIGTYPE`, `TRIGDATA` and `INITQ` attributes set. For example, when defining a transmission queue to the `JUPITER` QMgr, the only variable of the bunch is `TRIGDATA`, which contains the name of the channel serving this XMitQ.

Of course, the channel initiator must be running, but in modern versions of WMQ it starts by default (based on the value of the queue manager's `SCHINIT` attribute), so it generally will in fact be running.

A channel that is in `STOPPED` state cannot be triggered. By default the `STOP CHL` command uses `STATUS(STOPPED)`, so most of the time manually stopping a channel prevents triggering. If you want to stop a channel in such a way that it will restart (for example, to test triggering), use the `STOP CHL(CHLNAME) STATUS(INACTIVE)` command. If the channel is already in `STOPPED` state, either issue the `START CHL` command to make it start immediately, or use `STOP CHL(CHLNAME) STATUS(INACTIVE)` to change the status from `STOPPED` to `INACTIVE` without starting it.

Once the channels are up, the `DISCINT` attribute of the channel determines how long it will run before timing out. The value is in seconds and defaults to 600, which is 10 minutes. `DISCINT`, `KAINT` and `HBINT` combine to determine when the channel comes down. Note that the TCP spec calls for keepalive to be disabled by default, so if you want to use keepalive on your channels, you must enable it in the QMgr tuning.

Please see Triggering Channels in the Infocenter for more on the configuration details. Take a look at SupportPac MD0C WebSphere MQ – Keeping Channels Up and Running if you want to know more about the internals and tuning. (The SupportPac is a bit dated but the principles of tuning mostly still apply. Where there are discrepancies, the Infocenter is the authoritative version.)
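As a rough sketch of the triggering setup and the stop/start behaviour described above, the MQSC below defines a triggered transmission queue for the `JUPITER` QMgr and then shows the two ways of stopping a channel. The channel name `MARS.JUPITER` is a hypothetical placeholder, and `TRIGTYPE(FIRST)` and `SYSTEM.CHANNEL.INITQ` are assumed to be the usual choices for channel triggering; adjust them to match your own definitions.

```
* Transmission queue to the JUPITER QMgr, set up for triggering.
* MARS.JUPITER is a hypothetical channel name; TRIGDATA must name
* the sender channel that serves this XMitQ.
DEFINE QLOCAL(JUPITER) +
       USAGE(XMITQ) +
       TRIGGER +
       TRIGTYPE(FIRST) +
       TRIGDATA(MARS.JUPITER) +
       INITQ(SYSTEM.CHANNEL.INITQ) +
       REPLACE

* Stop the channel but leave it eligible for triggering.
STOP CHANNEL(MARS.JUPITER) STATUS(INACTIVE)

* Or start it immediately, which also clears a STOPPED state.
START CHANNEL(MARS.JUPITER)
```

These commands would be run in `runmqsc` against the sending queue manager. After the `STOP ... STATUS(INACTIVE)`, putting a message on the transmission queue is one way to check that triggering restarts the channel.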
If you want to keep channels up continuously, set `DISCINT(0)`, but remember that triggering remains the preferred option. Some shops need to minimize response times during the business day and so set `DISCINT` to a value that allows the channels to time out at night but generally keeps them running all day.

If for some reason you have triggering set up right and the channels go down before `DISCINT` expires, you should be able to find the reason in the error logs. These reside in the QMgr's directory under `errors`. For example, on UNIX/Linux they are in `/var/mqm/qmgrs/qmgrname/errors` and on Windows the default location is `C:\Program Files (x86)\WebSphere MQ\QMgrs\qmgrname\errors`. Look for the files named `AMQERR??.LOG` where `??` is `01`, `02` or `03`. The logs rotate: `01` is current, `02` is the next oldest, and so on. If you have a very busy QMgr, you need to capture these as soon as the channel goes down or they could roll off.
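To check or change the disconnect interval discussed above, something along these lines could be used in `runmqsc`. This is only a sketch: `MARS.JUPITER` is a hypothetical sender channel name and `CHLTYPE(SDR)` assumes a sender channel.

```
* Show the intervals currently set on the channel definition.
DISPLAY CHANNEL(MARS.JUPITER) DISCINT HBINT KAINT

* Keep the channel up continuously (no disconnect timeout).
ALTER CHANNEL(MARS.JUPITER) CHLTYPE(SDR) DISCINT(0)

* Show the current status of the running channel.
DISPLAY CHSTATUS(MARS.JUPITER) STATUS
```

A changed `DISCINT` is generally picked up the next time the channel starts, so it makes sense to restart the channel after altering it.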