Monday, February 02, 2009

Troubleshooting hung ColdFusion Server

We recently upgraded from CFMX 6.1 to ColdFusion 8 Enterprise Edition. CF8 has some really cool tools. One of them is the Server Monitor, which lets you see what's going on below the surface in CF: which requests and queries are running, how much memory is being used by different threads and sessions, and a whole lot of other things.

Immediately after upgrading to CF8, we started seeing problems with the server. Using the Server Monitor, we found that some database connections were still open on CF.



But pretty soon, we started having hung CF servers again. This would happen 4-5 times a day. There were no issues on the DB, and we could not even bring up the CF Server Monitor (as the Server Monitor is itself a CF app). Also, the Windows Performance Monitor showed only a few CF threads running. So it seemed that the 3-4 CF threads that were running were causing the server to go down.

So I decided to take a look at what these 3-4 CF threads were doing. For this I had to take thread dumps while we were seeing these problems.

To do so, I followed these steps:
  • Enabled "Allow service to interact with desktop" for the CF service in the Windows Services panel.
  • Our CF server is hosted at a remote location, so I use Remote Desktop to log on to the server and see the desktop. But when I directly logged on to the server, I did not see the console (as CF is started using the system account).
  • So to see the console, I opened a command prompt on my local machine and entered:
    c:\> mstsc -v:xxx.xxx.xx.xxx /F -console
    (This did not work from Vista, but did work from Windows XP. Newer Remote Desktop clients replaced the console switch with /admin, which may be the reason.)
  • This opens a Remote Desktop window to the CF server.
  • Enter the Admin password for the server and log in.
  • Once logged in, you will see a blank console window on the server, titled "c:\cfusion8\runtime\bin\jrun.exe" (or something similar).
  • With this window selected (highlighted), press Ctrl+Break (the Pause/Break key is at the top right of the keyboard).
  • This writes a thread dump to the file c:\cfusion8\runtime\logs\coldfusion-out.log
  • Repeat the above step a couple of times at 20-30 second intervals. Comparing dumps taken 20-30 seconds apart will help you better understand what's going on.
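
The "several dumps, some seconds apart" idea can also be sketched in plain Java. This is not what I did on the server (I used Ctrl+Break as described above), and the class name ThreadDumpSampler is made up for illustration. It uses the standard java.lang.management API (Java 6 and later) and only sees the threads of the JVM it runs in, so to look at CF's threads it would have to run inside the CF JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of ColdFusion or JRun.
final class ThreadDumpSampler {

    // Builds a text snapshot of every live thread in the JVM this code runs in.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only ask for lock details when this JVM is able to report them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                // The lock this thread is blocked or waiting on, and its owner if known.
                out.append("  waiting on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    out.append(", held by \"").append(info.getLockOwnerName()).append('"');
                }
                out.append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Takes `count` dumps, pausing `intervalMillis` between consecutive dumps.
    // If the calling thread is interrupted, returns the dumps taken so far.
    static List<String> sample(int count, long intervalMillis) {
        List<String> dumps = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return dumps;
                }
            }
            dumps.add(dumpAllThreads());
        }
        return dumps;
    }
}
```

For example, ThreadDumpSampler.sample(3, 25000L) would take three dumps about 25 seconds apart. A thread that shows the same stack and the same "waiting on" line in every dump is a good candidate for being stuck.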
In my case, we found that the problem was a Java deadlock involving 3 threads. Of these 3 threads, 2 were related to the ColdFusion Server Monitor.
The specific entries were:
    Found one Java-level deadlock:
    =============================
    "jrpp-68":
      waiting to lock monitor 0x60968554 (object 0x0db33628, a java.util.Hashtable),
      which is held by "scheduler-0"
    "scheduler-0":
      waiting to lock monitor 0x609291cc (object 0x0d4dcd10, a coldfusion.monitor.memory.SessionMemoryMonitor$TopMemoryUsedSessions),
      which is held by "jrpp-47"
    "jrpp-47":
      waiting to lock monitor 0x60968554 (object 0x0db33628, a java.util.Hashtable),
      which is held by "scheduler-0"
This was followed by the stack traces of the 3 threads, which clearly showed us which CF pages were involved.
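
The "who waits for whom" part of a report like the one above can be read mechanically. Here is a minimal sketch, assuming the report text has the usual layout of a quoted thread name on its own line followed by a 'which is held by "..."' line. The class DeadlockReport and its method names are hypothetical, written only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the "Found one Java-level deadlock" section of a thread dump.
final class DeadlockReport {

    // A line that consists only of a quoted thread name followed by a colon.
    private static final Pattern WAITER = Pattern.compile("^\\s*\"([^\"]+)\":\\s*$");
    // The line naming the thread that currently holds the wanted monitor.
    private static final Pattern HOLDER = Pattern.compile("which is held by \"([^\"]+)\"");

    // Maps each waiting thread to the thread that holds the lock it wants.
    static Map<String, String> waitsFor(String report) {
        Map<String, String> edges = new LinkedHashMap<String, String>();
        String waiter = null;
        for (String line : report.split("\\r?\\n")) {
            Matcher w = WAITER.matcher(line);
            if (w.matches()) {
                waiter = w.group(1);
                continue;
            }
            Matcher h = HOLDER.matcher(line);
            if (waiter != null && h.find()) {
                edges.put(waiter, h.group(1));
                waiter = null;
            }
        }
        return edges;
    }

    // Returns the threads that sit on a cycle: following "waits for" links
    // from such a thread eventually leads back to the same thread.
    static Set<String> threadsInCycle(Map<String, String> edges) {
        Set<String> inCycle = new LinkedHashSet<String>();
        for (String start : edges.keySet()) {
            String current = edges.get(start);
            int steps = 0;
            // Bound the walk by the number of edges so it always terminates.
            while (current != null && steps++ < edges.size()) {
                if (current.equals(start)) {
                    inCycle.add(start);
                    break;
                }
                current = edges.get(current);
            }
        }
        return inCycle;
    }
}
```

Fed the report as transcribed above, waitsFor gives jrpp-68 → scheduler-0, scheduler-0 → jrpp-47 and jrpp-47 → scheduler-0. threadsInCycle then returns scheduler-0 and jrpp-47: each holds the lock the other wants, and jrpp-68 is stuck behind the same Hashtable.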
One lesson we learned was that it is not prudent to run the Server Monitor on a production machine very often. If you have to run it, run it very sparingly and don't forget to stop the monitors when done.
A very good article on troubleshooting CF can be found at: http://kb.adobe.com/selfservice/viewContent.do?externalId=tn_18339
