Debugging thread dumps and server problems in ColdFusion MX 6.1 and 7.0

This article describes the generation and diagnosis of thread dumps in the ColdFusion MX Server configuration with the built-in JRun server and ColdFusion MX J2EE configuration with the standalone JRun server. It contains specific examples to show which threads are important and which can be ignored.

The diagnosis information in the article can also be useful for ColdFusion MX J2EE versions running on WebLogic, IBM WebSphere and other supported J2EE platforms. For generating thread dumps on these platforms, refer to vendor-specific documentation.

There are several tools available to the developer to help diagnose performance problems or unresponsive systems. Some of the techniques vary slightly from one operating system to another. Developers need to be aware of these diagnostic aids to help themselves and technical support quickly get to the root cause.

Terminology

It's important to use the correct terminology when describing and diagnosing server problems. For example, crash andhangs are terms often used interchangeably to describe system (application server) state but they are very different and may require slightly different methods to diagnose. Here are some common terms and their definitions.

Crash
The ColdFusion server process stops running/disappears (this could be jrun.exe/jrun or the weblogic/websphere server process). Java itself has crashed and there may be a hotspot log (hs*.log) to help diagnose the problem.
Hung
The ColdFusion server is unresponsive. ColdFusion (the jrun.exe/Windows process) still appears in task manager and doesn't appear to be taking high CPU but is also not serving any new pages. ("ps -ef" command (Unix/Linux))
Spinning
The ColdFusion process is taking high CPU resources (approaching 100%). On Windows 2 CPU systems, this would be 50% CPU usage for jrun.exe.
Thread Dump
A list of all running Java threads in a ColdFusion server. For JRun-based systems, "simultaneous request" threads in thread dumps are prefixed with either jrpp-nn (External Web Server) or web-nn (JRun Web Server). Some others are scheduler-nn, jndi-nn, skimmer-nn.
Stack Trace
An individual thread's execution stack (snapshot) that contains the call stack at the time the thread dump was taken or when an exception was thrown (in the case where the stack is found in the ColdFusion log files, such as exception.log). Some use the terms 'stack trace' and 'thread dump' interchangeably but they are not the same (even though some Sun docs use them interchangeably). This often causes confusion.

Also review ColdFusion logs (application.log, server.log, exception.log, etc.) to help diagnose problems along with using thread dumps.

Upgrade the ColdFusion server to the latest supported patches

Before attempting to debug problems, check the ColdFusion support site for the latest hot fixes and updaters for ColdFusion MX 6.1 or 7.0. It's always best to run the latest version and patches.

Generating a full thread dump

Follow the steps below to capture a full thread dump when the system is in an unresponsive state. Each full thread dump is a snapshot of every thread in the system at that moment in time.

Windows (any configuration): When running as a Windows service (Recommended)

(requires the process "log on" parameters with system-level privileges)

  1. On the "Log On" tab of the ColdFusion MX Application Server Windows service, check "Allow service to interact with Desktop".



    For example: "ColdFusion MX 7 Application Server" or "Macromedia JRun cfusion server", etc.



    When the service starts, this will spawn a DOS window labeled jrunsvc.exe.
  2. At the time you are having performance problems or ColdFusion appears to be unresponsive, get a thread dump to see where the underlying problem is. To trigger the thread dump capture, set focus to the DOS window by clicking on the title bar a few times and press Control+[Pause|Break]. This will cause ColdFusion to print a full thread dump to the file [servername]-out.log.



    BE CAREFUL: Some users on the first try press Control+C by habit and kill the JRun process. The [Pause|Break] key is typically in the upper right area of the keyboard next to the Scroll Lock key. Also, make sure Task Manager isn't running in front, taking focus of the DOS window.



    Take at least 2-3 thread dumps, 20 seconds apart. This captures what each thread is doing over time so it's possible to diagnose whether individual threads are progressing or stuck. A stuck thread (or a long running one) is one that shows the same line number executing over 30 or more seconds in the same jrpp- thread.



    Note: View [servername]-out.log with a browser rather than viewing it with Notepad or Wordpad since it has no carriage return characters. Use "Open With" from Windows Explorer and choose "Internet Explorer" or "Netscape/Firefox/Mozilla" to view the file.
  3. Diagnose the thread dump using the steps in the Reading the thread dump portion of this article.
  4. To terminate ColdFusion MX, shut down the Windows service.

Windows (Server configuration): When running from a DOS window

  1. Start ColdFusion using cfstart.bat in cf_root/bin. Re-direct the output of this command to a file, for example:



    c:\CFusionMX\bin> cfstart > out.txt



    There will be a DOS window labeled "Command Prompt - cfstart" displayed while ColdFusion runs.
  2. At the time you are having performance problems or ColdFusion appears to be unresponsive, get a thread dump to see where the underlying problem is. To trigger the thread dump capture, set focus to the DOS window by clicking on the title bar a few times and press Control+[Pause|Break]. This will cause ColdFusion to print a full thread dump to the file out.txt.



    BE CAREFUL: Some users on the first try press Control+C by habit and kill the JRun process. The [Pause|Break] key is typically in the upper right area of the keyboard next to the Scroll Lock key. Also, make sure Task Manager isn't running in front, taking focus of the DOS window.



    Take at least 2-3 thread dumps 20 seconds apart. This captures what each thread is doing over time so it's possible to diagnose whether individual threads are progressing or stuck. A stuck thread (or a long running one) is one that shows the same line number executing over 30 or more seconds in the same jrpp- thread.



    Note: View out.txt with a browser rather than viewing it with Notepad or Wordpad since it has no carriage return characters. Use "Open With" from Windows Explorer and choose "Internet Explorer" or "Netscape/Firefox/Mozilla" to view the file.
  3. Diagnose the thread dump using the steps in the Reading the thread dump portion of this article. s
  4. To terminate ColdFusion, press Control-C twice in the DOS window.

Windows (J2EE Configuration with JRun):

  1. DOS Window. Start JRun from the command line and redirect the output to a file, for example:



    c:\jrun4\bin\jrun -start cfusion > out.txt



    There will be a DOS window labeled "c:\Windows\System32\cmd.exe - jrun -start cfusion" displayed while ColdFusion runs.
  2. Windows Service. Start the 'Macromedia JRun CFusion Server' service (or the service name of the JRun server affected). On the "Log On" tab of the service, check "Allow service to interact with Desktop". This will spawn a DOS window when the service starts up labeled jrunsvc.exe.



    Regardless of which method (1 or 2 above) used, the DOS Window that JRun runs in is the one the user must interact with to obtain a thread dump.
  3. At the time you are having performance problems or ColdFusion appears to be unresponsive, get a thread dump to see where the underlying problem is. To trigger the thread dump capture, set focus to the DOS window by clicking on the title bar a few times and press Control+[Pause|Break]. This will cause ColdFusion to print a full thread dump to either the file out.txt (if JRun starts from a DOS Window) or[servername]-out.log (if JRun starts from a Windows service)



    BE CAREFUL: Some users on the first try press Control+C by habit and kill the JRun process. The [Pause|Break] key is typically in the upper right area of the keyboard next to the Scroll Lock key. Also, make sure task manager isn't running in front taking focus of the DOS window.



    Take at least 2-3 thread dumps 20 seconds apart. This captures what each thread is doing over time so it's possible to diagnose whether individual threads are progressing or stuck. A stuck thread (or a long running one) is one that shows the same line number executing over 30 or more seconds in the same jrpp- thread.



    Note: View out.txt or[servername]-out.log with a browser rather than viewing it with Notepad or Wordpad since it has no carriage return characters. Use "Open With" from Windows Explorer and choose "Internet Explorer" or "Netscape/Firefox/Mozilla" to view the file.
  4. Diagnose the thread dump using the steps in the Reading the thread dump portion of this article.
  5. When you are ready to terminate ColdFusion, open a new DOS window, and from the cf_root\bin directory, entercfstop to stop the default server.

Unix/Linux (Server configuration)

  1. Start the ColdFusion process in the normal manner:



    cf_root/bin/coldfusion start
  2. Obtain the ColdFusion process ID by issuing:



    ps -ef | grep cfusion (Unix and newer Linux)

    pstree -p | grep cfusion (older Linux)



    Note: For ColdFusion MX 7 use 'grepcfmx7', rather than 'grepcfusion' for Linux and Unix.



    Solaris (Unix) example (Coldfusion MX 7 example shown)

    The process ID (pid) in this case is 23152: (do not use the -autorestart process ID)



    # ps -ef | grep cfmx7 nobody 23150             ...  /opt/coldfusionmx7/bin/cfmx7 -jar cfusion.jar -autorestart -start coldfusion nobody 23152 23150 ... /opt/coldfusionmx7/bin/cfmx7 -jar cfusion.jar -start coldfusion
    Older Linux example (ColdFusion MX 6.1 example shown)

    Older Linux process listings look different. The process ID to use for the kill -3 is highlighted below as the SECOND column.



    # pstree -p | grep cfusion |-cfusion(22070)---cfusion(22073)---cfusion(22074)-+-cfusion(22075) | |-cfusion(22076) | |-cfusion(22077)
    Newer Linux example (Red Hat 3/4) (identical to Unix, ColdFusion MX 7 example shown)

    # ps -ef | grep cfmx7 nobody   2600   ...  /opt/coldfusionmx7/bin/cfmx7 -jar cfusion.jar -autorestart -start coldfusion nobody   2602 2600  .../opt/coldfusionmx7/bin/cfmx7 -jar cfusion.jar -start coldfusion
    Using the "cfusion" process ID in BOLD using the various examples above (or "cfmx7" if ColdFusion MX 7), enter:



    kill -3 {pid}



    A "kill -3" causes ColdFusion to print a full thread dump in thecf_root/logs/cfserver.log file. The file can be viewed with most Unix editors like "vi" because, unlike Windows, Unix does not require carriage return characters, only linefeed characters.



    Diagnose the thread dump using the steps in the Reading the thread dump portion of this article.

Unix/Linux (J2EE configuration with JRun)

  1. Open a terminal window.
  2. Change to the jrun_root/bin directory:



    #cd /opt/jrun4/bin
  3. Start the deployed ColdFusion server instance with the following command (ex: server "production1"):



    ./jrun -start production1 > thread_dump.log



    "> thread_dump.log" redirects system output (and the thread dump) to thread_dump.log in the /opt/jrun4/bin directory rather than to the standard output logfile (usually servername-out.log).
  4. Leave the first terminal window open. Open a new terminal window and get a process listing for the production1 instance:
    #ps -ef | grep production1 [root@qajrun bin]#ps -ef | grep production1 root 10388 9638 32 11:52 pts/1 00:00:08 ./jrun -start production1 root 10389 10388 0 11:52 pts/1 00:00:00 ./jrun -start production1 .. [ could be 30 or more PIDs] .. root 10437 10389 0 11:53 pts/1 00:00:00 ./jrun -start production1 root 10439 10297 0 11:53 pts/2 00:00:00 grep production1
  5. Select the lowest numbered PID (10388 in this example) and execute the following kill command on it:



    # kill -3 10388



    This will generate a full thread dump in thread_dump.log. Execute the "kill -3" 2-3 times at 20 second intervals to capture whether successive thread dumps show a particular jrpp-nn thread either:



    1. progressing (different line number in same template or completely different template over several thread dumps)



      OR



    2. stuck (same template, same line number in subsequent thread dumps). Could also just be long-running.
    Diagnose the issue using the steps in the Reading the thread dump portion of this technote.

What do the various "states" of the threads mean?

Threads in a dump typically have the following states:

runnable
Means the thread is running or ready to run. Threads that have stack traces with .cfc or .cfm should be inspected.



Others without .cfc/.cfm in the stack are often in harmless wait states and can be ignored.
waiting for monitor entry
Waiting to enter a synchronized code block. Rare to see this. Synchronize blocks should execute quickly and if the code takes long enough to end up in a thread dump or more than one, it's worth investigating.
waiting on monitor
Generally can be ignored if the top of the stack shows an Object.wait() method
in Object.wait()
Waiting for an object. Sometimes this means idle and can be ignored but if the stack trace for the Object.wait() thread has a running .cfm/.cfc, investigate if this is blocking with another thread.

Thread names and their relevance to reading the thread dump

Most threads listed in the thread dump are related to system activities, not user code and can be ignored while analyzing thethread dump. These include: jndi-nn,skimmer-nn, jms-fifo-nn,Transaction Domain*, RMI*,Finalizer, and Reference Handler.

The jrpp-nn (simultaneous request threads) andweb-nn (JRun Web Server) are the primary ones to look at. scheduler-nn, and Thread-nn threads are the second group that are sometimes relevant if not in Object.wait(). Thread-nn threads can be spawned by things such as database driver code.

Reading the thread dump

Open thread_dump.log, out.txt or[servername]-out.log in a web browser and search for the phrase "Full thread dump:".

For example: Full thread dump Java HotSpot(TM) Server VM (1.4.2_05-b04 mixed mode):

There will be one instance of the phrase for each thread dump generated. To avoid confusion, it's best to work from the bottom up and search for "Full Thread Dump" and take the LAST thread dump into a separate file so text searches don't go across thread dumps - at least initially. Later, to determine whether a thread (jrpp-28) is "progressing", the user will need to search across multiple thread dumps.

If Control+[Pause|Break] or "kill -3" (Unix/Linux) was executed since first examining the thread dump file, refresh the browser and search for the last occurrence of "Full thread dump:". The list of threads is below the "Full thread dump:" line. Each thread has a stack (aka stack trace). Unfortunately, there is no timestamp.

There are generally a large number of threads. The important ones typically are the jrpp-nn threads, especially the ones containing function names likecoldfusion.runtime.xxxxxx ormacromedia.jdbc.xxxxx. If using the JRun Web Server (on port 8300 or 8500), threads are namedweb-nn.

jrpp-nn threads apply to commercial web servers such as Apache, Microsoft IIS and SunOne using the ColdFusion MX connector (aka the JRun connector).

Note: The number of available Worker Threads (jrpp) is dynamic, based on recent activity. There may be more Worker Threads than the number of "Simultaneous Requests" set in the Administrator. Only "Simultaneous Requests" number of these can actively be executing ColdFusion templates. The extra threads reflect both available threads and incoming ColdFusion requests which are queued up for processing and must wait their turn before the template executes.

Thread dumps can be large. If the ColdFusion MX logs show the same error over and over, try to match it up with a thread inside the thread dump file. If it matches, take a snippet of one of the stack traces in the thread dump and save it for technical support. Always send the entire thread dump file for analysis to support.

Typical thread entry for a request

1 "jrpp-19" prio=5 tid=0x24d35bb0 nid=0x100 waiting on monitor [0x2af9f000..0x2af9fdbc]2 at coldfusion.runtime.Cast._Object(Unknown Source) 3 at cflongrun2ecfm320255489.runPage(C:\Inetpub\wwwroot\longrun.cfm:12) 4 at coldfusion.runtime.CfJspPage.invoke(Unknown Source) <- the lines below "invoke" are much the same with running templates 5 at coldfusion.tagext.lang.IncludeTag.doStartTag(Unknown Source)6 at coldfusion.filter.CfincludeFilter.invoke(Unknown Source) 7 at coldfusion.filter.ApplicationFilter.invoke(Unknown Source) 8 at coldfusion.filter.PathFilter.invoke(Unknown Source) 9 at coldfusion.filter.ExceptionFilter.invoke(Unknown Source) 10 at coldfusion.filter.BrowserDebugFilter.invoke(Unknown Source) 11 at coldfusion.filter.ClientScopePersistenceFilter.invoke(Unknown Source) 12 at coldfusion.filter.BrowserFilter.invoke(Unknown Source) 13 at coldfusion.filter.GlobalsFilter.invoke(Unknown Source) 14 at coldfusion.filter.DatasourceFilter.invoke(Unknown Source) 15 at coldfusion.CfmServlet.service(Unknown Source) 16 at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106) 17 at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42) 18 at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:241) 19 at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:527) 20 at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:198) 21 at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:348) 22 at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:451) 23 at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:294) 24 at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

From line 3, notice this thread is executing the templatelongrun.cfm at line 12.

The calls from various JRun functions to the CfmServlet and to the coldfusion.filter functions will always be present when running a template. The classname cflongrun2ecfm320255489 comes from the filename that longrun.cfm was compiled into (if the option to 'Save Class Files' is enabled in the ColdFusion MX Administrator). The .class file for longrun.cfm is \WEB-INF\cfclasses\cflongrun2ecfm320255489.class. This name is constructed by prepending "cf" to the filename with non-alphanumeric characters replaced by their hexadecimal equivalent. For example; the dot (.) is 2e in hexadecimal. Directory separators are replaced with two underscores "__". The trailing number is a Java-supplied hash code.

The lines highlighted in RED are the important ones. They begin at an executing .cfm or .cfc and any lines above (lines 2 and 3 in the example above). The lines below the .cfc and .cfm line - generally - describe the context of the running template. For example, if macromedia.jdbc.sqlserver ormacromedia.jdbc.oracle appears below the .cfm or .cfc, we know there is a database interaction (cfquery,cfstoredproc, etc).

In the example above, the lines in BLUE - or some subset of them depending on the request - usually aren't relevant to figuring out the problem so it's important not to be distracted by them.

Deadlocks

Deadlocks can be caused by one thread holding a resource needed by another thread or group of threads. This is the so-called thread wait pileup and can cause dozens of successive "timeout waiting for an available thread to run" type errors in the logs. A thread dump with a stuck system will often reveal one thread causing the problem. The root cause could be user code, code in the core ColdFusion or JRun engine or some outside influence such as a database problem, network issue, etc.

Thread dump examples

Example 1: 'Simultaneous request' threads waiting for incoming request (can be ignored typically)

Symptom: Steady state. These are often the threads you can ignore while reading a thread dump. (ColdFusion MX 7 examples shown)

"jrpp-8" prio=5 tid=0x0969be18 nid=0x654 runnable [dc0f000..dc0fdb0] at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(Unknown Source) - locked <0x1907d490> (a java.net.PlainSocketImpl) at java.net.ServerSocket.implAccept(Unknown Source) at java.net.ServerSocket.accept(Unknown Source) at jrun.servlet.network.NetworkService.accept(NetworkService.java:356) at jrun.servlet.jrpp.JRunProxyService.accept(JRunProxyService.java:104) at jrun.servlet.jrpp.JRunProxyService.createRunnable(JRunProxyService.java:120) at jrunx.scheduler.ThreadPool$DownstreamMetrics.createRunnable(ThreadPool.java:285) at jrunx.scheduler.ThreadPool$ThreadThrottle.createRunnable(ThreadPool.java:347) at jrunx.scheduler.ThreadPool$UpstreamMetrics.createRunnable(ThreadPool.java:239)

or this:

"jrpp-4" prio=5 tid=0x71643c18 nid=0xae4 in Object.wait() [d50f000..d50fdb0] at java.lang.Object.wait(Native Method) - waiting on <0x19071950> (a jrunx.scheduler.JSemaphore) at jrunx.scheduler.JSemaphore.acquire(JSemaphore.java:74) - locked <0x19071950> (a jrunx.scheduler.JSemaphore) at jrun.servlet.network.NetworkService.accept(NetworkService.java:348) at jrun.servlet.jrpp.JRunProxyService.accept(JRunProxyService.java:104) at jrun.servlet.jrpp.JRunProxyService.createRunnable(JRunProxyService.java:120) at jrunx.scheduler.ThreadPool$DownstreamMetrics.createRunnable(ThreadPool.java:285) at jrunx.scheduler.ThreadPool$ThreadThrottle.createRunnable(ThreadPool.java:347) at jrunx.scheduler.ThreadPool$UpstreamMetrics.createRunnable(ThreadPool.java:239)

Example 2: JRun Web Server idle thread (can be ignored typically)

Symptom: Steady state

"web-0" prio=5 tid=0x38e72c48 nid=0xb90 runnable [3cfdf000..3cfdfdb0] at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(Unknown Source) - locked <0x13a6ca70> (a java.net.PlainSocketImpl) at java.net.ServerSocket.implAccept(Unknown Source) at java.net.ServerSocket.accept(Unknown Source) at jrun.servlet.network.NetworkService.accept(NetworkService.java:356) at jrun.servlet.http.WebService.createRunnable(WebService.java:104) at jrunx.scheduler.ThreadPool$DownstreamMetrics.createRunnable(ThreadPool.java:285) at jrunx.scheduler.ThreadPool$ThreadThrottle.createRunnable(ThreadPool.java:347) at jrunx.scheduler.ThreadPool$UpstreamMetrics.createRunnable(ThreadPool.java:239) at jrunx.scheduler.WorkerThread.run(WorkerThread.java:62)

Example 3: ColdFusion MX 6.1 Thread Pooling issue - deadlock causing server hang

Symptom: All threads (requests) hang and eventually timeout. All versions that use JRun4 Updater 5 or above (CF 7.0.1) have this fixed. 6.1 updater has a post 6.1 Updater patch (57510) for this problem. If using 6.1, apply BOTH the 6.1 Updater and the 57510 hot fix patch.

"web-nn" prio=5 tid=0x3aadae30 nid=0x14ec in Object.wait() [417f000..417fdbc]<- this one shows a JRun Web Server thread at java.lang.Object.wait(Native Method)- waiting on <0x18689fb0> (a jrunx.scheduler.WorkerThread) at jrunx.scheduler.StackMutex.enter(StackMutex.java:47) - locked <0x18689fb0> (a jrunx.scheduler.WorkerThread)
"jrpp-nn" prio=5 tid=0x6c9dcc60 nid=0xb74 in Object.wait() [905f000..905fdb8]<- here's one attached to a commercial web server at java.lang.Object.wait(Native Method)- waiting on <0x10b30018> (a jrunx.scheduler.WorkerThread) at jrunx.scheduler.StackMutex.enter(StackMutex.java:47) - locked <0x10b30018> (a jrunx.scheduler.WorkerThread) at jrun.servlet.network.NetworkService.accept(NetworkService.java:356) at jrun.servlet.jrpp.JRunProxyService.accept(JRunProxyService.java:104) at jrun.servlet.jrpp.JRunProxyService.createRunnable(JRunProxyService.java:120) at jrunx.scheduler.ThreadPool$DownstreamMetrics.createRunnable(ThreadPool.java:315) at jrunx.scheduler.ThreadPool$ThreadThrottle.createRunnable(ThreadPool.java:377) at jrunx.scheduler.ThreadPool$UpstreamMetrics.createRunnable(ThreadPool.java:269)

Example 4: Idle waiting to run when a request comes in. (can be ignored typically)

Symptom: Steady state. jrpp-1has finished a request and is waiting for another one from the connector on the same connection.

"jrpp-1" prio=5 tid=0x39160680 nid=0x664 runnable [3d73f000..3d73fdbc] at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:183) at java.io.BufferedInputStream.read1(BufferedInputStream.java:222) at java.io.BufferedInputStream.read(BufferedInputStream.java:277) - locked <0x13cb0a68> (a jrun.servlet.io.ReusableBufferedInputStream) at jrun.servlet.jrpp.ProxyEndpoint.readFully(ProxyEndpoint.java:530) at jrun.servlet.jrpp.ProxyEndpoint.readFully(ProxyEndpoint.java:522) at jrun.servlet.jrpp.ProxyEndpoint.readInt(ProxyEndpoint.java:540) at jrun.servlet.jrpp.ProxyEndpoint.readRequest(ProxyEndpoint.java:229) at jrun.servlet.jrpp.JRunProxyService.swapRunnable(JRunProxyService.java:144) at jrunx.scheduler.ThreadPool$DownstreamMetrics.swapRunnable(ThreadPool.java:320) at jrunx.scheduler.ThreadPool$ThreadThrottle.swapRunnable(ThreadPool.java:433) at jrunx.scheduler.ThreadPool$UpstreamMetrics.swapRunnable(ThreadPool.java:280) at jrunx.scheduler.WorkerThread.run(WorkerThread.java:76)

Example 5: socketRead() - waiting for database response or processing a database response

Symptom: Slow response time. (may or may not be normal - see below)

This thread is executing either a cfquery orcfstoredproc tag at line 13 of productsearch.cfm. Most often, users will be looking at debugging problems that have their root cause in some kind of database action. The macromedia.jdbc.sqlserver.* (or oracle/informix/sequelink, etc) in the stack tells us it's a database action. Given that most applications use cfquery and queries could take a few seconds, it's not unusual to see this signature in a thread dump.

In this case, web-0 is waiting for a response from the database or processing a database response - thesocketRead0() call - so this might be normal or it could be a problem if the SPECIFIC jrpp-nn - or in this case the JRun Web Server "web-0" thread - doesn't progress over time (2-3 thread dumps).

There are also situations where the database driver could cause a hang and show this thread dump signature but GENERALLY, most application thread dumps could have a few of these and be perfectly normal.

"web-0" prio=5 tid=0x03f98838 nid=0xb78 runnable [62bf000..62bfdb4] at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at macromedia.jdbc.sqlserver.SQLServerByteOrderedDataReader.readPacketIntoPrimaryBuffer(Unknown Source) at macromedia.jdbc.sqlserver.SQLServerByteOrderedDataReader.receive(Unknown Source) at macromedia.jdbc.sqlserver.tds.TDSExecuteRequest.submitRequest(Unknown Source) at macromedia.jdbc.sqlserver.SQLServerImplStatement.execute(Unknown Source) at macromedia.jdbc.base.BaseStatement.commonExecute(Unknown Source) at macromedia.jdbc.base.BaseStatement.executeInternal(Unknown Source) at macromedia.jdbc.base.BaseStatement.execute(Unknown Source) - locked <0x12f1c4c0> (a macromedia.jdbc.sqlserver.SQLServerConnection) at jrun.sql.JRunStatement.execute(JRunStatement.java:349) at coldfusion.sql.Executive.executeQuery(Executive.java:974) at coldfusion.sql.Executive.executeQuery(Executive.java:860) at coldfusion.sql.SqlImpl.execute(SqlImpl.java:229) at coldfusion.tagext.sql.QueryTag.doEndTag(QueryTag.java:447)at cf534862ecfm1458941674.runPage(C:\JRun4\servers\cfusion\cfusion-ear\cfusion-war\productsearch.cfm:13) at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:147) .. ..

When there are database related issues and the server is unresponsive, it's common to see multiple threads showing the same CFQUERY line running (productsearch.cfm:13) and perhaps another, single thread accessing or updating the same table (productupdate.cfm:45). This might indicate a deadlock somewhere in the database. Run the SQL code outside ColdFusion MX to see if it's something other than a ColdFusion MX problem. For example, an index may be corrupt or missing.

Example 6: Purging client variables in the Windows registry causes high memory/system hang

Symptom: CPU spins at 100% - and high memory usage - about every hour after startup (large memory use too).

Here's a common one that some fall into going from development to production. The default for storing client variables is "REGISTRY". Once the number of records in the registry is large, the query to get all records and delete expired client records can take 100% CPU for minutes at a time.

Registry client storage should never be used for production systems but often developers fall into this by accident by not explicitly specifying a data source (and the ColdFusion admin defaulting to "REGISTRY"). Since the registry isn't a real database, ColdFusion has to retrieve the entire registry client tree (high memory usage) and compare the date/time one at a time to decide whether to purge a record. This is a CPU-intensive operation with an hourly purge that may only delete a few records.

Since this is a system task, it's running in a scheduler thread instead of a jrpp thread. Here's a case where a scheduler thread is relevant.

"scheduler-2" prio=5 tid=0x393a3938 nid=0x844 runnable [4473f000..4473fdb8]at coldfusion.tagext.NativeCfx.processRequest(Native Method) at coldfusion.tagext.lang.RegistryTag.doStartTag(RegistryTag.java:90) at coldfusion.runtime.ClientScopeServiceImpl.CheckClient(ClientScopeServiceImpl.java:250)at coldfusion.runtime.ClientScopeServiceImpl.PurgeRegistry(ClientScopeServiceImpl.java:352) at coldfusion.runtime.ClientScopeServiceImpl.PurgeExpiredClients(ClientScopeServiceImpl.java:390) at coldfusion.runtime.ClientScopeServiceImpl$ClientScopePurger.run(ClientScopeServiceImpl.java:499) at coldfusion.scheduling.ThreadPool.run(ThreadPool.java:201) at coldfusion.scheduling.WorkerThread.run(WorkerThread.java:70)

Note: If using a 2 processor system on Windows, task manager would show a processor at 50% utilization which is the equivalent of 100% CPU on a single CPU system.

Example 7: Refreshing of cached query - - causes a slowdown or hang

Symptom: Requests back up and request backlog is high. Server can either seem hung or unresponsive. Memory usage is not high.

A hang long enough to record a cached query refresh in the thread dump may have several associated issues.

  • Is the user bringing back a huge query? The user might not think so if using <cfquery MaxRows=>. If the query has a large number of rows, all other pages that hit the cached query will block until the first user hitting the query (after the <cfquery cachedWithin=..> time limit value) gets a resultset from the database (the "Waiting for monitor entry" below).
  • <cfquery maxrows=N> bug. ColdFusion MX (until ColdFusion MX 7.0.1 CHF2) didn't pass maxrows to the underlying driver (statement.setMaxRows()) so it's possible that ColdFusion would bring back the entire resultset first, then only show "N" number of rows. This could delay the query or use large amounts of memory and could contribute to the slowdown. ColdFusion 5 did respect MaxRows so some users migrating to 6.1 might see this performance hit.

Without the thread dump, the user might change the<cfquery cachedWithin=..> time limit value of the cached query to make it SOONER which only makes the problem worse.

N jrpp (Simultaneous Request) threads blocked at this point.

"jrpp-643" prio=5 tid=0x045ad210 nid=0x1a70 waiting for monitor entry [3cb0e000..3cb0fdc0]at coldfusion.tagext.sql.QueryTag.setupCachedQuery(QueryTag.java:540)<- N threads waiting for query to be refreshed - waiting to lock <0x13b5fae8> (a coldfusion.sql.Executive) at coldfusion.tagext.sql.QueryTag.doEndTag(QueryTag.java:443)at cfBaseClass2ecfc1402536326$funcLOAD.runFunction(D:\Inetpub\wwwroot\BaseClass.cfc:113) at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:338)

+ some other request thread in runnable - the one trying to refresh the cached query.

"jrpp-641" prio=5 tid=0x08358408 nid=0x1278 runnable [570e000..570fdc0]at java.net.SocketInputStream.socketRead0(Native Method)<- 1st user in after refresh timeout waiting for a database response at java.net.SocketInputStream.read(SocketInputStream.java:129)  to refresh the cached query at macromedia.jdbc.sqlserver.SQLServerByteOrderedDataReader.readPacketIntoPrimaryBuffer(Unknown Source) at macromedia.jdbc.sqlserver.SQLServerByteOrderedDataReader.receive(Unknown Source) at macromedia.jdbc.sqlserver.tds.TDSExecuteRequest.submitRequest(Unknown Source) at macromedia.jdbc.sqlserver.tds.TDSRequest.execute(Unknown Source) at macromedia.jdbc.sqlserver.SQLServerImplStatement.execute(Unknown Source) at macromedia.jdbc.base.BaseStatement.commonExecute(Unknown Source) at macromedia.jdbc.base.BaseStatement.executeInternal(Unknown Source) at macromedia.jdbc.base.BaseStatement.execute(Unknown Source) - locked <0x18e92ee0> (a macromedia.jdbc.sqlserver.SQLServerConnection) at coldfusion.server.j2ee.sql.JRunStatement.execute(JRunStatement.java:212) at coldfusion.sql.Executive.executeQuery(Executive.java:974) at coldfusion.sql.Executive.executeQuery(Executive.java:886) at coldfusion.sql.SqlImpl.execute(SqlImpl.java:236) at coldfusion.tagext.sql.QueryTag.setupCachedQuery(QueryTag.java:595) at coldfusion.tagext.sql.QueryTag.doEndTag(QueryTag.java:443)at cfBaseClass2ecfc1402536326$funcLOAD.runFunction(D:\Inetpub\wwwroot\BaseClass.cfc:113) ..

ColdFusion MX 6.1 workaround for CFQUERY MaxRows issue

Use database-specific row limit hints. Ex: For SQL Server:

  1. Use ROWLIMIT
    <CFQUERY maxrows=50 cachedwithin=...> set ROWLIMIT 50 select id, content from content_table ...</CFQUERY>
    or



  2. Use the SQL Server-specific syntax
    <CFQUERY maxrows=50 cachedwithin=...> select TOP 50 id, content from content_table ...</CFQUERY>

Example 8: ColdFusion MX 7.0.1 (Standard only) CFDOCUMENT hang - eventually hangs server.

Symptom: One CFDOCUMENT thread throws an error and all the other CFDOCUMENT threads hang behind it eventually using up all simultaneous requests and hanging. (Fixed in ColdFusion MX 7.0.1 CHF2). This one can't really be diagnosed without source code but seeing 8 simultaneous requests at line 3 of the same template in a thread dump is suspicious (or in this case at CFDOCUMENT calls doing PDF generation). This can help support begin to diagnose the problem.

"jrpp-1" prio=5 tid=0x390ea688 nid=0xeb4 in Object.wait() [3ef2f000..3ef2fdb0] at java.lang.Object.wait(Native Method)- waiting on <0x15594c58> (a coldfusion.util.Semaphore) at java.lang.Object.wait(Unknown Source) at coldfusion.util.Semaphore.acquire(Semaphore.java:31) - locked <0x15594c58> (a coldfusion.util.Semaphore) at coldfusion.tagext.lang.DocumentTag.doStartTag(DocumentTag.java:1091) at cf45DayFax2ecfm1238155023.runPage(D:\mysites\web\myReport.cfm:3) at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:152) at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:349) at coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65) at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:210) at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:51) at coldfusion.filter.PathFilter.invoke(PathFilter.java:86) at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:69) at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java: at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38) at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38) at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22) at coldfusion.filter.RequestThrottleFilter.invoke(RequestThrottleFilter.java:115) at coldfusion.CfmServlet.service(CfmServlet.java:107) at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:78) at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:91) at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42) at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:257) at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:541) at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:204) at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:318) at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:426) at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:264) at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

Get help faster and easier

New user?