This article describes the generation and diagnosis of thread dumps in the ColdFusion 10 Server configuration with the built-in Tomcat server. It contains specific examples to show which threads are important, and which you can ignore.

The diagnosis information in the article is also useful for ColdFusion 10 J2EE versions running on WebLogic, IBM WebSphere, and other supported J2EE platforms. For generating thread dumps on these platforms, see the vendor-specific documentation.

There are several tools available to developers to help diagnose performance problems or unresponsive systems. Some of the techniques vary slightly from one operating system to another. It's necessary that developers know of these diagnostic aids to help themselves and technical support quickly get to the root cause.

Terminology

It's important to use the correct terminology when describing and diagnosing server problems. For example, crashes and hangs are terms often used interchangeably to describe system (application server) state. However, they are different and could require slightly different methods to diagnose. Following are some common terms and their definitions.

Crash

The ColdFusion server process stops running/disappears (either the coldfusion.exe or the weblogic/websphere server process). Java itself has crashed and there could be a hotspot log (hs*.log) to help diagnose the problem.

Hung

The ColdFusion server is unresponsive. ColdFusion (the coldfusion.exe/Windows process) still appears in task manager and doesn't appear to be taking high CPU. But, it is also not serving any new pages ("ps -ef" command [Unix/Linux]).

Spinning

The ColdFusion process is taking high CPU resources (approaching 100%). On Windows 2 CPU systems, CPU usage for coldfusion.exe takes 50%.

Thread Dump

A list of all running Java threads in a ColdFusion server. For Tomcat-based systems, "simultaneous request" threads in thread dumps are prefixed with either ajp-bio-[port]-exec-nn (External Web Server) or catalina-exec-nn (Internal Web Server). Some others are scheduler-nn, jndi-nn, skimmer-nn.

Stack Trace

An individual thread's execution stack (snapshot) that contains the call stack at the time the thread dump was taken. Or, when an exception was thrown (where the stack is found in the ColdFusion log files, such as exception.log). Some use the terms "stack trace" and "thread dump" interchangeably. However, they are not the same (even though some Sun docs use them interchangeably).

Also review ColdFusion logs (application.log, server.log, exception.log, and so on) to help diagnose problems along with using thread dumps.

Upgrade the ColdFusion server to the latest supported patches

Before attempting to debug problems, check the ColdFusion support site for the latest hot fixes and updaters for ColdFusion 10. It's best to run the latest version and patches.

Generate a full thread dump

Follow the steps below to capture a full thread dump when the system is in an unresponsive state. Each full thread dump is a snapshot of every thread in the system at that moment in time.

Windows (any configuration): When running as a Windows service (Recommended)

(Requires the process "log on" parameters with system-level privileges.)

  1. On the Log On tab of the ColdFusion 10 Application Server Windows service, select Allow Service to Interact with Desktop.

    For example: ColdFusion 10 Application Server.

    When the service starts, a DOS window labeled coldfusion.exe opens.

  2. When you are having performance problems or ColdFusion appears to be unresponsive, get a thread dump to see where the underlying problem is. To trigger the thread dump capture, set focus to the DOS window by clicking the title bar a few times. Then, press Control+[Pause|Break]. ColdFusion prints a full thread dump to the file [servername]-out.log.

    BE CAREFUL: Some users on the first try press Control+C by habit and kill the ColdFusion process. The [Pause|Break] key is typically in the upper-right area of the keyboard next to the Scroll Lock key. Also, make sure that Task Manager isn't running in front, and taking focus of the DOS window.

    Take at least two or three thread dumps, 20 seconds apart. Doing so captures what each thread is doing over time. Therefore, it's possible to diagnose whether individual threads are progressing or stuck. A stuck thread (or a long running one) is one that shows the same line number executing over 30 or more seconds in the same ajp-bio-[port]-exec- thread.

    Note: View [servername]-out.log with a browser rather than viewing it with Notepad or WordPad since it has no carriage return characters. Use the Open With option from Windows Explorer and choose Internet Explorer or Netscape/Firefox/Mozilla to view the file.

  3. Diagnose the thread dump using the steps in the Reading the thread dump portion of this article.

  4. To terminate ColdFusion 10, shut down the Windows service.

    Note: If you have coldfusion enterprise license, then you can use server monitor to take thread dump.

Windows (Server configuration): When running from a DOS window

  1. Start ColdFusion using cfstart.bat in cf_root/[CF Instance]/bin. Redirect the output of this command to a file, for example:

    C:\ColdFusion10\cfusion\bin> cfstart > out.txt

    A DOS window labeled "Command Prompt - cfstart" appears while ColdFusion runs.

  2. When you are having performance problems or ColdFusion appears to be unresponsive, get a thread dump to see where the underlying problem is. To trigger the thread dump capture, set focus to the DOS window by clicking the title bar a few times and pressing Control+[Pause|Break]. ColdFusion prints a full thread dump to the file out.txt.

    BE CAREFUL: Some users on the first try press Control+C by habit and kill the JRun process. The [Pause|Break] key is typically in the upper-right area of the keyboard next to the Scroll Lock key. Also, make sure that Task Manager isn't running in front, and taking focus of the DOS window.

    Take at least two or three thread dumps 20 seconds apart. Doing so captures what each thread is doing over time, so it's possible to diagnose whether individual threads are progressing or stuck. A stuck thread (or a long running one) is one that shows the same line number executing over 30 or more seconds in the same ajp-bio-[port]-exec- thread.

    Note: View out.txt with a browser rather than viewing it with Notepad or Wordpad since it has no carriage return characters. Use "Open With" from Windows Explorer and choose "Internet Explorer" or "Netscape/Firefox/Mozilla" to view the file.

  3. Diagnose the thread dump using the steps in the Reading the thread dump portion of this article.

  4. To terminate ColdFusion, press Control-C twice in the DOS window.

    Note: If you have coldfusion enterprise license, then you can use server monitor to take thread dump.

Unix/Linux (Server configuration)

  1. Start the ColdFusion process in the normal manner:

    cf_root/[Instance Name]/bin ./coldfusion start

  2. Obtain the ColdFusion process ID by issuing:

    ps -ef | grep cfusion (Unix and newer Linux)

    pstree -p | grep cfusion (older Linux)

    Note: For ColdFusion 10, use 'grep cfusion' for Linux and Unix.

    Using the "cfusion" process ID in BOLD using the various examples above, enter:

    kill -3 {pid}

    A "kill -3" causes ColdFusion to print a full thread dump in the  cf_root/[Instance]/logs/cfserver.log file. You can view the file with most Unix editors like "vi." Unlike Windows, Unix does not require carriage return characters, only line feed characters.

    Diagnose the thread dump using the steps in the Reading the thread dump portion of this article.

What do the various "states" of the threads mean?

Threads in a dump typically have the following states:

runnable

Means that the thread is running or ready to run. Inspect threads that have stack traces with .cfc or .cfm should.

Others without .cfc/.cfm in the stack are often in harmless wait states. You can ignore them.

waiting for monitor entry

Waiting to enter a synchronized code block (rare). Synchronize blocks should execute quickly. If the code takes long enough to end up in a thread dump or more than one, it's worth investigating.

waiting on monitor

You can ignore this thread if the top of the stack shows an Object.wait() method

in Object.wait()

Waiting for an object. Sometimes this string means idle and you can ignore it. But if the stack trace for the Object.wait() thread has a running .cfm/.cfc, investigate if this thread is blocking with another thread.

Thread names and their relevance to reading the thread dump

Most threads listed in the thread dump are related to system activities, not user code and you can ignore it while analyzing the thread dump. These activities include: jndi-nn,skimmer-nn, jms-fifo-nn, Transaction Domain*, RMI*, Finalizer, and Reference Handler.

The ajp-bio-[port]-exec-nn (simultaneous request threads) and catalina-exec-nn (Internal Web Server) are the primary ones to look at. scheduler-nn, and Thread-nn threads are the second group that are sometimes relevant if not in Object.wait(). Things such as database driver code can spawn Thread-nn threads.

Reading the thread dump

Open thread_dump.log, out.txt or[servername]-out.log in a web browser and search for the phrase "Full thread dump:."

For example: Full thread dump Java HotSpot(TM) Server VM (20.4-b02 mixed mode):

There is one instance of the phrase for each thread dump generated. To avoid confusion, work from the bottom up. Search for "Full Thread Dump" and take the LAST thread dump into a separate file so text searches don't go across thread dumps, at least initially. Later, to determine whether a thread (ajp-bio-[port]-exec-28) is "progressing," search across multiple thread dumps.

If Control+[Pause|Break] or "kill -3" (Unix/Linux) was executed since first examining the thread dump file, refresh the browser and search for the last occurrence of "Full thread dump:." The list of threads is below the "Full thread dump:" line. Each thread has a stack (aka stack trace). Unfortunately, there is no timestamp.

There are generally a large number of threads. The important ones typically are the ajp-bio-[port]-exec-nn threads, especially the ones containing function names likecoldfusion.runtime.xxxxxx ormacromedia.jdbc.xxxxx. If you are using the Internal Web Server (on port 8500), threads are named catalina-exec-nn.

ajp-bio-[port]-exec-nn threads apply to commercial web servers such as Apache, Microsoft IIS, and SunOne using the ColdFusion 10 connector.

Note: The number of available Worker Threads (ajp) is dynamic, based on recent activity. There could be more Worker Threads than the number of "Simultaneous Requests" set in the Administrator. Only "Simultaneous Requests" number of these can actively be executing ColdFusion templates. The extra threads reflect both available threads and incoming ColdFusion requests that are queued up for processing. They wait for their turn before the template executes.

Thread dumps can be large. If the ColdFusion 10 logs show the same error over and over, try to match it up with a thread inside the thread dump file. If it matches, take a snippet of one of the stack traces in the thread dump and save it for technical support. Always send the entire thread dump file for analysis to support.

Typical thread entry for a request

"ajp-bio-8012-exec-1" daemon prio=6 tid=0x30d35400 nid=0x15f8 runnable [0x319de000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at org.apache.coyote.ajp.AjpProcessor.output(AjpProcessor.java:299)
	at org.apache.coyote.ajp.AbstractAjpProcessor$SocketOutputBuffer.doWrite(AbstractAjpProcessor.java:1093)
	at org.apache.coyote.Response.doWrite(Response.java:533)
	at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:373)
	at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:373)
	at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:403)
	at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:391)
	at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91)
	at coldfusion.runtime.CachedBufferedOutputStream.flushBuffer(CachedBufferedOutputStream.java:102)
	at coldfusion.runtime.CachedBufferedOutputStream.write(CachedBufferedOutputStream.java:157)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
	at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
	at java.io.PrintWriter.flush(PrintWriter.java:276)
	at coldfusion.runtime.NeoJspWriter.flush(NeoJspWriter.java:320)
	at coldfusion.runtime.NeoJspWriter.addChars(NeoJspWriter.java:360)
	at coldfusion.runtime.NeoJspWriter.print(NeoJspWriter.java:153)
	at coldfusion.runtime.NeoJspWriter.write(NeoJspWriter.java:658)
	at cftest2ecfm1335854924.runPage(C:\Inetpub\wwwroot\test\test.cfm:3)

From line 26, notice that this thread is executing the template test.cfm at line 3.

The calls from various functions to the CfmServlet and to the coldfusion.filter functions are always present when running a template. The classname cftest2ecfm1335854924 comes from the filename that test.cfm was compiled into (that is, if the option to 'Save Class Files' is enabled in the ColdFusion 10 Administrator). The .class file for test.cfm is \WEB-INF\cfclasses\cftest2ecfm1335854924.class. This name is constructed by prefixing "cf" to the filename with non-alphanumeric characters replaced by their hexadecimal equivalent. For example; the dot (.) is 2e in hexadecimal. Directory separators are replaced with two underscores "__". The trailing number is a Java-supplied hash code.

The lines highlighted in RED are the important ones. They begin at an executing .cfm or .cfc and any lines above (lines 24 and 25 in the example above). The lines below the .cfc and .cfm line - generally - describe the context of the running template. For example, if macromedia.jdbc.sqlserver or macromedia.jdbc.oracle appears below the .cfm or .cfc, there is a database interaction (cfquery, cfstoredproc, and so on).

In the example above, the lines in BLUE - or some subset of them depending on the request - usually aren't relevant to figuring out the problem so it's important not to be distracted by them.

Deadlocks

One thread holding a resource that another thread or group of threads requires can cause Deadlocks. This thread is the so-called thread wait pileup and can cause dozens of successive "timeout waiting for an available thread to run" type errors in the logs. A thread dump with a stuck system often reveals one thread causing the problem. The root cause could be user code, code in the core ColdFusion or some outside influence such as a database problem or network issue.

Example 1: Purging client variables in the Windows registry causes high memory/system hang

Symptom: CPU spins at 100% - and high memory usage - about every hour after startup (large memory use too).

Here's a common one that some fall into going from development to production. The default for storing client variables is "REGISTRY". Once the number of records in the registry is large, the query to get all records and delete expired client records can take 100% CPU for minutes at a time.

Registry client storage should never be used for production systems. But developers often fall into this by accident by not explicitly specifying a data source (and the ColdFusion admin defaulting to "REGISTRY"). Since the registry isn't a real database, ColdFusion has to retrieve the entire registry client tree (high memory usage) and compare the date/time one at a time to decide whether to purge a record. This is a CPU-intensive operation with an hourly purge that could only delete a few records.

Since this is a system task, it's running in a scheduler thread instead of an ajp thread. Here's a case where a scheduler thread is relevant.

 

Note: If using a 2 processor system on Windows, task manager would show a processor at 50% utilization which is the equivalent of 100% CPU on a single CPU system.

Example 2: Refreshing of cached query - - causes a slowdown or hang

Symptom: Requests back up and request backlog is high. Server can either seem hung or unresponsive. Memory usage is not high.

A hang long enough to record a cached query refresh in the thread dump could have associated issues.

Is the user bringing back a huge query? If the query has a large number of rows, all other pages that hit the cached query will block until the first user hitting the query (after the time limit value) gets a resultset from the database (the "Waiting for monitor entry" below).

Without the thread dump, the user might change the time limit value of the cached query to make it SOONER which only makes the problem worse.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License  Twitter™ and Facebook posts are not covered under the terms of Creative Commons.

Legal Notices   |   Online Privacy Policy