This article describes ColdFusion Performance debugging techniques on both the Windows NT and Solaris platforms.

Performance and stability issues are among the most difficult ColdFusion problems to diagnose, because many factors affect both the performance and the stability of an application. This document is broken into two sections:

I. ColdFusion Thread Processing.

II. ColdFusion Performance and Debugging Techniques.

ColdFusion Thread Handling

In order to diagnose performance issues, it is useful to understand how ColdFusion processes .cfm requests.

Unix versions of ColdFusion 4.X and NT ColdFusion 3.1.1 (and earlier) processed threads in this way:

  1. A .cfm page is requested by a web browser.
  2. An installed web server mapping associates the .cfm extension with one of the supported web server API "stubs", so the .cfm request is sent to the appropriate web server stub (ISCF.dll, NS35CF.dll, etc.).
  3. This stub then sends the request to the ColdFusion server for processing.
  4. A thread is created for this new .cfm request and, if one of the "simultaneous requests" slots is free, the thread is processed by the ColdFusion engine.

    The simultaneous request pool is the pool of actively running requests inside the ColdFusion engine. This pool can be limited by adjusting the "Limit Simultaneous Requests to" setting in the ColdFusion Administrator.
  5. The ColdFusion server processes all of the CFML tags and converts the results to HTML and sends them back to the web server to be sent to the browser.
  6. At this point the thread used to process the .cfm request is destroyed.

The important point to note here is that if the ColdFusion engine comes across any third party resource request (e.g. cfquery, cfobject, cfmail, cfftp, cfpop, cfhttp, an applet, a CFX request, etc.), it sends the request off to that resource and waits for the response to come back.

If the third party response never returns (meaning neither a result set nor an error message comes back from the resource), the thread will continue to wait and will NOT time out. If you have many .cfm pages with many waiting third party requests, the ColdFusion simultaneous request limit will be reached, and at that point the ColdFusion server will appear to hang. ColdFusion has not stopped processing, however; it is simply waiting for a response from a third party resource call.

One of the NT ColdFusion 4.0 features is improved thread handling. In ColdFusion 4.0 threads are processed in this way:

  1. A .cfm page is requested by a web browser.
  2. An installed web server mapping associates the .cfm extension with one of the supported web server API "stubs", so the .cfm request is sent to the appropriate stub (ISCF.dll, NS35CF.dll, etc.).
  3. The stub then sends the request to the ColdFusion server for processing.
  4. A listener thread receives the request and hands it to one of the idle threads in the simultaneous request pool, which processes the request. If all of the simultaneous request threads are busy, the listener thread places the request on a waiting list. When one of the simultaneous request slots becomes available, the listener thread sends the waiting request to that slot.
  5. The ColdFusion server processes all of the CFML tags and converts the results to HTML and sends them back to the web server to be sent to the browser.
  6. The thread is not destroyed under ColdFusion 4.0 but put back into the active pool to be used again. This thread recycling is much less resource intensive than the thread create/destroy scenario with ColdFusion 3.1.

Though ColdFusion 4.0 offers this improved thread handling, its processing of third party requests remains the same: if the ColdFusion engine comes across any third party resource request (e.g. cfquery, cfobject, cfmail, cfftp, cfpop, cfhttp, an applet, a CFX request, etc.), it sends the request off to that resource and waits for the response to come back.

If the response never returns (meaning neither a result set nor an error message comes back from the resource), the thread will continue to wait and will NOT time out. If you have many .cfm pages with many waiting third party requests, the ColdFusion simultaneous request limit will be reached, and at that point the ColdFusion server will appear to hang. ColdFusion has not stopped processing, however; it is simply waiting for a response from a third party resource call.
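The slot arithmetic behind this apparent hang can be sketched in a few lines of shell. This is only a toy model, not ColdFusion code: the function names `free_slots` and `appears_hung` are made up for illustration, and the numbers stand in for the "Limit Simultaneous Requests to" value and the count of requests blocked on a third party call.

```shell
#!/bin/sh
# Toy model (not ColdFusion code): how many request slots remain usable
# when some running requests are stuck waiting on a third party call.
#   $1 = the "Limit Simultaneous Requests to" value
#   $2 = requests currently blocked on a database, CFX tag, etc.
free_slots() {
    limit=$1
    waiting=$2
    free=$((limit - waiting))
    # Blocked requests cannot usefully exceed the limit, so clamp at zero.
    if [ "$free" -lt 0 ]; then
        free=0
    fi
    echo "$free"
}

# Report whether the server would look hung to a new visitor:
# with zero free slots, every new request just sits and waits.
appears_hung() {
    if [ "$(free_slots "$1" "$2")" -eq 0 ]; then
        echo "yes"
    else
        echo "no"
    fi
}

appears_hung 5 3   # two slots are still free
appears_hung 5 5   # every slot is waiting on a third party
```

With a limit of five, three blocked requests still leave room for new work, while five blocked requests leave none, which is exactly the state that looks like a hang from the browser.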

ColdFusion Performance and Stability Debugging (NT)

When you encounter a performance or stability problem, more often than not the problem lies with the processing of a third party request rather than with ColdFusion processing. However, pinpointing these third party problems can be difficult. Included here are steps that should be taken before contacting Macromedia Technical Support.

  1. Verify the source of the unresponsiveness. Try to open the ColdFusion Administrator or another simple page that has a ".cfm" extension. If this fails, try to open a .htm or .html page. If the web server returns the .htm page, the web server is functioning properly; go to step 2. If the .htm page does not return, your web server is causing the problem; stop and restart the web server. If ColdFusion pages still do not return after the web server has been restarted, go to step 2.
  2. If you have ColdFusion 4.0 installed, open the NT Performance Monitor and add the ColdFusion "Queued Requests" and "Running Requests" counters, along with %CPU usage for the CFSERVER instance. If a third party request is suspected of hanging the ColdFusion server, then at the time of the hang %CPU usage should be at zero, Running Requests should be at the Simultaneous Request limit, and Queued Requests should be rising. This indicates that all active ColdFusion threads are "waiting" for a third party resource call (probably a database call) to return. It also means that ColdFusion has not stopped responding, as it is aware of the queued requests piling up. Run Performance Monitor while a performance/stability problem is occurring to gather valid data on the issue.

    If ColdFusion 3.1 or earlier is installed, the Performance Monitor counters needed to analyze third party resource request latency are different: view "Thread Count" and "%CPU usage" for the cfserver instance.
  3. During the unresponsive period, try running a query through MSQuery. Use MSQuery rather than SQLPlus or ISQL, because those tools do not connect to the database through ODBC and the point of this test is to verify the ODBC connection to the database. Preferably, run the same SQL statement that you believe is causing the problem. That statement can be difficult to identify; however, with MS SQL Server 6.5 you can see which SQL statement is running in the "Current Activity" section of Enterprise Manager. Double click a current user there and a dialog box appears with the SQL statement that is running. If locked tables are the suspected problem, this lets you confirm it.
  4. If locked tables are the suspected problem, the MSQuery test may still work fine, most likely because the query run in MSQuery does not touch the locked tables. This is why it is important to try to access the same tables that are currently locked.
  5. If you are using Oracle, this DBMS has Performance Monitor hooks that let you view how many locked tables there are in the database. At the time of the hang, open perfmon and view these counters. They show how many locks are current in the database, but not which tables are locked.
  6. Turn on "logging of slow pages" in the debugger section. At the very least it will list the templates that stay active for more than x seconds. This gives you a place to start looking for long running queries.
  7. Turn on the "Restart at x unresponsive requests" setting. This setting tracks requests that execute third party calls but fail to return in a timely manner. The ColdFusion application server service is restarted when the specified number of request threads do not respond in the allotted time. Requests are tracked only if a request timeout is enabled. The unresponsive threads are logged in the server.log file, including the fully qualified path to the running template. By analyzing this log over time you can identify which templates consistently run longer than others. This log is a good place to start your CFML coding analysis.
  8. If you are using the Microsoft Oracle ODBC driver, go into the ODBC administrator and view the advanced settings for the data source in question. There is a "BufferSize" setting; by default it is set to 43560, and it should be set to zero. This is similar to the "MaxBufferSize" setting in Access. The Oracle ODBC driver does not expose this setting.
  9. Non-functioning or slow CFX tags, or CFML code called in your ColdFusion templates, can also cause ColdFusion performance/stability issues. Unfortunately, determining that a CFX tag or piece of CFML code is causing the problem is not an easy task. For database locks there are vendor specific utilities that give insight into table activity; there are no such utilities for debugging CFX tags or CFML code. What you can do is wrap the CFX tag call (or any suspected piece of slow code) in calls to the CF function GetTickCount, which lets you determine how long that snippet of code takes to process. If the page returns slowly under no load, the problem will be magnified under load. An example is listed below:

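    <!--- Sample value so the snippet runs on its own; normally "a" is set elsewhere. --->
    <CFSET a = 2>
    <!--- Record the tick count (milliseconds) before the code being measured. --->
    <CFSET tickBegin = GetTickCount()>
    <CFSWITCH EXPRESSION="#a#">
      <CFCASE VALUE="1"><CFOUTPUT>#a#</CFOUTPUT></CFCASE>
      <CFCASE VALUE="2"><CFOUTPUT>#a#</CFOUTPUT></CFCASE>
      <CFCASE VALUE="3"><CFOUTPUT>#a#</CFOUTPUT></CFCASE>
      <CFDEFAULTCASE>A is not 1, 2, or 3</CFDEFAULTCASE>
    </CFSWITCH>
    <!--- Record the tick count afterwards and report the difference. --->
    <CFSET tickEnd = GetTickCount()>
    <CFSET loopTime = tickEnd - tickBegin>
    <CFOUTPUT>Time to Complete: #loopTime# ms</CFOUTPUT>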

Again, this technique can be used to analyze performance of any snippet of CFML code.
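Returning to the Performance Monitor readings described in step 2, the rule of thumb given there can be written down as a small check. The sketch below is an illustration only: the function name `diagnose` is hypothetical, the readings are typed in by hand rather than pulled from Performance Monitor, and treating anything up to 5% CPU as "at zero" is an assumption, not a figure from this article.

```shell
#!/bin/sh
# Hypothetical helper: apply the step 2 rule of thumb to readings
# copied by hand from Performance Monitor (whole numbers only).
#   $1 = %CPU usage for the cfserver instance
#   $2 = Running Requests
#   $3 = the Simultaneous Request limit
#   $4 = Queued Requests at an earlier sample
#   $5 = Queued Requests now
diagnose() {
    cpu=$1
    running=$2
    limit=$3
    queued_before=$4
    queued_now=$5
    # Near-idle CPU, every slot busy, and a growing queue together
    # suggest that all threads are waiting on a third party call.
    # The 5% threshold is an assumption standing in for "at zero".
    if [ "$cpu" -le 5 ] && [ "$running" -ge "$limit" ] &&
       [ "$queued_now" -gt "$queued_before" ]; then
        echo "waiting on third party"
    else
        echo "inconclusive"
    fi
}

diagnose 0 5 5 10 25    # idle CPU, full pool, queue rising
diagnose 85 5 5 10 25   # busy CPU: ColdFusion itself is doing the work
```

A result of "inconclusive" does not rule out a third party problem; it only means the three symptoms did not line up in that sample.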

General Performance Points

  • Tune the ColdFusion server according to your application's functionality:
    1. The ColdFusion Administrator setting with the biggest impact on your system's performance is the "Limit Simultaneous Requests" setting, the very first setting in the ColdFusion Administrator. It controls how many .cfm requests can be processed at any one time. There is no hard rule for what this number should be; it depends on your application's functionality. As a general guideline, the more CPU intensive your application, the lower you will want to set this number (3-5 times the number of processors), because allowing many simultaneous requests in a CPU intensive application slows ColdFusion down. Conversely, if your application is more database intensive, you will want to raise this number (4-10 times the number of processors). In a database intensive application ColdFusion does a lot more "waiting" for database result sets to return, so it can effectively process more simultaneous requests because its %CPU usage will not be as high.
    2. Turn on Strict attribute validation in the ColdFusion administrator. The parser has been significantly tightened up in 4.0 and will catch syntax that unfortunately made it through and into the engine in 3.x.
    3. Another very common cause of instability is a lack of named CFLOCKs around ALL read and write access to Session and Application variables. Under load, or when a frameset is used (browsers retrieve pages simultaneously in 4 to 8 threads, depending on the browser), it is possible for two or more templates to write to the same user's session variables, and we made a decision for performance reasons not to globally restrict all access to these variables to single threads. Developers must place named CFLOCKs around all read and write access to these variables; careful placement and naming of the locks can minimize the impact on performance.

      A good choice for naming locks around writes to Session variables (since lock names restrict code block access globally across the server) is to name the lock dynamically by concatenating the CFID and CFTOKEN values, thereby single-threading access to those variables only for the one user in the one browser.
    4. If the content in your application is very static, turn on the "Trusted Cache" setting in the ColdFusion Administrator. When it is checked, requested files found in the template cache are not inspected for potential updates, which minimizes system overhead.
    5. Verify that your template cache is large enough to hold all of the cached pages in your site. This number is set in the ColdFusion Administrator and should be set to (2 X the sum total of the pages in your site plus any custom tags you may be using). The integrated Performance Monitor counters include one called Cache Pop Hits/Second. Monitor this counter over a span of time. If the number is very high, it indicates that the template cache has reached its limit and a cached template had to be removed to make room for another.
  • Network latency could also be a problem when processing ColdFusion templates. In an ideal world all servers (both web servers and database servers) would exist on the same subnet. In the real world this is not the case in many instances. Servers are located throughout different subnets, which will impact how quickly ColdFusion receives and returns requests. At a minimum, verify that the network is performing at optimum efficiency. Get a network administrator involved to perform extensive sniffing and analysis.
  • Database performance in general is a very big factor in how fast or slowly your ColdFusion server will perform. A database that has been tuned for performance returns result sets to ColdFusion in a timely fashion. This minimizes the time ColdFusion threads spend waiting for result sets, which speeds up request response time and frees threads for other requests. Database tuning is a vast topic that cannot be covered adequately in this forum. At a minimum, a DBA should be involved to analyze SQL statements for efficiency, analyze tables for proper indexing, and analyze the database structure in general.
  • Backup strategy and BackOffice routines play a crucial role in the performance of your ColdFusion site. Keep in mind that any routine run on either the production box housing the ColdFusion server or a database server that ColdFusion is connected to will slow down ColdFusion processing. If at all possible, do not schedule database backups during known heavy load periods, as the database's processing will be divided between serving ColdFusion requests and performing the backup. Do not allow intricate reporting to be done on live production databases; at the very least, do not allow it to run during heavy load periods. As more and more BackOffice routines are performed, ColdFusion processing suffers because its threads are forced to "wait" for result sets to return from the database.
  • Limit the software running on the ColdFusion server. As a rule, you do not want to run your database server on the same machine as ColdFusion. Contention of resources occurs when this scenario is in place. Allow ColdFusion and the respective DBMS to reside on their own separate servers, taking full advantage of their respective processor's power. Following this point, you will want to run the mail server on a machine separate from ColdFusion. Disable all NT services absolutely not necessary for ColdFusion and web server functionality.
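The two sizing guidelines from the tuning list above (the simultaneous request limit and the template cache size) are simple arithmetic, sketched here in shell. The function names are hypothetical, and the template cache formula is read as 2 x (pages + custom tags), in whatever unit the Administrator setting uses; the article does not spell out the unit, so treat the result as a starting point rather than a definitive value.

```shell
#!/bin/sh
# Hypothetical helpers for the sizing guidelines in this article.

# Suggested range for the simultaneous request limit.
#   $1 = workload type: "cpu" (CPU intensive) or "db" (database intensive)
#   $2 = number of processors
request_limit_range() {
    kind=$1
    cpus=$2
    case "$kind" in
        cpu) echo "$((3 * cpus))-$((5 * cpus))" ;;    # 3-5 x processors
        db)  echo "$((4 * cpus))-$((10 * cpus))" ;;   # 4-10 x processors
        *)   echo "unknown workload: $kind" >&2
             return 1 ;;
    esac
}

# Suggested template cache setting, read as 2 x (pages + custom tags).
#   $1 = total for the pages in the site
#   $2 = total for the custom tags in use
template_cache_size() {
    echo "$((2 * ($1 + $2)))"
}

request_limit_range cpu 2      # CPU intensive application on a 2 processor box
request_limit_range db 4       # database intensive application on a 4 processor box
template_cache_size 300 50     # 300 for pages plus 50 for custom tags
```

These ranges are only the general guideline quoted above; the right number still depends on how the application behaves under load, so confirm any change with the Performance Monitor counters.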

ColdFusion Performance and Stability Debugging (Solaris)

  • You need Solaris patch 105181-09 or later. The 4.0 release notes say -06, but shortly after our release we found Solaris problems that affect us (and that Sun has since fixed).
  • For Oracle 7 shops, be absolutely, positively sure you have 7.3.4 installed properly and that you have not moved it since running the install over 7.3.3. Apparently there is a recompile or linking step that must be done manually. Check the Oracle docs very carefully.
  • For Oracle 8 shops, be sure you have the 8.0.4 or 8.0.5 client libraries and are using ColdFusion Application Server 4.0.1, which contains a new Oracle driver.
  • For Sybase shops, check to ensure you have 11.1.1, and check the Sybase support site for EBFs. Example: we have been at a customer site that was crashing under load with the Sybase 11.1.0 client libs; we pulled down 11.1.1 and experienced a different set of problems. We went back and checked for EBFs, pulled down the latest rollup, and experienced yet another set of problems under load. We went back and checked the site again and they had *pulled* the EBF rollup we had downloaded the day before, citing serious problems, and had a new set of EBFs out instead. Moral: check the Sybase support site regularly; there might be an 11.1.1 EBF with your name on it.



    For all shops:

  • Another very common cause of instability is a lack of named CFLOCKs around ALL read and write access to Session and Application variables. Under load, or when a frameset is used (browsers retrieve pages simultaneously in 4 to 8 threads, depending on the browser), it is possible for two or more templates to write to the same user's session variables, and we made a decision for performance reasons not to globally restrict all access to these variables to single threads. Developers must place named CFLOCKs around all read and write access to these variables; careful placement and naming of the locks can minimize the impact on performance.

    A good choice for naming locks around writes to Session variables (since lock names restrict code block access globally across the server) is to name the lock dynamically by concatenating the CFID and CFTOKEN values, thereby single-threading access to those variables only for the one user in the one browser.



    For CF 3.x owners:



    Try to identify and eliminate:

  • Instances where session variables could be written to by a single user simultaneously (again, frameset source templates are one of the usual suspects), and
  • Updates to Application variables that a) come from more than one spot in the site or b) happen regularly. If you can help it, update them only once (for example, in application.cfm, and only if they have not been set).
  • Another potential cause of instability is a heavy volume of application error messages under heavy load. If your application.log or server.log regularly gets very large due to improper server mappings, syntax errors, etc., you should have your developers clean up the code/site ASAP.
  • Also make sure you're running in 4.0 with "strict" syntax validation turned "on" in the administrator. The parser has been significantly tightened up in 4.0 and will catch syntax that unfortunately made it through and into the engine in 3.x.
  • Another common problem relates to the Solaris default settings for file descriptors, etc. Sun recommends the following for machines used as web/application servers:



    Add the following command to the /etc/rc2.d/S69inet file on all Solaris 2.5.1 and 2.6 machines (see http://www.sun.com/sun-on-net/performance/tcp.slowstart.html for full details on this setting):



    ndd -set /dev/tcp tcp_slow_start_initial 2



    and appending the following to the /etc/system file can help avoid default system resource limitations for web sites:



    set tcp:tcp_conn_hash_size=1048576
    set sq_max_size=1024
    set rlim_fd_max=4096
    set rlim_fd_cur=1024
  • Be sure to check ulimit for the user accounts that ColdFusion and Netscape/Apache (and the Oracle listener, if applicable) run as, to be sure these values are not limited there as well. A lack of file descriptors has been known to cause some very weird happenings, since socket connections count as file descriptors in this context (connection to the web server, connection to ColdFusion, connection to the database, etc.).
  • Finally, be extremely aware of any other (non-DB) calls you're making outside the CF engine: CFX's, LDAP, etc.
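The ulimit check mentioned above can be scripted so that it is easy to repeat for each account. The sketch below is a minimal example, assuming it is run from a shell belonging to the account in question; the function name `fd_limit_ok` is hypothetical, and the threshold of 1024 simply mirrors the rlim_fd_cur value shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a file descriptor limit meets a
# required minimum.
#   $1 = the limit to check (a number, or the word "unlimited")
#   $2 = the minimum number of descriptors wanted
fd_limit_ok() {
    current=$1
    required=$2
    # Some shells report "unlimited" instead of a number.
    if [ "$current" = "unlimited" ]; then
        return 0
    fi
    [ "$current" -ge "$required" ]
}

# Compare this shell's descriptor limit with the rlim_fd_cur value above.
if fd_limit_ok "$(ulimit -n)" 1024; then
    echo "descriptor limit is at least 1024"
else
    echo "descriptor limit is below 1024: $(ulimit -n)"
fi
```

Run it once as each user that ColdFusion, the web server, and (if applicable) the Oracle listener run as, since each account can carry its own limit.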

If you've done all the above (correct Solaris patch levels, rock solid, thread-safe database client libs, religiously put named CFLOCKs around all Session and Application vars, you're running with squeaky clean logs with "strict" syntax checking turned "on", etc, etc) and you still have problems, there are a couple things you can do to help diagnose the problem if you have a core file in your coldfusion/bin directory.

  1. Using dbx, the debugging utility that reads core files, you can find out where in code the core occurred and the functions that were called leading up to the core.



    To get dbx if you don't have it installed, go to www.sun.com and download the Sun Visual C++ Workshop eval and install it, following all instructions. You'll end up with dbx (typically, if you accept the default dir) in /opt/SUNWspro/bin.
  2. Next, cd into /opt/coldfusion/bin and at the command-prompt type:



    /opt/SUNWspro/bin/dbx cfserver core
  3. Hit the space bar when it tells you, then when the prompt says (...dbx): type the following (without the quotes):



    "where > trace.txt" and then type "exit"
  4. Open the trace.txt file and you'll see something similar to the following. This trace is from an actual customer who had moved the Oracle libraries after upgrading from 7.3.3 to 7.3.4 without performing the required manual linking step. In his case, SQLPlus worked great, but he core'd periodically under load. These things are a little ugly, but just understand that it's a list of the functions called just before the core dump occurred, with the function that caused the core dump at the top, marked with the little arrow "=>". The rest of the functions are listed in backward chronological order. Note the ColdFusion function at line [13], "CCFORAConnect::OpenConnection", where we begin the call to the Oracle libs to make a DB connection. Combine what should happen from there (ColdFusion calls the Oracle libraries to make a connection) with a careful look at the function names (note the "oci" in "ocilog"...Oracle's nomenclature) and the fact that no function names starting with "CCF" appear above line [13], and you can pretty much deduce that something bad happened in the Oracle libraries. You would then go over (in this case) the Oracle SQL*Net install with a fine-tooth comb.



    current thread: t@5529

    =>[1] snsbittrm_ts(0x7967f10, 0xa55df10, 0xebe0b398, 0xebde49a4, 0xf6d6b0, 0xf6d668), at 0xebc777f4

    [2] nsgbltrm(0x7967f10, 0x0, 0x2, 0xebdef740, 0xebe01a70, 0xe6308b0c), at 0xebcb6c10

    [3] nngsdei_deinit_streams(0x8bff030, 0xebe01a70, 0x8012d78, 0x0, 0x818f2e8, 0xebded098), at 0xebd0a374

    [4] nncidei(0x8bff030, 0x0, 0x0, 0x5d95c00, 0xf6d668, 0xf6d6b0), at 0xebd06580

    [5] nnfgdei(0xebe15db8, 0x7fccbd8, 0x8142180, 0xe6309f60, 0xe6308edc, 0x1000), at 0xebced6ac

    [6] osnqrn(0xebe15db8, 0x1803, 0xf6d6b0, 0x4, 0xf6d668, 0x0), at 0xebcdbd8c

    [7] osncon(0xe630b0d0, 0x1803, 0x0, 0x0, 0x1, 0x3ec12c8), at 0xebc797b4

    [8] upiini(0x3ebfa88, 0x4, 0x3ec03cc, 0x781e64c, 0x80a83f8, 0xebe01a70), at 0xebc61d6c

    [9] upiah0(0x80a83f8, 0x781e64c, 0x4, 0x0, 0x81010100, 0xff0000), at 0xebc59460

    [10] upilgn(0x80a83f8, 0x80aa0cc, 0xffffffff, 0x8c2d0b4, 0xffffffff, 0x0), at 0xebc5e648

    [11] upilog(0x80a83f8, 0x80aa0cc, 0xffffffff, 0x8c2d0b4, 0xffffffff, 0x781e64c), at 0xebc5fbe0

    [12] ocilog(0x5a6ab60, 0x80a83f8, 0x80aa0cc, 0xffffffff, 0x8c2d0b4, 0xffffffff), at 0xebc50e60

    [13] CCFORAConnect::OpenConnection(0x7b0c4b8, 0x781e64c, 0x80aa0cc, 0x8c2d0b4, 0x7b0c4c4, 0x0), at 0xec887618

    [14] OpenConnect(0x80aa0cc, 0x88791a0, 0x7b0c4b8, 0x781e64c, 0x8a942a0, 0x0), at 0xec886e80

    [15] CCFDataAccessConnection::Open(0x88791a0, 0x88791a0, 0xf6c8d8, 0x88791a0, 0xedc8e7cc, 0x0), at 0x225d44

    [16] CCFDataAccessMgr::Connect(0x7a14d8, 0xe630c92c, 0x0, 0x8bfee88, 0xf6df1c, 0x88791a0), at 0x2223d8

    ...



    By studying trace function names closely, they can often lead you to understand *where* you're having stability problems, if not why.



    If possible, you can use this technique with multiple core files to try to figure out what's happening, especially if instability is ongoing. If you get 2 or more identical traces, where the last 10 lines or so list identical functions, congratulations, you've at least got consistency and possibly a reproducible case, which can be the break needed to help solve your problem (or at least narrow down the possibilities).
  5. If you're not getting core files created when it's reported that "something's wrong with the site", and ColdFusion appears to be unresponsive (but no core file was created), there's a good chance you've got hung request threads (a common cause of which is database locking or contention or DB client lib problems).



    To check for hung threads, make sure you have the "Enable Performance Monitoring" and "Log Requests Taking Longer Than X Seconds" turned on in the ColdFusion Administrator, and in the /coldfusion/bin directory, run:



    ./cfstat 2



    If you see, for example, five running requests and you have the "Number of Simultaneous Requests" in the ColdFusion Administrator set to five, and that number doesn't drop, ColdFusion probably made a call outside its environment (to whatever, typically to the database) and the call never came back for some reason. Check the log files (especially application.log and server.log) to look for clues or recent errors that might be accomplices to the drama.
  6. Of course, it goes without saying that machine resources, hardware problems or contention for resources can contribute to robbing you of a well deserved good night's rest, so I won't bother saying it. Ok, I will anyway: once in a very, very long while we come across a site that is running a production site on an Ultra 2 running ColdFusion, Netscape, Oracle server, firewall, SMTP server, POP server, NNTP server and LDAP server with 128MB RAM getting hit hard, and they ask why bad things happen, since it is, after all, a Sun box and should be able to take it. The important thing to realize is that a hard working server of any kind needs room to work (enough RAM, disk, etc.), without having to compete for resources with other hard working servers. This puts much undue stress on everybody, and with enough stress, tiny flaws, like those in otherwise obscure OS libraries and server code, are that much more likely to make themselves known.



    So: run "vmstat 2", especially during peak periods to check CPU utilization and memory usage, and regularly check syslogs. Your machine may be trying to tell you something, but you'll never know what it is if you're not doing regular monitoring.
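Going back to the trace reading described in step 4, picking out the frames that sit above the first ColdFusion function can be automated when you have several core files to compare. The following is a rough sketch, assuming the trace lines look like the sample in step 4 (a bracketed frame number followed by the function name) and that ColdFusion's own functions start with "CCF" as they do in that sample; the function name `frames_before_cf` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read a dbx "where" trace on standard input and
# print the names of the functions listed above the first frame whose
# name starts with "CCF" (the ColdFusion frames in the sample trace).
frames_before_cf() {
    # Keep only lines that contain "[N] name(" and reduce them to the name.
    sed -n 's/.*\[\([0-9][0-9]*\)\] *\([A-Za-z_][A-Za-z0-9_:]*\)(.*/\2/p' |
    # Print names until the first ColdFusion frame has been seen.
    awk '/^CCF/ { seen = 1 } !seen { print }'
}

# Demo using a few frames from the trace above, with the argument lists
# abbreviated for readability.
printf '%s\n' \
    '=>[1] snsbittrm_ts(0x7967f10, 0xa55df10), at 0xebc777f4' \
    '  [12] ocilog(0x5a6ab60, 0x80a83f8), at 0xebc50e60' \
    '  [13] CCFORAConnect::OpenConnection(0x7b0c4b8), at 0xec887618' \
    '  [14] OpenConnect(0x80aa0cc), at 0xec886e80' |
frames_before_cf
```

Against a real file you would run `frames_before_cf < trace.txt`. If the list this prints is the same for two or more core files, you have the kind of consistent trace described in step 4.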

Summary

Debugging ColdFusion performance and stability issues can be difficult, but following the steps above can help isolate the problem. This document should be considered a "work in progress": as more debugging techniques become available, the information will be disseminated here.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License  Twitter™ and Facebook posts are not covered under the terms of Creative Commons.
