Cross-site Scripting (XSS) is one of the most prevalent web application security flaws. It occurs when user-supplied data is sent to the browser without being properly validated.

Canonicalization is the process of reducing a possibly encoded string to its simplest form. You must canonicalize data before you validate it. The canonicalize function can decode HTML entities, URL (percent) encodings, and JavaScript encodings. In addition to simple decoding, canonicalize can also handle input that is encoded in the following ways:

Multiple or nested encoding: the input is encoded more than once, or the encodings are nested, using a single encoding scheme.


Encoding                          Description
< -> &lt; -> &amp;lt&#x3b         Encoded multiple times using HTML entity encoding
< -> %3C -> %253C -> %25253C      Encoded multiple times using percent encoding
< -> %3C -> %%33%63               Nested encoding using URL (percent) encoding multiple times

Mixed encoding: the input is encoded using different encoding schemes (for instance, both HTML entity encoding and URL encoding).


Encoding                          Description
< -> &lt; -> &%6ct;               First encoded using HTML entity encoding, then encoded using percent encoding
< -> %3C -> %&#x33;c              First encoded using URL (percent) encoding, then the 3 is nested-encoded using HTML entity encoding

Legitimate users do not normally produce data that is encoded more than once (nested or mixed). Treat input of this kind as potentially malicious.
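As a rough illustration of canonicalizing before validating, the sketch below rejects a value when multiple or mixed encoding is detected and only then applies an allowlist check. The helper name isSafeUserName and the allowlist pattern are hypothetical and not part of ColdFusion; the sketch assumes that canonicalize returns an empty string for rejected input when throwOnError is false.

```cfm
<cfscript>
    // Hypothetical helper: true only if the raw value canonicalizes cleanly
    // and the canonical form matches an assumed allowlist pattern.
    function isSafeUserName(required string raw) {
        // With throwOnError = false, rejected input comes back as an empty string.
        var clean = canonicalize(arguments.raw, true, true, false);

        // A non-empty input that canonicalizes to nothing was rejected.
        if (len(arguments.raw) && !len(clean)) {
            return false;
        }

        // Validate the canonical form, not the raw form.
        return reFind("^[A-Za-z0-9_]{3,20}$", clean) == 1;
    }

    writeOutput(isSafeUserName("alice_01") & "<br/>");
    writeOutput(isSafeUserName("%253Cscript%253E") & "<br/>");
</cfscript>
```

Validating the canonical form rather than the raw form is the point of the pattern: an allowlist applied to the raw string could be bypassed by an encoding that is decoded later in the pipeline.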


canonicalize(input, restrictMultiple, restrictMixed [, throwOnError])


ColdFusion (2018 release): Introduced named parameters.

ColdFusion 11: Added the new parameter throwOnError.

ColdFusion 10: Added this function.

input Required. The encoded string to canonicalize.


restrictMultiple Required. If set to true, multiple encoding is restricted.

Set this argument to true to reject input in which multiple or nested encoding is detected. If it is true and the input is multiple or nested encoded using a single encoding scheme, the input is rejected: an exception is thrown if throwOnError is true; otherwise an empty string is returned.


restrictMixed Required. If set to true, mixed encoding is restricted.

Set this argument to true to reject input in which mixed encoding is detected. If it is true and the input is encoded using more than one encoding scheme, the input is rejected: an exception is thrown if throwOnError is true; otherwise an empty string is returned.

throwOnError Optional. Default value is false. If this argument is true, restrictMultiple or restrictMixed is true, and the input contains the corresponding multiple or mixed encoding, an exception is thrown. If this argument is false, an empty string is returned instead of an exception.
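The parameters can also be passed by name in ColdFusion (2018 release) and later. The following is a minimal sketch, assuming the parameter names match those in the syntax line; the sample input and the rejection message are illustrative only.

```cfm
<cfscript>
    try {
        // Named-parameter call; names assumed to match the syntax line.
        decoded = canonicalize(
            input            = "%2526lt%253B",
            restrictMultiple = true,
            restrictMixed    = true,
            throwOnError     = true
        );
        // Re-encode for the output context before writing to the page.
        writeOutput(encodeForHTML(decoded));
    } catch (any e) {
        // Reached when restricted encoding is detected and throwOnError is true.
        writeOutput("Rejected input: " & encodeForHTML(e.message));
    }
</cfscript>
```

Using try/catch with throwOnError set to true separates "input was rejected" from "input was legitimately empty", which the empty-string return value cannot do on its own.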


<!--- Canonicalize a simple HTML entity encoded string --->
<cfoutput>#canonicalize("&lt;",false,false)#</cfoutput><br/>

<!--- Enforce multiple and mixed encoding detection. Mixed encoding is detected because the data is encoded using both URL and HTML entity encoding. Multiple encoding is also detected. --->
<cftry>
    <cfoutput>#canonicalize("%26lt; %26lt; %2526lt%253B %2526lt%253B %2526lt%253B",true,true,true)#</cfoutput><br/>
    <cfcatch type="any">
        <!--- An error is thrown because throwOnError is true and mixed or multiple encoding is detected. --->
        <cfdump var="#cfcatch#">
    </cfcatch>
</cftry>

<!--- Same input with throwOnError set to false: an empty string is returned when multiple or mixed encoding is found. --->
<cfoutput>#canonicalize("%26lt; %26lt; %2526lt%253B %2526lt%253B %2526lt%253B",true,true,false)#</cfoutput><br/>

<!--- Enforce mixed but not multiple encoding detection. Returns an empty string. --->
<cfoutput>#canonicalize("%25 %2526 %26##X3c;script&##x3e; &##37;3Cscript%25252525253e",false,true)#</cfoutput><br/>

<cftry>
    <cfoutput>#canonicalize("%26lt; %26lt; %2526lt%253B %2526lt%253B %2526lt%253B",false,true,true)#</cfoutput><br/>
    <cfcatch type="any">
        <!--- An error is thrown because throwOnError is true. --->
        <cfdump var="#cfcatch#">
    </cfcatch>
</cftry>

<!--- The data contains mixed (URL and HTML entity) and multiple encoding, but both flags are false, so the string is decoded using both percent and HTML entity encodings. --->
<cfoutput>#canonicalize("%26lt; %26lt; %2526lt%253B %2526lt%253B %2526lt%253B",false,false)#</cfoutput><br/>
<cfoutput>#canonicalize("&##X25;3c",false,false)#</cfoutput><br/>
<cfoutput>#canonicalize("&##x25;3c",false,false)#</cfoutput><br/>

<!--- Simple JavaScript decoding --->
<!--- See section 2.7.5 for JS encoding --->
<cfoutput>#canonicalize("\\U003C",false,false)#</cfoutput><br/>
<cfoutput>#canonicalize("\\X3C",false,false)#</cfoutput><br/>
