Canonicalization is the process of reducing the different polymorphic forms of an expression to a single, canonical form. For example, within the context of a search engine, the data file 'Hello World.doc' may be accessible by any one of the following polymorphic links:
The canonical representation ensures that unusual but permitted forms of an expression (for example, URL-encoded or Unicode variants) cannot bypass filter mechanisms. A polymorphic representation of data is not necessarily an attack in itself, but it can help slip malicious data past a filter by "disguising" it.
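As a minimal sketch of this idea, the following Python snippet reduces URL-encoded and Unicode variants of a string to one form before any filter sees it. The function name to_canonical and the limit of five decoding rounds are assumptions made for illustration; they do not come from any particular product. Whether to decode repeatedly or to reject multiply-encoded input outright is a design choice that depends on the application.

```python
import unicodedata
from urllib.parse import unquote


def to_canonical(value: str, max_rounds: int = 5) -> str:
    """Reduce URL-encoded and Unicode variants to one form (illustrative only)."""
    for _ in range(max_rounds):
        # One round: undo a layer of URL encoding, then fold Unicode
        # compatibility characters (for example full-width dots) with NFKC.
        decoded = unicodedata.normalize("NFKC", unquote(value))
        if decoded == value:
            return value  # stable: nothing left to decode or normalize
        value = decoded
    # Still changing after max_rounds: treat as suspicious instead of guessing.
    raise ValueError("input has too many encoding layers")


# Different polymorphic spellings collapse to the same canonical string.
variants = ["../", "%2e%2e%2f", "%252e%252e%252f", "\uff0e\uff0e/"]
canonical_forms = {to_canonical(v) for v in variants}
```

A filter that runs on the result of to_canonical sees only one spelling of each input, so it does not have to anticipate every encoded variant itself.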
The figure below clarifies what you as a developer have to do:
Dependencies of the Canonicalization Process:
Example of a bad filter
Example of the original file, which is allowed to be accessed:
c:\sap\file\test.txt
Example of potentially malicious code:
Example Code 1
c:\sap\file\test.asp
The process filter denies access to this file because of its .asp extension. Such a filter rejects any path ending in .asp or .jsp.
Example Code 2
c:\sap\file\test.asp::$data
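The path ends with '::$data', which the process filter does not recognize as a malicious ending such as '.asp' or '.jsp'. On NTFS, however, the '::$data' suffix refers to the default data stream of the file, so the path still points to test.asp. Therefore, the file will be accessed by the interpreter.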
Example Code 3
c:\sap\file\test.asp%00de.doc
The path ends with '.doc', which the process filter does not recognize as a malicious ending such as '.asp' or '.jsp'. The file test.asp will nevertheless be opened, because the interpreter ignores everything following the null byte (%00).
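The three examples above can be reproduced with a short Python sketch. It contrasts a filter that checks only the raw ending of the path with one that canonicalizes the path first. The names naive_filter, canonicalize_path, and safe_filter are hypothetical, and the canonicalization covers only the two tricks shown here (null byte and '::$data' suffix); a real implementation would have to handle more variants.

```python
from urllib.parse import unquote

BLOCKED_EXTENSIONS = (".asp", ".jsp")  # the extensions the example filter rejects


def naive_filter(path: str) -> bool:
    """Bad filter: returns True (allowed) based on the raw string ending only."""
    return not path.lower().endswith(BLOCKED_EXTENSIONS)


def canonicalize_path(path: str) -> str:
    """Illustrative canonicalization for the cases above; not exhaustive."""
    decoded = unquote(path)                 # "%00" becomes a real null byte
    decoded = decoded.split("\x00", 1)[0]   # drop everything after the null byte
    # Strip a stream suffix such as "::$data" from the file name only,
    # so the colon of the drive letter ("c:") is left untouched.
    head, sep, name = decoded.rpartition("\\")
    name = name.split(":", 1)[0]
    return head + sep + name


def safe_filter(path: str) -> bool:
    """Apply the same extension check, but to the canonical form."""
    return naive_filter(canonicalize_path(path))


examples = [
    r"c:\sap\file\test.txt",
    r"c:\sap\file\test.asp",
    r"c:\sap\file\test.asp::$data",
    r"c:\sap\file\test.asp%00de.doc",
]
naive_results = [naive_filter(p) for p in examples]
safe_results = [safe_filter(p) for p in examples]
```

With these inputs, naive_filter rejects only the plain test.asp path, while safe_filter rejects all three .asp variants and still allows test.txt. An alternative design is to reject any path that contains a null byte or a stream separator instead of stripping it.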
surfnet.dl.sourceforge.net/sourceforge/owasp/OWASPGuide2.0.1.pdf