Canonicalization

Description

Canonicalization describes the mechanisms for reducing different polymorphic expressions to a single canonical expression. For example, in the context of a search engine, the data file 'Hello World.doc' may be accessible through any of the following polymorphic links:

www.sap.com/Hello+World.doc
www.sap.com/hello+world.doc
www.sap.com/Hello%20World.doc

Filtering on the canonical representation ensures that 'strange' but allowed forms of an expression (for example, URL-encoded or Unicode variants) do not slip past filter mechanisms. A polymorphic representation of data is not necessarily an attack in itself, but it helps to slip malicious data past a filter by "disguising" it.
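As an illustration, the three links above can be reduced to one canonical form. This is a minimal sketch, not a complete URL canonicalizer: the helper name canonical_link is hypothetical, and it assumes that '+' stands for a space and that the server treats paths as case-insensitive.

```python
from urllib.parse import unquote_plus


def canonical_link(link: str) -> str:
    """Reduce a link to one canonical spelling (hypothetical helper)."""
    # unquote_plus decodes %XX escapes and turns '+' into a space.
    decoded = unquote_plus(link)
    # Assumption: the server ignores case, so fold everything to lower case.
    return decoded.casefold()


links = [
    "www.sap.com/Hello+World.doc",
    "www.sap.com/hello+world.doc",
    "www.sap.com/Hello%20World.doc",
]

canonical_forms = {canonical_link(link) for link in links}
print(canonical_forms)  # {'www.sap.com/hello world.doc'}
```

All three polymorphic links collapse to the same string, so a filter only has to reason about that one form.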

What Do I Need to Do?

The figure below clarifies what you as a developer have to do:

Unescape the input and bring it to its shortest or simplest form (canonicalization).
Validate the input depending on the output (HTML, database or file system).
- Be aware of double-encoded characters.
- Check if you are working in the same character space (Unicode or ASCII).
- Remember that combinations of ASCII and HEX characters may represent malicious code. See also SQL Injection.
- Remember case sensitivity and try to find a 'capitalized' canonical form.
Check against a white list of patterns instead of using a black list.
Take into account the interpreter's mode of operation, because different interpreters might handle the same data in different ways.
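The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the names canonicalize and is_allowed, the limit of five decoding rounds, and the white-list pattern (simple file names ending in .txt or .doc) are all hypothetical and must be adapted to the actual output context. Lower case is used as the single canonical case here; upper case would work equally well as long as it is applied consistently.

```python
import re
import unicodedata
from urllib.parse import unquote

MAX_DECODE_ROUNDS = 5  # assumed limit; deeper nesting is treated as hostile

# Assumed white list: simple file names with a .txt or .doc extension.
ALLOWED_NAME = re.compile(r"[a-z0-9 _-]+\.(?:txt|doc)")


def canonicalize(raw: str) -> str:
    """Unescape the input and bring it to its simplest form."""
    current = raw
    for _ in range(MAX_DECODE_ROUNDS):
        decoded = unquote(current)
        if decoded == current:
            # Stable: no encoded (or double-encoded) characters remain.
            # NFKC maps compatibility characters (e.g. full-width letters)
            # into one character space; casefold removes case differences.
            return unicodedata.normalize("NFKC", current).casefold()
        current = decoded
    raise ValueError("input is encoded too deeply")


def is_allowed(raw: str) -> bool:
    """Validate the canonical form against the white list."""
    try:
        canonical = canonicalize(raw)
    except ValueError:
        return False
    return ALLOWED_NAME.fullmatch(canonical) is not None


print(is_allowed("Hello%2520World.doc"))    # True  (double-encoded space)
print(is_allowed("%2e%2e%2fetc%2fpasswd"))  # False (decodes to ../etc/passwd)
```

Decoding is repeated until the value stops changing, so a double-encoded character cannot survive the first pass and reach the validation step in disguise.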

Figure: Dependencies of the Canonicalization Process

Examples

Example of a bad filter

Example of an original file that may be accessed:

c:\sap\file\test.txt

Example of potentially malicious code:

Example Code 1

c:\sap\file\test.asp

The process filter denies access to the file because of its .asp extension. Such a filter does not accept any .asp or .jsp extensions.

Example Code 2

c:\sap\file\test.asp::$data

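The string ends with '$data', which the process filter does not recognize as a malicious ending such as '.asp' or '.jsp'. On NTFS, however, '::$data' denotes the default data stream of the file, so the string still refers to the content of test.asp. Therefore, the file will be accessed by the interpreter.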

Example Code 3

c:\sap\file\test.asp%00de.doc

The string ends with '.doc', which the process filter does not recognize as a malicious ending such as '.asp' or '.jsp'. The file test.asp will nevertheless be opened, because the interpreter ignores everything that follows the NULL byte (%00).
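The three examples can be reproduced with a short sketch. The function naive_filter models the bad filter described above (a black list applied to the raw string), while strict_filter shows one possible alternative that canonicalizes first and then checks a white list. Both function names, the white list of .txt and .doc, and the limit of five decoding rounds are assumptions made for illustration only.

```python
import ntpath
from urllib.parse import unquote

BLOCKED_EXTENSIONS = (".asp", ".jsp")
ALLOWED_EXTENSIONS = {".txt", ".doc"}  # assumed white list


def naive_filter(path: str) -> bool:
    """Bad filter: black list applied to the raw, non-canonical string."""
    return not path.lower().endswith(BLOCKED_EXTENSIONS)


def fully_decode(value: str, max_rounds: int = 5) -> str:
    """Undo URL encoding until the value no longer changes."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    raise ValueError("input is encoded too deeply")


def strict_filter(path: str) -> bool:
    """Canonicalize first, then check the extension against a white list."""
    try:
        decoded = fully_decode(path)
    except ValueError:
        return False
    # Reject NULL bytes: some interpreters stop reading the name there.
    if "\x00" in decoded:
        return False
    _drive, rest = ntpath.splitdrive(decoded)
    # A colon after the drive letter denotes an NTFS stream (e.g. ::$data).
    if ":" in rest:
        return False
    extension = ntpath.splitext(rest)[1].lower()
    return extension in ALLOWED_EXTENSIONS


for candidate in (
    r"c:\sap\file\test.txt",
    r"c:\sap\file\test.asp",
    r"c:\sap\file\test.asp::$data",
    r"c:\sap\file\test.asp%00de.doc",
):
    print(candidate, naive_filter(candidate), strict_filter(candidate))
```

The naive filter rejects only Example Code 1 and lets Example Code 2 and 3 through; the stricter variant accepts only test.txt.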

Further Information

OWASP Guide Version 2.0.1 (pages 185-191)
surfnet.dl.sourceforge.net/sourceforge/owasp/OWASPGuide2.0.1.pdf