### Overview

This page contains request classification tiers and reasons as well as mitigations, with explanations for some non-trivial cases. 

### Request classification

`http_desync_guardian` is a library for analyzing and classifying HTTP/1.x requests to provide customers security balanced with necessity to serve traffic for legacy or proprietary systems (not always RFC compliant).

* `Compliant` - RFC compliant requests (*)
* `Acceptable` - non RFC compliant requests, but which do not represent security risks
* `Ambiguous` - requests that might be treated differently by different HTTP servers and therefore may lead to HTTP Desync issues (and request splitting/smuggling as a possible consequence)
* `Severe` - either malformed or highly likely crafted to trick HTTP parsers and cause HTTP de-synchronization.

### Recommended `http_desync_guardian` Modes

| Classification | Defensive mode | Strictest mode |
|----------------|----------------|----------------|
| Compliant      | Allowed        | Allowed        |
| Acceptable     | Allowed        | Blocked        |
| Ambiguous      | Allowed¹       | Blocked        |
| Severe         | Blocked        | Blocked        |

¹ Route the requests but closes the client and target connections.

For `Blocked` requests the client connection must be closed.

If you are concerned about potential impact, _Monitoring mode_ offers a metrics-only approach to assess prior to switching.

### Classification Reasons

* `Compliant`
    * `Compliant` - a compliant request
* `Acceptable`
    * `NonCompliantHeader` - non-essential header containing a non-ASCII or control characters (CTL) -  i.e. special invisible characters.
    * `SpaceInUri` - unescaped space in the URI
    * `NonCompliantVersion` - version which contains extra spaces, missing (i.e. HTTP/0.9) or matches HTTP/1.[2-9]
    * `GetHeadZeroContentLength` - GET/HEAD request with a “Content-Length: 0” header
* `Ambiguous`
    * `EmptyHeader` - if there is an empty header or a line with whitespaces only in the request
    * `AmbiguousUri` - an URI containing CTL characters
    * `UndefinedContentLengthSemantics` - Content-Length for GET/HEAD requests
    * `UndefinedTransferEncodingSemantics` - Transfer-Encoding for GET/HEAD requests
    * `DuplicateContentLength` - duplicated Content-Length header (same value)
    * `BothTeClPresent` - both Transfer-Encoding and Content-Length are present in the request
    * `SuspiciousHeader` - a header that can be normalized to `Transfer-Encoding` or `Content-Length` using common text normalization techniques (sanitation, case normalization, delimiters normalization).
* `Severe`
    * `BadHeader` - header containing null-character or CR
    * `BadUri` - URI containing null-character or CR
    * `BadVersion` - malformed version
    * `MultipleContentLength` - different Content-Length headers
    * `BadContentLength` - a non-parseable value or an invalid number
    * `MultipleTransferEncodingChunked` - multiple Transfer-Encoding: chunked headers
    * `BadTransferEncoding` - unknown Transfer-Encoding value
    * `BadMethod` - malformed method
* `Parsing` raw-requests 
    * `NonCrLfLineTermination` (Acceptable) - allowing “\n” line termination (similar to Nginx).
    * `MultilineHeader` (Ambiguous) - multi-line headers are non RFC compliant (except Content-Type)
    * `PartialHeaderLine` (Ambiguous) - if a header line was not terminated
    * `MissingLastEmptyLine` (Ambiguous) - there is no empty line at the end of request
    * `MissingHeaderColon` (Ambiguous) - header line doesn’t have colon separator
    * `MissingUri` (Ambiguous) - there is no URI in the request line

### Details on certain classifications

#### Undefined Content-Length/Transfer-Encoding Semantics (UndefinedTransferEncodingSemantics, UndefinedContentLengthSemantics)

A payload within a GET/HEAD request message has no defined semantics.
https://tools.ietf.org/html/rfc7231#section-4.3

https://medium.com/@knownsec404team/protocol-layer-attack-http-request-smuggling-cc654535b6f
3.1 GET Request with CL != 0

https://portswigger.net/web-security/request-smuggling/exploiting
See "Capturing other users’ requests"
https://www.cgisecurity.com/lib/HTTP-Request-Smuggling.pdf
see "EXAMPLE #3"

#### Both Content-Length and Transfer-Encoding are present (BothTeClPresent)

If a request containing both Content-Length and Transfer-Encoding was received, it means that the sender didn’t follow RFC, and thus there is a chance that request boundaries might be out of sync with the sender.

> If a message is received with both a Transfer-Encoding and a
Content-Length header field, the Transfer-Encoding overrides the
Content-Length. Such a message might indicate an attempt to
perform request smuggling (Section 9.5) or response splitting
(Section 9.4) and ought to be handled as an error. A sender MUST
remove the received Content-Length field prior to forwarding such
a message downstream.

https://tools.ietf.org/html/rfc7230#section-3.3.2

#### Multi-line headers (MultilineHeader)

Multi-line headers have been deprecated in RFC 7230, and different engines may either support it or not, which provides malicious actors a toolkit to trick parser to “see” headers that are not there or vice versa. That’s why we mark requests containing multi-line headers as Ambiguous (except the Content-Type header).

> Historically, HTTP header field values could be extended over multiple lines by
preceding each extra line with at least one space or horizontal tab (obs-fold).
This specification deprecates such line folding except within the message/http media
type (Section 8.3.1). A sender MUST NOT generate a message that includes line folding
(i.e., that has any field-value that contains a match to the obs-fold rule) unless the message
is intended for packaging within the message/http media type.

https://tools.ietf.org/html/rfc7230#section-3.2.4

#### Multiple Transfer-Encoding Chunked (MultipleTransferEncodingChunked)

> A sender MUST NOT apply chunked more than once to a message body

https://tools.ietf.org/html/rfc7230#section-3.3.1

#### Multiple Content-Length Headers (MultipleContentLength, DuplicateContentLength)

If there are multiple different Content-Length headers (different values) the request is marked as Severe.
 In the case of multiple but same values, it falls into DuplicateContentLength category (marked as Ambiguous).

> If a message is received that has multiple Content-Length header
fields with field-values consisting of the same decimal value, or a
single Content-Length header field with a field value containing a
list of identical decimal values (e.g., "Content-Length: 42, 42"),
indicating that duplicate Content-Length header fields have been
generated or combined by an upstream message processor, then the
recipient MUST either reject the message as invalid or replace the
duplicated field-values with a single valid Content-Length field
containing that decimal value prior to determining the message body
length or forwarding the message.

https://tools.ietf.org/html/rfc7230#section-3.3.2

#### Suspicious headers (SuspiciousHeader)

There is a range of attacks to masquerade Transfer-Encoding and Content-Length headers, so some engines in the chain will see them while others won’t. For example:

```
Transfer-Encoding : chunked
Content-Length: 100
```
In this case, some engines may reject this request, as it’s not RFC compliant. Some may sanitize the space before the colon and treat as it has “Transfer-Encoding: chunked” while some may see “Transfer-Encoding[space]” and ignore it. It is the simplest case to illustrate this idea, but there are many others. For instance:

```
Transfer-Encodıng: chunked
```
Small Dotless I becomes ASCII “I” on upper-case transformation which may trick some engines. Or have a CTL character:

```
Transfer_Encoding: chunked
\x01Transfer-Encoding: chunked
Transfer-Encoding\b: chunked
```
Some engines may normalize delimiters or non-letters, e.g., using regular expressions, etc. (especially using standard string trimming routines that may have different behaviors in different platforms).

To mitigate these risks, we determine the similarity of headers to `Transfer-Encoding` and Content-Length and mark requests as Ambiguous if any of these deviations are detected.

### Mitigations

There are two types of mitigations:

* Reject request with 400 and close the connection 
* Serve the request but disable connection re-use on both front-end and back-end.

Why connection is closed after a `Severe` request

In this case, we cannot establish request boundaries and tell when the next request starts.

Why connections are both FE/BE connections closed after an Ambiguous request?

Let’s start with not re-using a backend connection example:

![](./close-be-connections.png)
1. Attacker sends a request, such as Proxy only sees POST /foo while backend also sees GET /poison
2. However Proxy marks the request as Ambiguous
3. Proxy closes the connection after the response
4. The /poison response is dropped as the connection is not going to be re-used.

This seems to be efficient, but falls short if there is a layer in front of the proxy:
![](./close-be-connections-layer-in-front.png)
1. In this case let’s assume Desync happens between CDN and the Proxy and Proxy marks the request as Ambiguous
2. While Proxy closes the BE connection, it’s not helpful
3. The /poison response is still served via re-used front-end connection

But if both FE/BE connections are closed, then HTTP Desync is prevented:
![](./close-fe-be-connections.png)
1. Same as in the previous example, let’s assume Desync happens between CDN and the Proxy and Proxy marks the request as Ambiguous
2. Now Proxy closes both FE/BE connections.
3. The /poison response is dropped.