# Security postmortem for CVE-2019-15225, CVE-2019-15226

## Incident date(s)

2019-07-25 - 2019-10-10

## Authors

@asraa

## Status

Final

## Summary

After an Envoy user publicly reported a crash in regular expression matching during route resolution (https://github.com/envoyproxy/envoy/issues/7728), the Envoy security team found that the issue could be leveraged for a DoS attack and decided to handle it through the public security release process. The fix landed in master with a public PR (https://github.com/envoyproxy/envoy/pull/7878) and was targeted for inclusion in a 1.11.2 security release.

CVE-2019-15226 was detected via fuzzers just after the 1.11.1 security release. With the fix for CVE-2019-15225 in progress, the Envoy security team decided to combine the two fixes into a 1.11.2 security release.

This was the first time that an Envoy security release included a publicly disclosed vulnerability with a fix that was already merged into master. The security release included a backported patch of that fix as well as the patches for CVE-2019-15226.

## CVE issue(s)

* https://github.com/envoyproxy/envoy/issues/8519
* https://github.com/envoyproxy/envoy/issues/8520

## Root Causes

CVE-2019-15225 was caused by the use of a recursive algorithm for matching regular expressions. Envoy’s HTTP router can be configured with regular expressions for routing incoming HTTP requests based on header values. Envoy used the libstdc++ `std::regex` implementation for these regular expressions. As a result, an HTTP request with sufficiently large header values may consume large amounts of stack memory and cause abnormal process termination. Regular expressions with the `*` or `+` quantifiers are particularly vulnerable. This appeared when matching header values of 16 KB or more.

CVE-2019-15226 resulted from excessive iteration over the `HeaderMap` by a time-consuming header size validation that ran for each header added.
Both codec libraries, http_parser and nghttp2, have internal limits for the maximum request header size. Envoy’s HTTP/2 codec originally checked against a hard-coded max header size of 63K, which was just under the default max headers length in nghttp2. The check occurred every time a header was added, resulting in O(n^2) performance. Work on making this limit configurable (https://github.com/envoyproxy/envoy/issues/5626) also introduced the issue in Envoy’s HTTP/1 codec, where the check was added per header field, mimicking the same problematic pattern as the original HTTP/2 codec.

## Resolution

To resolve the excessive stack memory consumption from regex matching, Envoy 1.11.2 deprecates the use of `std::regex` in user-facing paths. A new safe regex matcher introduces an explicitly configurable regex engine. Currently, the only supported engine is Google’s RE2, which implements a safe subset of the `std::regex` language features. The existing regex engine is in a deprecation period to allow users to switch to safe regex engines. RE2 is designed to complete execution in linear time (https://github.com/google/re2/wiki/WhyRE2) and to limit the amount of memory used. Envoy 1.11.2 also includes an option to configure a maximum “program size” when using RE2; the program size is a rough estimate of how complex a compiled regex is to evaluate. A regex whose program size exceeds this value will fail to compile.

CVE-2019-15226 was first noticed via fuzzers when a timeout was reported by `h1_capture_direct_fuzz_test` on 2019-08-09: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16325. Once a reproducer had been built in an Envoy deployment to confirm the issue and the Envoy security team had done some profiling, we moved to a private fix process targeting the 1.11.2 release along with CVE-2019-15225.
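
To make the O(n^2) cost from the root-cause analysis concrete, the following Python sketch contrasts a size check that walks every stored header on each insertion with one that keeps a running total. It is an illustration only, not Envoy's C++ code: `NaiveHeaderMap`, `CachedHeaderMap`, `add_headers_with_limit`, and the `visits` counter are hypothetical names, and the byte size is assumed to be the sum of key and value lengths.

```python
class NaiveHeaderMap:
    """Recomputes the total size by walking every entry (O(n) per call)."""

    def __init__(self):
        self.entries = []
        self.visits = 0  # entries touched by byte_size(), for illustration only

    def add(self, key, value):
        self.entries.append((key, value))

    def byte_size(self):
        total = 0
        for key, value in self.entries:
            self.visits += 1
            total += len(key) + len(value)
        return total


class CachedHeaderMap:
    """Keeps a running total that is updated on insertion (O(1) per call)."""

    def __init__(self):
        self.entries = []
        self.cached_byte_size = 0
        self.visits = 0  # stays 0: byte_size() never walks the entries

    def add(self, key, value):
        self.entries.append((key, value))
        self.cached_byte_size += len(key) + len(value)

    def byte_size(self):
        return self.cached_byte_size


def add_headers_with_limit(header_map, headers, max_bytes):
    """Adds headers one at a time, checking the size limit after each one."""
    for key, value in headers:
        header_map.add(key, value)
        if header_map.byte_size() > max_bytes:
            return False
    return True


# Hypothetical workload: 1,000 small headers, well under the size limit.
headers = [(f"x-header-{i}", "v") for i in range(1000)]

naive = NaiveHeaderMap()
cached = CachedHeaderMap()
naive_ok = add_headers_with_limit(naive, headers, max_bytes=1 << 20)
cached_ok = add_headers_with_limit(cached, headers, max_bytes=1 << 20)

print(naive.visits)   # 1 + 2 + ... + 1000 entry visits
print(cached.visits)  # no entry visits
```

With 1,000 headers the naive variant touches 500,500 entries across all checks, while the cached variant touches none. Doubling the header count roughly quadruples the work in the first case and only doubles the number of constant-time checks in the second.
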
Other calls to `byteSize()` and other iterations over the `HeaderMap` were also analyzed for potential DoS vulnerabilities and performance issues. The fix re-implemented the `HeaderMapImpl::byteSize()` method to have O(1) performance by returning a `cached_byte_size_` member of `HeaderMapImpl` that is updated as header entries are added, rather than iterating over the `HeaderMap` to calculate the byte size. To resolve excessive iteration over the `HeaderMap` that can appear in access logging with many header formatters and many headers, the fix also included configurable limits for the maximum number of headers. The following patches were produced:

* https://github.com/envoyproxy/envoy/commit/afc39bea36fd436e54262f150c009e8d72db5014
* https://github.com/envoyproxy/envoy/commit/5c122a35ebd7d3f7678b0f1c9846c1e282bba079

A 1.11.2 security release was announced on 2019-09-18. An e-mail was sent to the Envoy private distributor list sharing the details of CVE-2019-15226. About a week later, on 2019-09-24, the candidate fix patches for CVE-2019-15226 were shared with distributors. This provided two weeks for distributors to test and prepare their software for the security release date, as per the guidelines set in place after security release 1.9.1.

## Detection

CVE-2019-15225 was reported by Seikun Kambashi in a public GitHub issue describing a crash caused by a request with a very large URI for routes configured with a regex matcher: https://github.com/envoyproxy/envoy/issues/7728.

Envoy’s `route_fuzz_test`, which fuzzes route resolution and header finalization, ideally should have caught this crash. The test takes a `RouteConfiguration` and a set of headers as inputs, and routes a request carrying the input headers against the given `RouteConfiguration`. It should have been fairly easy for the fuzzers to produce a wildcard matcher and a long header string.
However, the fuzz test itself had a logical error that resulted in ignoring input path headers and setting them to a default value of “/”. As a result, the fuzz test would never have tested a large URI and an OOM or crash would never have been detected. The fuzz test was fixed in https://github.com/envoyproxy/envoy/pull/8653, and a reproducer for the CVE was added.

The underlying issue behind CVE-2019-15226 was first noticed via fuzzers when a timeout was reported by `h1_capture_direct_fuzz_test`: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16325. Some profiling work revealed that `HeaderMapImpl::byteSize()`, which is O(n) in the number of headers, is called for every single header in both HTTP/1.1 and HTTP/2 codecs. Although Envoy’s stateless HTTP/2 header fuzzers (`request_header_fuzz_test` and `response_header_fuzz_test`) perform 10x more executions per second than this fuzzer, these tested one header frame per testcase and used nghttp2’s default max header frame size (16 KB). Because of this, the frame size was too small to amplify the effect of the O(n^2) process enough to produce a timeout.

## Action Items

* https://github.com/envoyproxy/envoy/issues/8567
* https://github.com/envoyproxy/envoy/issues/8875
* https://github.com/envoyproxy/envoy/issues/8898
* https://github.com/envoyproxy/envoy/issues/8901

## Lessons Learned

### What went well

* CVE-2019-15226 was detected quickly after the fuzzer reported the timeout.
* The fixes for CVE-2019-15226 were straightforward and localized.
* The security release occurred on time and followed the guidelines established in https://github.com/envoyproxy/envoy/blob/master/SECURITY.md

### What went wrong

* It took nearly a week to set up a branch for fix patches. This was due to some confusion over whether to use the new GitHub Security advisories, which didn’t support the required permission model and CI integrations. In the process, the envoy-setec branch was temporarily made readable to all Envoy contributors.
* While resolving the above permission issue, we hit an issue with GitHub permissions on envoyproxy: people could no longer assign issues to members in the Envoy repository. This was fixed with some restructuring of GitHub teams to support the limited GitHub IAM model.
* The fix team could push directly to envoy-setec branches, e.g. the 1.11.2 branch (and master as well). We need branch protection to ensure that CI gates merges; this will provide confidence that the staged release branches are likely to work on the main Envoy repository.
* We had manual patch sets the day before release, but no envoy-setec branches reflecting them that passed end-to-end. We should not consider a release ready to go until it completes a full CI pass.
* It wasn’t possible to get a full CI pass due to docs/image/etc push issues. We should have a set of presubmits that provide a simple yes/no in the GH UX.
* Our route resolution fuzzer would not have picked up the regex vulnerability due to a logical error in the fuzzer.
* Our more efficient request and response fuzzers would not have picked up CVE-2019-15226 earlier. They only fuzz a single HEADERS frame, and the maximum frame size for HTTP/2 is by default 16 KB.
* From a distributor: “We didn't realize about safe_regex until the note this morning. So we're patching ... to switch to safe_regex -- would it be possible in future notes to distributors to note if usage changes are required?”
* We coupled the CVE-2019-15225 and CVE-2019-15226 releases. This made sense initially, due to release overhead, but as the release date for the header map fixes was extended, it meant that a somewhat known vulnerability was fixed on master but not on any released version of Envoy.

### Where we got lucky

* Release branches (master and v1.11.2) had only minor CI failures (bazel.compile_time_options), even though neither assembled branch had completed a full CI pass on the private fix branch.

## Timeline

All times US/Pacific

2019-07-25:
* [CVE-2019-15225] https://github.com/envoyproxy/envoy/issues/7728 was opened reporting crashes from route regex matches with very long request URIs.

2019-08-09:
* [CVE-2019-15226] ClusterFuzz reports https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16325 under embargo.

2019-08-13:
* [CVE-2019-15226] Email thread to envoy-security@googlegroups.com regarding the HeaderMap DoS. Analysis began to look for similar O(n^2) behavior in code that uses HeaderMap.

2019-08-19:
* CVE ID request for CVE-2019-15225 and CVE-2019-15226.

2019-08-20:
* Branch permissions / CI permissions for envoy-setec branch set up.

2019-08-21:
* [CVE-2019-15226] Draft fix PRs for CVE-2019-15226 were shared on the private Envoy security repository. Reviews and further development occurred over the next three weeks.

2019-08-23:
* [CVE-2019-15225] Fix for CVE-2019-15225 is opened at https://github.com/envoyproxy/envoy/pull/7878

2019-09-18:
* [CVE-2019-15226] CVE summary details shared with cncf-envoy-distributors-announce@lists.cncf.io.

2019-09-19:
* [CVE-2019-15226] Vulnerability exists in all Envoy distributions for HTTP/2 traffic, CVE updated.

2019-09-24:
* [CVE-2019-15226] Candidate fix patches were shared with cncf-envoy-distributors-announce@lists.cncf.io.

2019-10-07:
* [CVE-2019-15226] Patch sets assembled as manual patches, based on previously reviewed work.

2019-10-08:
* [CVE-2019-15226] Some last-minute patch fixups to have staged branches pass on CI. Patches scheduled for public release.
* 11:20 AM v1.11.2 pushed
* [CVE-2019-15226] CVE updated on publication

2019-10-10:
* [CVE-2019-15226] Filed follow-up GitHub issue https://github.com/envoyproxy/envoy/issues/8567