Memory disclosure mitigations in CopperheadOS

Posted by Team Copperhead on September 20, 2016

Security features relevant to squashing memory leaks

Leaking sensitive information via memory disclosure is an extremely common class of vulnerability in traditional systems programming languages. In the common case, the leaked information is useful for performing further exploitation by bypassing probabilistic mitigations. For example, leaks of even partial addresses can be used to subvert Address Space Layout Randomization (ASLR), and leaks can also reveal the random values used to generate stack canaries and malloc canaries and to mangle registers in setjmp buffers. However, there are also more serious cases, some of which were even hyped enough to get their own branding, like Heartbleed. These issues are primarily caused by holes opened up by the lack of memory safety in languages like C. It's possible for these bugs to occur at a higher level, such as via an object pool or a manually reused buffer, but it's much less common. In truly high-level code without micro-optimizations, it's rarely a problem. Therefore, the main mitigation Android already has against these issues is using a lot of memory safe code and pushing it heavily for third party code.
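
As an illustration of this bug class, here is a minimal sketch of a Heartbleed-style over-read in an echo handler. The `request` struct and the `echo_unchecked` and `echo_checked` functions are hypothetical and not taken from any real library. The point is only that trusting a length supplied by the peer lets a copy run past the bytes that were actually received and into whatever stale data sits behind them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request: the peer sends a payload and separately claims its length. */
struct request {
  const unsigned char *payload; /* start of the receive buffer */
  size_t received_len;          /* bytes actually received */
  size_t claimed_len;           /* length field supplied by the peer */
};

/* Vulnerable: trusts claimed_len, so it may copy stale bytes that follow the payload. */
size_t echo_unchecked(const struct request *req, unsigned char *out) {
  memcpy(out, req->payload, req->claimed_len);
  return req->claimed_len;
}

/* Fixed: rejects a claimed length larger than what was received or than the output. */
size_t echo_checked(const struct request *req, unsigned char *out, size_t out_len) {
  if (req->claimed_len > req->received_len || req->claimed_len > out_len) {
    return 0;
  }
  memcpy(out, req->payload, req->claimed_len);
  return req->claimed_len;
}
```

With a reused receive buffer that still holds older data after a 4 byte payload, a claimed length of 10 makes `echo_unchecked` return 6 bytes of that older data, while `echo_checked` refuses the request.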

CopperheadOS deploys an arsenal of mitigations to detect many of these bugs as they happen along with eliminating the damage for most of the issues that are not directly detected. As a security focused operating system, it has a significantly larger performance budget to implement these mitigations than the upstream Android Open Source Project. As long as there's a negligible impact on battery life and UX latency, CopperheadOS can happily accept a 10-30% loss of throughput in C and C++ code. Since these mitigations aren't necessary for Java, the performance cost for that code is close to zero as CopperheadOS relies on full ahead-of-time compilation for Java rather than it being interpreted by C and C++ code.

Sanitizers

CopperheadOS (and AOSP to a lesser extent) uses some of the LLVM sanitizers as hardening features. Some of the sanitizers are only suitable for debugging (AddressSanitizer, ThreadSanitizer, MemorySanitizer) but there's a production-oriented trapping mode for UndefinedBehaviorSanitizer and a subset of that is quite useful for hardening. Ideally, a subset of UBSan could be enabled across the board based on the desired performance vs. security balance. However, it's extremely common for C code to contain undefined behavior even in regular usage.

The sanitizers are only enabled with Clang rather than GCC, since Android is phasing out GCC and already defaults to Clang in Nougat. The GCC UBSan implementation is also significantly less robust right now: its many false positives and false negatives make it less useful than it should be. This is no longer much of an issue since Android Nougat defaults to using Clang for userspace, with only a few projects left that set LOCAL_CLANG := false due to compatibility issues. It's only a matter of time before Google moves to using Clang for the kernel, so the focus needs to be entirely on improving Clang rather than GCC. Switching compilers would be too risky.

Bounds checking

CopperheadOS enables -fsanitize=bounds by default, disabling it only for a few modules where it uncovers bugs in regular use. Since fixing these bugs is going to involve a fair bit of work, it only makes sense for us to do so when the modules in question are relevant to security. Even though the bugs uncovered in regular use are usually not security bugs, they hold back hardening and thus reduce security.

The -fsanitize=bounds sanitizer is really an alias for -fsanitize=array-bounds,local-bounds. The array-bounds sanitizer simply adds bounds checking for direct usage of fixed-size arrays in the frontend. The semantics of the local-bounds sanitizer are more complex since it adds the checks in the LLVM layer via a pass. It can catch cases missed by array-bounds, but it can't be relied upon to do something consistent across compiler versions or after even minor changes to code.

CopperheadOS also enables -fsanitize=object-size for some projects, which is essentially _FORTIFY_SOURCE=1 for all pointer arithmetic. It adds comparisons against the llvm.objectsize intrinsic which is then lowered to a lower bound on the size part of the way through the LLVM optimization stack. It heavily depends upon optimization. This sanitizer ends up uncovering an enormous amount of latent undefined behavior so it's difficult to expand the coverage. It ends up detecting a lot of undefined behavior not directly caused by out-of-bounds accesses too, since accesses can be considered out-of-bounds based on invalid casts, etc. and that all feeds into the analysis used to lower the llvm.objectsize intrinsic.

These three sanitizers have a lot of overlap, but the checks are optimized out in the trapping mode when they are clearly redundant. It can add up to a substantial performance cost, but it's still less than the cost of memory safety via full bounds checking would be and that's what is ultimately desired. Note that these can actually catch a lot of issues missed by ASan, but that's not particularly relevant when ASan isn't really an option for production (out of scope for this post).

#include <stddef.h>
#include <stdio.h>

void access(int *a) {
  printf("%d\n", a[4]);
}

int main(void) {
  int a[4] = {1, 2, 3, 4};

  // caught by array-bounds, local-bounds (without -O2) and object-size (with -O2)
  for (size_t i = 0; i <= sizeof(a) / sizeof(a[0]); i++) {
    printf("%d\n", a[i]);
  }

  // caught by object-size (with -O2)
  int *b = a;
  for (size_t i = 0; i <= sizeof(a) / sizeof(a[0]); i++) {
    printf("%d\n", b[i]);
  }

  // caught by object-size (with -O2)
  access(a);

  return 0;
}

A future article will dive deeper into the capabilities of the local-bounds and object-size sanitizers, since they're not straightforward features like array-bounds. The effectiveness of local-bounds/object-size could be greatly improved.

Preventing reads of uninitialized data

CopperheadOS extends Clang with a local-init sanitizer, enabling zeroing for all local variables without an initializer. This is enabled across the board, so it covers nearly all of userspace already other than the proprietary libraries from the SoC vendors (NVIDIA, Qualcomm) and a few projects setting LOCAL_CLANG := false due to compatibility issues.

The local-init sanitizer prevents leaks from the stack or registers due to uninitialized local variables in C. For C++, it isn't quite a full solution due to constructors. The local-init sanitizer currently considers a C++ constructor to be an initializer, but constructors can leave fields uninitialized and will almost always leave the padding between fields untouched. It's less common to leak padding than a field but it's still a pervasive issue. The current plan is simply extending the local-init sanitizer to zero before constructors are called. An alternative approach could be taken to generate code for the constructors. That would potentially provide more coverage but CopperheadOS doesn't need this for malloc memory and custom allocators are uncommon enough that they can be realistically handled case-by-case.
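
The padding concern above can be illustrated in plain C. This is a minimal sketch: the `record` struct and the helper names are hypothetical, and the layout comment assumes a typical ABI where `int` is 4 byte aligned. Field-by-field setup plays the role of a constructor, and zeroing the whole object first mirrors the planned "zero before constructors are called" approach. Strictly speaking, the C standard leaves padding values unspecified after a member store, but compilers in practice leave the zeroed bytes alone.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: on typical ABIs there are 3 padding bytes after `tag`. */
struct record {
  char tag;
  int value;
};

/* Field-by-field setup, like a constructor: padding keeps whatever was there before. */
void record_init_fields(struct record *r, char tag, int value) {
  r->tag = tag;
  r->value = value;
}

/* Zero the whole object first, then set the fields, so the padding is cleared too. */
void record_init_zeroed(struct record *r, char tag, int value) {
  memset(r, 0, sizeof(*r));
  r->tag = tag;
  r->value = value;
}

/* Returns 1 if every padding byte between `tag` and `value` is zero. */
int record_padding_is_zero(const struct record *r) {
  const unsigned char *bytes = (const unsigned char *)r;
  for (size_t i = offsetof(struct record, tag) + sizeof(r->tag);
       i < offsetof(struct record, value); i++) {
    if (bytes[i] != 0) {
      return 0;
    }
  }
  return 1;
}
```

Copying a `struct record` set up with `record_init_fields` out of the process (for example with `write`) would also copy the stale padding bytes, which is the leak the zeroing variant avoids.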

A simple example:

#include <stdio.h>

int main(void) {
  int a;
  printf("%d\n", a);
  return 0;
}


LLVM IR diff:

~~~ diff
--- old.ll	2016-08-07 12:12:53.970214809 -0400
+++ new.ll	2016-08-07 12:13:05.050109794 -0400
@@ -10,6 +10,7 @@
   %1 = alloca i32, align 4
   %2 = alloca i32, align 4
   store i32 0, i32* %1, align 4
+  store i32 0, i32* %2, align 4
   %3 = load i32, i32* %2, align 4
   %4 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i32 0, i32 0), i32 %3)
   ret i32 0
~~~

The performance hit from local-init is usually negligible. The code is generated by the frontend so the entire LLVM optimization stack is available to remove the zeroing whenever the compiler can prove that it isn't necessary. It may seem naive to simply zero every uninitialized local, but it really ends up resulting in sophisticated analysis. When the compiler cannot prove that it's redundant, the performance hit is still usually quite small. The current stack frame is extremely hot by nature so it's not going to cause many extra cache misses and clearing memory is cheap. There's the potential for a significant performance hit if there's a tiny function with huge local variables, especially if they are not usually used in their entirety, but there isn't a known example of this in Android.

Integer overflow checking

Integer overflows may not seem related to memory disclosure, but it's not uncommon for arbitrary read vulnerabilities, or more constrained variants, to stem from overflows in the calculation of sizes for memory allocation or bounds checking. The Android Open Source Project began making use of -fsanitize=integer (or subsets of it) across many projects during the development of Android Nougat. It's now extensively used for the media libraries, since most of the high/critical severity vulnerabilities discovered there were ultimately caused by integer overflows. CopperheadOS backported some of this to Marshmallow and makes some extensions to it. It's now present in the base OS in Nougat, so there's less work to do to cover the critical areas.
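
To make the size calculation problem concrete, here is a minimal sketch of an allocation size that wraps around, next to a checked version. The helper names are hypothetical. `__builtin_mul_overflow` is a compiler builtin provided by Clang and recent GCC, and it is used here as a manual stand-in for the check that the sanitizer would insert automatically.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The unchecked product: well defined for unsigned types, but it silently wraps. */
size_t unchecked_size(size_t count, size_t elem_size) {
  return count * elem_size;
}

/* Hypothetical helper: allocate `count` elements of `elem_size` bytes, failing
 * cleanly instead of wrapping when the product does not fit in size_t. */
void *alloc_array_checked(size_t count, size_t elem_size) {
  size_t total;
  if (__builtin_mul_overflow(count, elem_size, &total)) {
    return NULL;
  }
  return malloc(total);
}
```

With a count of `SIZE_MAX / 2 + 1` and 4 byte elements, the unchecked product wraps to 0, so a tiny allocation would later be indexed as if it held a huge array. The checked helper returns `NULL` for the same inputs.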

Even though -fsanitize=integer is part of the feature set provided by UBSan, it really goes beyond checking for undefined behavior. It includes the unsigned-integer-overflow sanitizer, which was one of the main reasons for Android adopting it. The integer sanitizer could be significantly more useful than it is right now by expanding even further down the road of catching rarely intentional suspicious behavior, which is the topic of an upcoming article. It makes sense to catch even well defined overflows by default with explicit exceptions in the code where it's intentional. There's so much undefined behavior in real world code that adapting it for undefined behavior detection is not substantially different than adapting it for suspicious behavior detection too.

Hardened allocator

CopperheadOS uses an extended port of OpenBSD malloc as the general purpose allocator and many of the features are relevant to mitigating memory disclosure bugs.

The most important feature for this is junk filling on free. CopperheadOS alters the way this works to guarantee full junk filling on free even for large allocations, so there's a guarantee that fresh malloc memory will either be set to 0x00 or 0xdf bytes. Junk filling is also still used when page cache memory protection is enabled, unlike OpenBSD. There's support for initializing malloc data to 0xd0 but it's a debugging feature and isn't used by CopperheadOS or exposed to users as part of the performance vs. security tuning.
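
The idea of junk filling on free can be sketched as a wrapper around `free`. This is only an illustration under stated assumptions: `junk_fill` and `free_with_junk` are hypothetical names, and the caller passes the size explicitly, whereas a real allocator knows the size from its own metadata and does the filling internally.

```c
#include <stddef.h>
#include <stdlib.h>

#define JUNK_FREE_BYTE 0xdf /* fill value mentioned above for freed memory */

/* Overwrite `size` bytes with the junk byte. The volatile pointer keeps the
 * compiler from dropping the stores as dead writes ahead of free(). */
void junk_fill(void *ptr, size_t size) {
  volatile unsigned char *p = ptr;
  for (size_t i = 0; i < size; i++) {
    p[i] = JUNK_FREE_BYTE;
  }
}

/* Hypothetical wrapper: wipe the old contents, then release the memory. */
void free_with_junk(void *ptr, size_t size) {
  if (ptr == NULL) {
    return;
  }
  junk_fill(ptr, size);
  free(ptr);
}
```

A plain `memset` directly before `free` is a common mistake here, since an optimizing compiler is allowed to remove it. That is why the sketch writes through a volatile pointer.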

Another relevant mitigation is the random canaries placed at the end of each small allocation. This started out as a CopperheadOS extension but was submitted and accepted upstream. It's enabled by default in CopperheadOS since the main cost is memory usage and there's a lot of memory to spare on the supported devices. The canaries prevent small out-of-bounds accesses from reaching another allocation. One potential improvement relevant to memory disclosures would be replacing a random byte in the canary with a zero byte for production usage (not for debugging) in order to provide a fallback for C string usage with a missing NUL terminator. The canary generation would also ideally be done with something like SipHash, not a weak hash, but it's a performance issue and requires investigation. Leaking enough canaries right now could reveal information about addresses, since the address is the input used to make each canary unique. The problem could be reduced by subtracting the base address from the addresses, but a proper solution would be better. There's also support for adding guard pages to the end of large allocations, which is not enabled by default but is exposed as an option to users.
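
A rough sketch of the canary idea follows. It is not the OpenBSD or CopperheadOS implementation: the function names, the fixed secret and the mixing function are all assumptions made for illustration. It only shows the shape of the scheme described above, where a value derived from the allocation's address and a secret is placed directly after the usable bytes and verified later.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CANARY_SIZE sizeof(uint64_t)

/* Hypothetical secret; a real allocator would draw it from a CSPRNG at startup. */
static uint64_t canary_secret = 0x9e3779b97f4a7c15ULL;

/* Weak, illustrative mixing of the address with the secret (not the real scheme). */
static uint64_t canary_for(const void *ptr) {
  uint64_t x = (uint64_t)(uintptr_t)ptr ^ canary_secret;
  x ^= x >> 33;
  x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33;
  return x;
}

/* Allocate `size` usable bytes followed by a canary. */
void *canary_malloc(size_t size) {
  if (size > SIZE_MAX - CANARY_SIZE) {
    return NULL;
  }
  unsigned char *p = malloc(size + CANARY_SIZE);
  if (p == NULL) {
    return NULL;
  }
  uint64_t c = canary_for(p);
  memcpy(p + size, &c, CANARY_SIZE);
  return p;
}

/* Returns 1 if the canary after the `size` usable bytes is unchanged. */
int canary_intact(const void *ptr, size_t size) {
  uint64_t expected = canary_for(ptr);
  return memcmp((const unsigned char *)ptr + size, &expected, CANARY_SIZE) == 0;
}
```

An allocator would typically run the equivalent of `canary_intact` when the memory is freed and abort on a mismatch. Because the address feeds into the canary, this sketch also has the weakness noted above: enough leaked canaries reveal something about addresses.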

Beyond these direct mitigations, the design of the malloc implementation can make it more difficult to leak data due to randomization and the fact that small allocations are partitioned based on size class. The basic partitioning is something that could be extended into something more like PartitionAlloc in the future, with the page caching being separated per size class and potentially making direct use of guard pages for the small allocation regions rather than relying on randomization and holes in the mmap heap. Memory disclosure can also happen via read-after-free, which is targeted by the quarantine feature extended by CopperheadOS. See the documentation on the malloc implementation for more details.

`_FORTIFY_SOURCE` extensions

Unlike glibc, Bionic has fortified implementations of functions performing memory reads rather than only writes. CopperheadOS extended the coverage of _FORTIFY_SOURCE, especially with regard to memory reads.

In addition to having it cover more functions, CopperheadOS also adds dynamic overflow checks based on querying malloc for allocation size to the subset of fortified functions that are system call wrappers. It could be extended to other calls based on the existing security vs. performance setting exposed to users, but the default will likely remain the same. Covering fread and fwrite wouldn't be that bad, but the string.h functions are way too performance critical to pay the cost of grabbing and releasing a lock to query malloc.
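
The dynamic check can be sketched as follows. This is an approximation under stated assumptions, not the CopperheadOS code: it uses `malloc_usable_size`, which glibc and Bionic both provide, as a stand-in for the allocator query, and `fits_heap_allocation` and `write_checked` are hypothetical names. It also assumes the buffer is a pointer returned by `malloc`, which a real implementation would have to establish by asking the allocator.

```c
#include <malloc.h> /* malloc_usable_size (glibc, Bionic) */
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if `count` bytes fit in the heap allocation starting at `buf`.
 * Assumes `buf` was returned by malloc. */
int fits_heap_allocation(const void *buf, size_t count) {
  return count <= malloc_usable_size((void *)buf);
}

/* Hypothetical checked wrapper around write(2): abort rather than send bytes
 * from beyond the end of the allocation. */
ssize_t write_checked(int fd, const void *buf, size_t count) {
  if (!fits_heap_allocation(buf, count)) {
    abort(); /* fortified functions abort instead of returning an error */
  }
  return write(fd, buf, count);
}
```

The usable size can be somewhat larger than the size that was requested, so a check like this is coarser than a compile-time object size check. It still stops a read that would run into a neighbouring allocation.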

Persistent storage

Old data lingering around is a problem at other levels of storage too. CopperheadOS switched from Android's default PERSIST journal mode for SQLite to TRUNCATE and makes use of SECURE_DELETE by default. This guarantees that deleted data is at least cleared out from the database and won't be accessible via memory or filesystem access. It may still linger around on flash, especially for the journal where it's not currently wiped at all. However, obtaining it from there would require root access, so the mitigations are still effective albeit within a limited scope. TRIM actually helps a fair bit here, but trying to force clearing on the storage itself is not currently viewed as something that's in-scope.

Kernel

The same issues dealt with for userspace in CopperheadOS also apply to the kernel. The PaX USERCOPY feature is a more sophisticated equivalent to the CopperheadOS dynamic _FORTIFY_SOURCE checks albeit only for copy_to_user/copy_from_user. The MEMORY_SANITIZE feature provides junk-on-free for pages (via zeroing for performance) and the slab allocators. The STRUCTLEAK GCC plugin is roughly equivalent to -fsanitize=local-init, but it's more conservative and only clears the memory when there are __user markings. The STACKLEAK GCC plugin provides something that's already there for userspace via munmap in Bionic: clearing kernel stacks on exit to userspace.

Note that the public CopperheadOS releases do not currently use PaX kernels as they did in the past. PaX lacks support for ARM64, and it is difficult to provide production quality PaX kernels for Android because of the large amounts of out-of-tree code and because the kernel version is frozen per-device (3.10 for the Nexus 9, 5X and 6P) and doesn't match up with the long-term support stable PaX releases, which are no longer public anyway.