EDIT2: As of July 13th 2009, the Linux kernel integrates our patch (2.6.31-rc3). Our patch also made it into -stable.
EDIT1: This is now referenced as a vulnerability and tracked as CVE-2009-1895
NULL pointers dereferences are a common security issue in the Linux kernel.
In the realm of userland applications, exploiting them usually requires being able to somehow control the target's allocations until you get page zero mapped, and this can be very hard.
In the paradigm of locally exploiting the Linux kernel however, nothing (before Linux 2.6.23) prevented you from mapping page zero with mmap() and crafting it to suit your needs before triggering the bug in your process' context. Since the kernel's data and code segment both have a base of zero, a null pointer dereference would make the kernel access page zero, a page filled with bytes in your control. Easy.
This used to not be the case, back in Linux 2.0 when the kernel's data segment's base was above PAGE_OFFSET and the kernel had to explicitely use a segment override (with the fs selector) to access data in userland. The same rough idea is now used in PaX/GRSecurity's UDEREF to prevent exploitation of "unexpected to userland kernel accesses" (it actually makes use of an expand down segment instead of a PAGE_OFFSET segment base, but that's a detail).
Kernel developpers tried to solve this issue too, but without resorting to segmentation (which is considered deprecated and is mostly not available on x86_64) and in a portable (cross architectures) way. In 2.6.23, they introduced a new sysctl, called vm.mmap_min_addr, that defines the minimum address that you can request a mapping at. Of course, this doesn't solve the complete issue of "to userland pointer dereferences" and it also breaks the somewhat useful feature of being able to map the first pages (this breaks Dosemu for instance), but in practice this has been effective enough to make exploitation of many vulnerabilities harder or impossible.
Recently, Tavis Ormandy and myself had to exploit such a condition in the Linux kernel. We investigated a few ideas, such as:
- using brk()
- creating a MAP_GROWSDOWN mapping just above the forbidden region (usually 64K) and segfaulting the last page of the forbidden region
- obscure system calls such as remap_file_pages
- putting memory pressure in the address space to let the kernel allocate in this region
- using the MAP_PAGE_ZERO personality
So what does the default security module do in cap_file_mmap? This is the relevant code (in security/capability.c on recent versions of the Linux kernel):
if ((addr < mmap_min_addr) && !capable(CAP_SYS_RAWIO)) return -EACCES; return 0;Meaning that a process with CAP_SYS_RAWIO can bypass this check. How can we get our process to have this capability ? By executing a setuid binary of course! So we set the MMAP_PAGE_ZERO personality and execute a setuid binary. Page zero will get mapped, but the setuid binary is executing and we don't have control anymore.
So, how do we get control back ? Using something such as "/bin/su our_user_name" could be tempting, but while this would indeed give us control back, su will drop privileges before giving us control back (it'd be a vulnerability otherwise!), so the Linux kernel will make exec fail in the cap_file_mmap check (due to the MMAP_PAGE_ZERO personality).
So what we need is a setuid binary that will give us control back without going through exec. We found such a setuid binary that is installed on many Desktop Linux machines by default: pulseaudio. pulseaudio will drop privileges and let you specify a library to load though its -L argument. Exactly what we needed!
Once we have one page mapped in the forbidden area, it's game over. Nothing will prevent us from using mremap to grow the area and mprotect to change our access rights to PROT_READ|PROT_WRITE|PROT_EXEC. So this completely bypasses the Linux kernel's protection.
Note that apart from this problem, the mere fact that MMAP_PAGE_ZERO is not in the PER_CLEAR_ON_SETID mask and thus is allowed when executing setuid binaries can be a security issue: being able to map page zero in a process with euid=0, even without controlling its content could be useful when exploiting a null pointer vulnerability in a setuid application.
We believe that the correct fix for this issue is to add MMAP_PAGE_ZERO to the PER_CLEAR_ON_SETID mask.
PS: Thanks to Robert Swiecki for some help while investigating this.