Tavis Ormandy and myself have recently released an advisory for CVE-2009-2267.
This is a vulnerability in VMware's virtual CPU which can lead to privilege escalation in a guest. All VMware virtualisation products were affected, including in hardware virtualisation mode.
In a VMware guest, in the general case, unprivileged (Ring 3) code runs without VMM intervention until an exception or interrupt occurs. An exception to this is Virtual-8086 mode (VM86) where VMware will perform CPU emulation.
When VMware was emulating a far call instruction in VM86 mode, it was using supervisory access to push the CS and IP registers. Because of this, if this operation raisee a Page Fault (#PF) exception, the resulting exception code would be invalid and would have it's user/supervisor flag incorrectly set.
This can be used to confuse a Guest kernel. Moreover, VM86 mode can be used to further confuse the guest kernel because it allows an attacker to load an arbitrary value in the code segment (CS) register.
We wrote a reliable proof of concept to elevate privileges on Linux guests. It turned out to be very easy because of the PNP BIOS recovery code.
For further details, check our advisory, VMware's advisory and the non weaponized PoC (vmware86.c, vmware86.tar.gz), including Tavis' cool CODE32 macro.
Note that VMware silently patches their products until all all of them are updated and then releases an advisory. If you have updated VMware Workstation a few month ago, you were already protected against this vulnerability.
In theory, VMware's Virtual CPU flaws could be treated like Intel or AMD errata and worked around in operating systems. In practice, since VMware's software can be updated, this is unlikely to happen. Moreover, VMware doesn't release full details that could be used to produce work arounds.
If you like virtual CPU vulnerabilities, I suggest that you have a look at Derek Soeder's awesome advisory from last year.
Friday, October 30, 2009
Wednesday, October 14, 2009
Security in Depth for Linux Software
Chris Evans and myself have presented last week at Hack In The Box Malaysia about "Security in Depth for Linux software". You can find the slides here.
The talk was focused on writing good code and sandboxing.
The writing goode code part was using vsftpd as an example, since Chris has got this right for ten years now.
In the second part, we defined sandboxing, which we also call discretionary privilege dropping, as the ability to drop privileges programmatically and without administrative authority on the machine.
We explained some of the conceptual differences between sandboxing in this sense, where the application writer chooses to make part of his code run without certain privileges, and Mandatory Access Control systems, where the application itself doesn't make the policy.
From an application writer perspective, sandboxing facilities are desirable since they will allow your code to run with lower privileges on all machines. On the other hand, MAC is desirable from a system administrator or distribution maintainer perspective as it will allow one policy to rule over many applications and to enforce certain security properties on the system.
While Linux has a fair number of MAC systems available, sandboxing options are for now very limited. There is some hope that the ftrace framework or SELinux bounded types may allow this in the future (see also Adam Langley's post on LSMSB), but this will not be widely available anytime soon.
We demonstrated different ways of overcoming those limitations on readily available Linux kernels, focusing on three designs experimented or used in vsftpd and Chromium.
The talk was focused on writing good code and sandboxing.
The writing goode code part was using vsftpd as an example, since Chris has got this right for ten years now.
In the second part, we defined sandboxing, which we also call discretionary privilege dropping, as the ability to drop privileges programmatically and without administrative authority on the machine.
We explained some of the conceptual differences between sandboxing in this sense, where the application writer chooses to make part of his code run without certain privileges, and Mandatory Access Control systems, where the application itself doesn't make the policy.
From an application writer perspective, sandboxing facilities are desirable since they will allow your code to run with lower privileges on all machines. On the other hand, MAC is desirable from a system administrator or distribution maintainer perspective as it will allow one policy to rule over many applications and to enforce certain security properties on the system.
While Linux has a fair number of MAC systems available, sandboxing options are for now very limited. There is some hope that the ftrace framework or SELinux bounded types may allow this in the future (see also Adam Langley's post on LSMSB), but this will not be widely available anytime soon.
We demonstrated different ways of overcoming those limitations on readily available Linux kernels, focusing on three designs experimented or used in vsftpd and Chromium.
- Using ptrace(), vsftpd experiment
- The setuid sandbox design (Julien Tinnes, Tavis Ormandy), Chromium
- The SECCOMP sandbox design (Markus Gutschke, Adam Langley), Chromium
Wednesday, September 16, 2009
CVE-2009-2793: Iret #GP on pre-commit handling failure: the NetBSD case
A few months ago, Tavis Ormandy and myself have used the fact that iret can fail with a General Protection (#GP) exception before the processor "commits" to user-mode (switches privileges by setting CS) on multiple occasions (more on this at upcoming PacSec)
It's not necessarily obvious that an inter-privilege iret (typically from kernel mode to user mode) can fail before the privilege switch occurs. It's however the case if the restored EIP is past the code segment limits: a #GP exception will be raised while in kernel mode.
When this occurs, an exception is raised from kernel mode with a handler in kernel mode: since there is no privilege level switch, no stack switch occurs and the trap frame will not contain saved stack information.
If an operating system's kernel does not expect this to happen, it may assume a full trap frame with saved stack registers. This is what happens in NetBSD.
An interesting point in the NetBSD case is that due to the lazy handling of the non executable stack emulation, a legitimate program could trigger the bug:
It's not necessarily obvious that an inter-privilege iret (typically from kernel mode to user mode) can fail before the privilege switch occurs. It's however the case if the restored EIP is past the code segment limits: a #GP exception will be raised while in kernel mode.
When this occurs, an exception is raised from kernel mode with a handler in kernel mode: since there is no privilege level switch, no stack switch occurs and the trap frame will not contain saved stack information.
If an operating system's kernel does not expect this to happen, it may assume a full trap frame with saved stack registers. This is what happens in NetBSD.
An interesting point in the NetBSD case is that due to the lazy handling of the non executable stack emulation, a legitimate program could trigger the bug:
- The legitimate program has code on the stack. For instance due to a GCC-genereated trampoline for a nested function.
- The stack with be marked as executable but the code segment limit will not be raised yet: on stack execution, the kernel will handle the #GP exception and raise the limit (lazy handling).
- A signal handler gets set to this nested function
- The kernel delivers a signal to the process and iret to the code on the stack, such raising #GP pre-commit.
Friday, August 28, 2009
CVE-2009-2698: udp_sendmsg() vulnerability
EDIT: p0c73n1 has posted an exploit for this to milw0rm as did andi@void.at, and spender wrote "the rebel"
Tavis Ormandy and myself have recently reported CVE-2009-2698 which has been disclosed at the beginning of the week.
This flaw affects at least Linux 2.6 with a version < 2.6.19.
When we ran into this, we realized the newest kernel versions were not affected by the PoC code we had. The reason for this was that Herbert Xu had found and corrected a closely related bug. Linux distributions running on 2.6.18 and earlier kernels did not realize the security impact of this fix and did not backport it.
This is a good example on how hard it is to backport relevant fixes to maintained stable versions of the kernel.
If you look at udp_sendmsg, you will see that the rt routing table is initialized as NULL and some code paths can lead to call ip_append_data with a NULL rt. ip_append_data() obviously doesn't handle this case properly and will cause a NULL pointer dereference.
Note that this is a data NULL pointer dereference and mapping code at page zero will not lead to immediate privileged code execution for a local attacker. However, controlling the rtable structure seems to give enough control to the attacker to elevate privileges.
Since it's hard to guarantee that ip_append_data will never be called with a NULL *rtp, we believe that this function should be made more robust by using this patch.
Here's one way to trigger this vulnerability locally:
$ cat croissant.c
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
int main(int argc, char **argv)
{
int fd = socket(PF_INET, SOCK_DGRAM, 0);
char buf[1024] = {0};
struct sockaddr to = {
.sa_family = AF_UNSPEC,
.sa_data = "TavisIsAwesome",
};
sendto(fd, buf, 1024, MSG_PROXY | MSG_MORE, &to, sizeof(to));
sendto(fd, buf, 1024, 0, &to, sizeof(to));
return 0;
}
An effective implementation of mmap_min_addr or the UDEREF feature of PaX/GrSecurity would prevent local privilege escalation through this issue.
Tavis Ormandy and myself have recently reported CVE-2009-2698 which has been disclosed at the beginning of the week.
This flaw affects at least Linux 2.6 with a version < 2.6.19.
When we ran into this, we realized the newest kernel versions were not affected by the PoC code we had. The reason for this was that Herbert Xu had found and corrected a closely related bug. Linux distributions running on 2.6.18 and earlier kernels did not realize the security impact of this fix and did not backport it.
This is a good example on how hard it is to backport relevant fixes to maintained stable versions of the kernel.
If you look at udp_sendmsg, you will see that the rt routing table is initialized as NULL and some code paths can lead to call ip_append_data with a NULL rt. ip_append_data() obviously doesn't handle this case properly and will cause a NULL pointer dereference.
Note that this is a data NULL pointer dereference and mapping code at page zero will not lead to immediate privileged code execution for a local attacker. However, controlling the rtable structure seems to give enough control to the attacker to elevate privileges.
Since it's hard to guarantee that ip_append_data will never be called with a NULL *rtp, we believe that this function should be made more robust by using this patch.
Here's one way to trigger this vulnerability locally:
$ cat croissant.c
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
int main(int argc, char **argv)
{
int fd = socket(PF_INET, SOCK_DGRAM, 0);
char buf[1024] = {0};
struct sockaddr to = {
.sa_family = AF_UNSPEC,
.sa_data = "TavisIsAwesome",
};
sendto(fd, buf, 1024, MSG_PROXY | MSG_MORE, &to, sizeof(to));
sendto(fd, buf, 1024, 0, &to, sizeof(to));
return 0;
}
An effective implementation of mmap_min_addr or the UDEREF feature of PaX/GrSecurity would prevent local privilege escalation through this issue.
Thursday, August 13, 2009
Linux NULL pointer dereference due to incorrect proto_ops initializations (CVE-2009-2692)
EDIT2: Here is RedHat's official mitigation recommendation
EDIT3: Brad Spengler also wrote an exploit for this and published it. The bug triggering is based on our exploit which leaked to Brad though the private vendor-sec mailing list. He implements the personality trick Tavis and I published in June to bypass mmap_min_addr and also makes use of a feature that allows any unconfined user to gain the right to map at address zero in Redhat's default SELinux policy. He wrote a reliable shellcode for this one that should work pretty much anywhere on x86 and x86_64 machines.
EDIT4: if you use Debian or Ubuntu on your machine, I have specifically updated the kernelsec Debian/Ubuntu GrSecurity packages to protect against this bug and others.
EDIT5: Zinx wrote an ARM Android root exploit
EDIT6: Ramon de Carvalho Valle wrote a PPC/PPC64/x86_64/i386 exploit
Tavis Ormandy and myself have recently found and investigated a Linux kernel vulnerability (CVE-2009-2692). It affects all 2.4 and 2.6 kernels since 2001 on all architectures. We believe this is the public vulnerability affecting the greatest number of kernel versions.
The issue lies in how Linux deals with unavailable operations for some protocols. sock_sendpage and others don't check for NULL pointers before dereferencing operations in the ops structure. Instead the kernel relies on correct initialization of those proto_ops structures with stubs (such as sock_no_sendpage) instead of NULL pointers.
At first sight, the code in af_ipx.c looks correct and seems to initialize .sendpage properly. However, due to a bug in the SOCKOPS_WRAP macro, sock_sendpage will not be initialized. This code is very fragile and there are many other protocols where proto_ops are not correctly initialized at all (vulnerable even without the bug in SOCKOPS_WRAP), see bluetooth for instance.
So it was decided that instead of patching all those protocols and continue to rely on this very fragile code, sock_sendpage would get patched to check against NULL. This was already the way sock_splice_read and others were handled.
Since it leads to the kernel executing code at NULL, the vulnerability is as trivial as it can get to exploit (edit: that's for local privilege escalation and on Intel architectures): an attacker can just put code in the first page that will get executed with kernel privileges. Our exploit took a few minutes to adapt from a previous one:
$ ./leeches
// ------------------------------------------------------
You can read our advisory here.
Note: this has been featured on Slashdot, OSNews, TheRegister, ZDNet and others
EDIT3: Brad Spengler also wrote an exploit for this and published it. The bug triggering is based on our exploit which leaked to Brad though the private vendor-sec mailing list. He implements the personality trick Tavis and I published in June to bypass mmap_min_addr and also makes use of a feature that allows any unconfined user to gain the right to map at address zero in Redhat's default SELinux policy. He wrote a reliable shellcode for this one that should work pretty much anywhere on x86 and x86_64 machines.
EDIT4: if you use Debian or Ubuntu on your machine, I have specifically updated the kernelsec Debian/Ubuntu GrSecurity packages to protect against this bug and others.
EDIT5: Zinx wrote an ARM Android root exploit
EDIT6: Ramon de Carvalho Valle wrote a PPC/PPC64/x86_64/i386 exploit
Tavis Ormandy and myself have recently found and investigated a Linux kernel vulnerability (CVE-2009-2692). It affects all 2.4 and 2.6 kernels since 2001 on all architectures. We believe this is the public vulnerability affecting the greatest number of kernel versions.
The issue lies in how Linux deals with unavailable operations for some protocols. sock_sendpage and others don't check for NULL pointers before dereferencing operations in the ops structure. Instead the kernel relies on correct initialization of those proto_ops structures with stubs (such as sock_no_sendpage) instead of NULL pointers.
At first sight, the code in af_ipx.c looks correct and seems to initialize .sendpage properly. However, due to a bug in the SOCKOPS_WRAP macro, sock_sendpage will not be initialized. This code is very fragile and there are many other protocols where proto_ops are not correctly initialized at all (vulnerable even without the bug in SOCKOPS_WRAP), see bluetooth for instance.
So it was decided that instead of patching all those protocols and continue to rely on this very fragile code, sock_sendpage would get patched to check against NULL. This was already the way sock_splice_read and others were handled.
Since it leads to the kernel executing code at NULL, the vulnerability is as trivial as it can get to exploit (edit: that's for local privilege escalation and on Intel architectures): an attacker can just put code in the first page that will get executed with kernel privileges. Our exploit took a few minutes to adapt from a previous one:
$ ./leeches
// ------------------------------------------------------
// sendpage linux local ring0
// ---------------- taviso@sdf.lonestar.org, julien@cr0.org
// leeches.c:Aug 11 2009
// GreetZ: LiquidK, lcamtuf, Spoonm, novocainated, asiraP, ScaryBeasts, spender, pipacs, stealth, jagger, redpig, Neel and all the other leeches we forgot to mention!
Enjoy some photography while at ring0 @ http://flickr.com/meder
For our webapp friends, here is an XSS executing at ring 0: javascript:alert(1);
shellcode now executing chmod("/bin/sh", 04755), welcome to ring0
Killed
$ sh
# id
uid=1000(julien) gid=1000(julien) euid=0(root)
On x86/x86_64, this issue could be mitigated by three things:// ---------------- taviso@sdf.lonestar.org, julien@cr0.org
// leeches.c:Aug 11 2009
// GreetZ: LiquidK, lcamtuf, Spoonm, novocainated, asiraP, ScaryBeasts, spender, pipacs, stealth, jagger, redpig, Neel and all the other leeches we forgot to mention!
Enjoy some photography while at ring0 @ http://flickr.com/meder
For our webapp friends, here is an XSS executing at ring 0: javascript:alert(1);
shellcode now executing chmod("/bin/sh", 04755), welcome to ring0
Killed
$ sh
# id
uid=1000(julien) gid=1000(julien) euid=0(root)
- the recent mmap_min_addr feature. Note that this feature has known issues until at least 2.6.30.2. See also this LWN article.
- on IA32 with PaX/GrSecurity, the KERNEXEC feature (x86 only)
- not implementing affected protocols (a.k.a., reducing your attack surface by disabling what you don't need): PF_APPLETALK, PF_IPX, PF_IRDA, PF_X25, PF_AX25, PF_BLUETOOTH, PF_IUCV, IPPROTO_SCTP/PF_INET6, PF_PPPOX, PF_ISDN, but there may be more. (Update: See RedHat's mitigation)
You can read our advisory here.
Note: this has been featured on Slashdot, OSNews, TheRegister, ZDNet and others
Thursday, July 16, 2009
Old school local root vulnerability in pulseaudio (CVE-2009-1894)
Today was chosen as disclosure day for CVE-2009-1894.
Tavis Ormandy and myself have recently used the fact that pulseaudio was set-uid root to bypass Linux' NULL pointer dereference prevention. This technique is relying on a limitation in the Linux kernel and not on a bug in pulseaudio. But we also found one unrelated bug in pulseaudio.
Since it's set-uid root, we thought we would give pulseaudio a quick look. In the very first lines of main(), you can find the following:
if (!getenv("LD_BIND_NOW")) {
char *rp;
putenv(pa_xstrdup("LD_BIND_NOW=1"));
pa_assert_se(rp = pa_readlink("/proc/self/exe"));
pa_assert_se(execv(rp, argv) == 0);
}
So, pulseaudio is re-executing itself through /proc/self/exe, so that the dynamic linker performs all relocation immediately at load-time.
There is an obvious race condition here. /proc/self/exe is a symbolic link to the actual pathname of the executed command: by creating a hard link to /usr/bin/pulseaudio, we control this pathname, and consequently the file under this pathname. Knowing this, the exploitation is trivial (Note that rename() is atomic, or alternatively note how __d_path() works with deleted entries).
It's also interesting to note that any operation performed on /proc/self/exe is guaranteed by the kernel to be performed on the same inode than the one that got executed (see proc_exe_link), something that two of my colleagues have recently pointed out to me. So if they had re-executed themselves by using /proc/self/exe directly, without going through readlink() first, they would not have been vulnerable. And actually they weren't before, if you read the Changelog, you'll find:
2007-10-29 15:33 lennart * : use real path of binary instead of /proc/self/exe to execute ourselves
Oops! (Thanks to my colleague Mike Mammarella for digging this)
Like the vulnerability in udevd, this is a very good example of a non memory corruption vulnerability which is trivial to exploit very reliably and in a cross-architecture way.
So, why does pulseaudio have the set-uid bit set, you may ask ? For real-time performances reasons it wants to keep CAP_SYS_NICE but will drop all other privileges.
This vulnerability could have been avoided if the principle of least privilege had been followed: Since privileges are not required to re-exec yourself, dropping privileges should have been the first thing pulseaudio did. Here it's only the second thing it does, and it was enough to make most Linux Desktops vulnerable.
If your distribution of choice did not patch this yet, or if you want to reduce your attack surface, you're advised to chmod u-s /usr/bin/pulseaudio. Also note that as with every setuid binary update, you should also check that your users didn't create "backup" vulnerable copies (hardlink), waiting to own your box with known vulnerabilities while you think you are safe from those.
PS: Here are two brain teasers for you:
1. Find a cool way to perform an action after execve() has succeeded in another process, but before main() executes. First, I've used a FD_CLOEXEC read descriptor in a pipe and a SIGPIPE handler, but while it gives good results in practice, there is not guarantee as to when the signal will get delivered. I've finally found (with a hint from Tavis) a 100% reliable way to do it that is always guaranteed to work at first try. Of course, such a level of sophistication is absolutely not needed for this exploit.
2. Since pulseaudio allows you to load arbitrary libraries, it allows you to run arbitrary code with CAP_SYS_NICE as a feature. In the light of NUMA coming to the desktop through QPI, can you do something more interesting than what you would first expect with this?
Tavis Ormandy and myself have recently used the fact that pulseaudio was set-uid root to bypass Linux' NULL pointer dereference prevention. This technique is relying on a limitation in the Linux kernel and not on a bug in pulseaudio. But we also found one unrelated bug in pulseaudio.
Since it's set-uid root, we thought we would give pulseaudio a quick look. In the very first lines of main(), you can find the following:
if (!getenv("LD_BIND_NOW")) {
char *rp;
putenv(pa_xstrdup("LD_BIND_NOW=1"));
pa_assert_se(rp = pa_readlink("/proc/self/exe"));
pa_assert_se(execv(rp, argv) == 0);
}
So, pulseaudio is re-executing itself through /proc/self/exe, so that the dynamic linker performs all relocation immediately at load-time.
There is an obvious race condition here. /proc/self/exe is a symbolic link to the actual pathname of the executed command: by creating a hard link to /usr/bin/pulseaudio, we control this pathname, and consequently the file under this pathname. Knowing this, the exploitation is trivial (Note that rename() is atomic, or alternatively note how __d_path() works with deleted entries).
It's also interesting to note that any operation performed on /proc/self/exe is guaranteed by the kernel to be performed on the same inode than the one that got executed (see proc_exe_link), something that two of my colleagues have recently pointed out to me. So if they had re-executed themselves by using /proc/self/exe directly, without going through readlink() first, they would not have been vulnerable. And actually they weren't before, if you read the Changelog, you'll find:
2007-10-29 15:33 lennart * : use real path of binary instead of /proc/self/exe to execute ourselves
Oops! (Thanks to my colleague Mike Mammarella for digging this)
Like the vulnerability in udevd, this is a very good example of a non memory corruption vulnerability which is trivial to exploit very reliably and in a cross-architecture way.
So, why does pulseaudio have the set-uid bit set, you may ask ? For real-time performances reasons it wants to keep CAP_SYS_NICE but will drop all other privileges.
This vulnerability could have been avoided if the principle of least privilege had been followed: Since privileges are not required to re-exec yourself, dropping privileges should have been the first thing pulseaudio did. Here it's only the second thing it does, and it was enough to make most Linux Desktops vulnerable.
If your distribution of choice did not patch this yet, or if you want to reduce your attack surface, you're advised to chmod u-s /usr/bin/pulseaudio. Also note that as with every setuid binary update, you should also check that your users didn't create "backup" vulnerable copies (hardlink), waiting to own your box with known vulnerabilities while you think you are safe from those.
PS: Here are two brain teasers for you:
1. Find a cool way to perform an action after execve() has succeeded in another process, but before main() executes. First, I've used a FD_CLOEXEC read descriptor in a pipe and a SIGPIPE handler, but while it gives good results in practice, there is not guarantee as to when the signal will get delivered. I've finally found (with a hint from Tavis) a 100% reliable way to do it that is always guaranteed to work at first try. Of course, such a level of sophistication is absolutely not needed for this exploit.
2. Since pulseaudio allows you to load arbitrary libraries, it allows you to run arbitrary code with CAP_SYS_NICE as a feature. In the light of NUMA coming to the desktop through QPI, can you do something more interesting than what you would first expect with this?
Friday, June 26, 2009
Bypassing Linux' NULL pointer dereference exploit prevention (mmap_min_addr)
EDIT3: Slashdot, the SANS Institute, Threatpost and others have a story about an exploit by Bradley Spengler which uses our technique to exploit a null pointer dereference in the Linux kernel.
EDIT2: As of July 13th 2009, the Linux kernel integrates our patch (2.6.31-rc3). Our patch also made it into -stable.
EDIT1: This is now referenced as a vulnerability and tracked as CVE-2009-1895
NULL pointers dereferences are a common security issue in the Linux kernel.
In the realm of userland applications, exploiting them usually requires being able to somehow control the target's allocations until you get page zero mapped, and this can be very hard.
In the paradigm of locally exploiting the Linux kernel however, nothing (before Linux 2.6.23) prevented you from mapping page zero with mmap() and crafting it to suit your needs before triggering the bug in your process' context. Since the kernel's data and code segment both have a base of zero, a null pointer dereference would make the kernel access page zero, a page filled with bytes in your control. Easy.
This used to not be the case, back in Linux 2.0 when the kernel's data segment's base was above PAGE_OFFSET and the kernel had to explicitely use a segment override (with the fs selector) to access data in userland. The same rough idea is now used in PaX/GRSecurity's UDEREF to prevent exploitation of "unexpected to userland kernel accesses" (it actually makes use of an expand down segment instead of a PAGE_OFFSET segment base, but that's a detail).
Kernel developpers tried to solve this issue too, but without resorting to segmentation (which is considered deprecated and is mostly not available on x86_64) and in a portable (cross architectures) way. In 2.6.23, they introduced a new sysctl, called vm.mmap_min_addr, that defines the minimum address that you can request a mapping at. Of course, this doesn't solve the complete issue of "to userland pointer dereferences" and it also breaks the somewhat useful feature of being able to map the first pages (this breaks Dosemu for instance), but in practice this has been effective enough to make exploitation of many vulnerabilities harder or impossible.
Recently, Tavis Ormandy and myself had to exploit such a condition in the Linux kernel. We investigated a few ideas, such as:
So what does the default security module do in cap_file_mmap? This is the relevant code (in security/capability.c on recent versions of the Linux kernel):
So, how do we get control back ? Using something such as "/bin/su our_user_name" could be tempting, but while this would indeed give us control back, su will drop privileges before giving us control back (it'd be a vulnerability otherwise!), so the Linux kernel will make exec fail in the cap_file_mmap check (due to the MMAP_PAGE_ZERO personality).
So what we need is a setuid binary that will give us control back without going through exec. We found such a setuid binary that is installed on many Desktop Linux machines by default: pulseaudio. pulseaudio will drop privileges and let you specify a library to load though its -L argument. Exactly what we needed!
Once we have one page mapped in the forbidden area, it's game over. Nothing will prevent us from using mremap to grow the area and mprotect to change our access rights to PROT_READ|PROT_WRITE|PROT_EXEC. So this completely bypasses the Linux kernel's protection.
Note that apart from this problem, the mere fact that MMAP_PAGE_ZERO is not in the PER_CLEAR_ON_SETID mask and thus is allowed when executing setuid binaries can be a security issue: being able to map page zero in a process with euid=0, even without controlling its content could be useful when exploiting a null pointer vulnerability in a setuid application.
We believe that the correct fix for this issue is to add MMAP_PAGE_ZERO to the PER_CLEAR_ON_SETID mask.
PS: Thanks to Robert Swiecki for some help while investigating this.
EDIT2: As of July 13th 2009, the Linux kernel integrates our patch (2.6.31-rc3). Our patch also made it into -stable.
EDIT1: This is now referenced as a vulnerability and tracked as CVE-2009-1895
NULL pointers dereferences are a common security issue in the Linux kernel.
In the realm of userland applications, exploiting them usually requires being able to somehow control the target's allocations until you get page zero mapped, and this can be very hard.
In the paradigm of locally exploiting the Linux kernel however, nothing (before Linux 2.6.23) prevented you from mapping page zero with mmap() and crafting it to suit your needs before triggering the bug in your process' context. Since the kernel's data and code segment both have a base of zero, a null pointer dereference would make the kernel access page zero, a page filled with bytes in your control. Easy.
This used to not be the case, back in Linux 2.0 when the kernel's data segment's base was above PAGE_OFFSET and the kernel had to explicitely use a segment override (with the fs selector) to access data in userland. The same rough idea is now used in PaX/GRSecurity's UDEREF to prevent exploitation of "unexpected to userland kernel accesses" (it actually makes use of an expand down segment instead of a PAGE_OFFSET segment base, but that's a detail).
Kernel developpers tried to solve this issue too, but without resorting to segmentation (which is considered deprecated and is mostly not available on x86_64) and in a portable (cross architectures) way. In 2.6.23, they introduced a new sysctl, called vm.mmap_min_addr, that defines the minimum address that you can request a mapping at. Of course, this doesn't solve the complete issue of "to userland pointer dereferences" and it also breaks the somewhat useful feature of being able to map the first pages (this breaks Dosemu for instance), but in practice this has been effective enough to make exploitation of many vulnerabilities harder or impossible.
Recently, Tavis Ormandy and myself had to exploit such a condition in the Linux kernel. We investigated a few ideas, such as:
- using brk()
- creating a MAP_GROWSDOWN mapping just above the forbidden region (usually 64K) and segfaulting the last page of the forbidden region
- obscure system calls such as remap_file_pages
- putting memory pressure in the address space to let the kernel allocate in this region
- using the MAP_PAGE_ZERO personality
So what does the default security module do in cap_file_mmap? This is the relevant code (in security/capability.c on recent versions of the Linux kernel):
if ((addr < mmap_min_addr) && !capable(CAP_SYS_RAWIO)) return -EACCES; return 0;Meaning that a process with CAP_SYS_RAWIO can bypass this check. How can we get our process to have this capability ? By executing a setuid binary of course! So we set the MMAP_PAGE_ZERO personality and execute a setuid binary. Page zero will get mapped, but the setuid binary is executing and we don't have control anymore.
So, how do we get control back ? Using something such as "/bin/su our_user_name" could be tempting, but while this would indeed give us control back, su will drop privileges before giving us control back (it'd be a vulnerability otherwise!), so the Linux kernel will make exec fail in the cap_file_mmap check (due to the MMAP_PAGE_ZERO personality).
So what we need is a setuid binary that will give us control back without going through exec. We found such a setuid binary that is installed on many Desktop Linux machines by default: pulseaudio. pulseaudio will drop privileges and let you specify a library to load though its -L argument. Exactly what we needed!
Once we have one page mapped in the forbidden area, it's game over. Nothing will prevent us from using mremap to grow the area and mprotect to change our access rights to PROT_READ|PROT_WRITE|PROT_EXEC. So this completely bypasses the Linux kernel's protection.
Note that apart from this problem, the mere fact that MMAP_PAGE_ZERO is not in the PER_CLEAR_ON_SETID mask and thus is allowed when executing setuid binaries can be a security issue: being able to map page zero in a process with euid=0, even without controlling its content could be useful when exploiting a null pointer vulnerability in a setuid application.
We believe that the correct fix for this issue is to add MMAP_PAGE_ZERO to the PER_CLEAR_ON_SETID mask.
PS: Thanks to Robert Swiecki for some help while investigating this.
Subscribe to:
Posts (Atom)