Thursday, January 21, 2010

CVE-2010-0232: Microsoft Windows NT #GP Trap Handler Allows Users to Switch Kernel Stack

Two days ago, Tavis Ormandy has published one of the most interesting vulnerabilities I've seen so far.

It's one of those rare, but fascinating design-level errors dealing with low-level system internals. Its exploitation requires skills and ingenuity.

The vulnerability lies in Windows' support for Intel's hardware 8086 emulation support (virtual-8086, or VM86) and is believed to have been there since Windows NT 3.1 (1993!), making it 17 years old.

It uses two tricks that we have already published on this blog before, the #GP on pre-commit handling failure and the forging of cs:eip in VM86 mode.

This was intended to be mentioned in our talk at PacSec about virtualization this past November, but Tavis had agreed with Microsoft to postpone the release of this advisory.

Tavis was kind enough to write a blog post about it, you can read it below:

From Tavis Ormandy:

I've just published one of the most interesting bugs I've ever encountered, a simple authentication check in Windows NT that can incorrectly let users take control of the system. The bug exists in code hidden deep enough inside the kernel that it's gone unnoticed for as long as NT has existed.

If you've ever tried to run an MS-DOS or Win16 application on a modern NT machine, the chances are it worked. This is an impressive feat, these applications were written for a completely different execution environment and operating system, and yet still work today and run at almost native speed.

The secret that makes this possible behind the scenes is Virtual-8086 mode. Virtual-8086 mode is a hardware emulation facility built into all x86 processors since the i386, and allows modern operating systems to run 16-bit programs designed for real mode with very little overhead. These 16-bit programs run in a simulated real mode environment within a regular protected mode task, allowing them to co-exist in a modern multitasking environment.

Support for Virtual-8086 mode requires a monitor, the collective name for the software that handles any requests the program makes. These requests range from handling sensitive instructions to mapping low-level services onto system calls and are implemented partially in kernel mode and partially in user mode.

In Windows NT, the user mode component is called the NTVDM subsystem, and it interacts with the kernel via a native system service called NtVdmControl. NtVdmControl is unusual because it's authenticated, only authorised programs are permitted to access it, which is enforced using a special process flag called VdmAllowed which the kernel verifies is present before NtVdmControl will perform any action; if you don't have this flag, the kernel will always return STATUS_ACCESS_DENIED.

The bug we're talking about today involves how BIOS service calls are handled, which are a low level way of interacting with the system that's needed to support real-mode programs. The kernel implements BIOS service calls in two stages, the second stage begins when the interrupt handler for general protection faults (often shortened to #GP in technical documents) detects that the system has completed the first stage.

The details of how BIOS service calls are implemented are unimportant, what is important is that the two stages must be perfectly synchronised, if the kernel transitions to the second stage incorrectly, a hostile user can take advantage of this confusion to take control of the kernel and compromise the system. In theory, this shouldn't be a problem, Microsoft implemented a check that verifies that the trap occurred at a magic address (actually, a cs:eip pair) that unprivileged users can't reach.

The check seems reasonable at first, the hardware guarantees that unprivileged code can't arbitrarily make itself more privileged without a special request, and even if it could, only authorised programs are permitted to use NtVdmControl() anyway.

Unfortunately, it turns out these assumptions were wrong. The problem I noticed was that although unprivileged code cannot make itself more privileged arbitrarily, Virtual-8086 mode makes testing the privilege level of code more difficult because the segment registers lose their special meaning. This is because In protected mode, the segment registers (particularly ss and cs) can be used to test privilege level, however in Virtual-8086 mode they're used to create far pointers, which allow 16-bit programs to access the 20-bit real address space.

However, I still couldn't abuse this fact because NtVdmControl() can only be accessed by authorised programs, and there's no other way to request pathological operation on Virtual-8086 mode tasks. I was able to solve this problem by invoking the real NTVDM subsystem, and then loading my own code inside it using a combination of CreateRemoteThread(), VirtualAllocEx() and WriteProcessMemory().

Finally, I needed to find a way to force the kernel to transition to the vulnerable code while my process appeared to be privileged. My solution to this was to make the kernel fault when returning to user mode from kernel mode, thus creating the appearance of a legitimate trap for the fabricated execution context that I had installed. These steps all fit together perfectly, and can be used to convince the kernel to execute my code, giving me complete control of the system.

Conclusion

Could Microsoft have avoided this issue? It's difficult to imagine how, errors like this will generally elude fuzz testing (In order to observe any problem, a fuzzer would need to guess a 46-bit magic number, as well as setup an intricate process state, not to mention the VdmAllowed flag), and any static analysis would need an incredibly accurate model of the Intel architecture.

The code itself was probably resistant to manual audit, it's remained fairly static throughout the history of NT, and is likely considered forgotten lore even inside Microsoft. In cases like this, security researchers are sometimes in a better position than those with the benefit of documentation and source code, all abstraction is stripped away and we can study what remains without being tainted by how documentation claims something is supposed to work.

If you want to mitigate future problems like this, reducing attack surface is always the key to security. In this particular case, you can use group policy to disable support for Application Compatibility (see the Application Compatability policy template) which will prevent unprivileged users from accessing NtVdmControl(), certainly a wise move if your users don't need MS-DOS or Windows 3.1 applications.

Saturday, November 28, 2009

Virtualization security and the Intel privilege model

Earlier this month, Tavis and I spoke at PacSec 2009 in Tokyo about virtualisation security on Intel architectures, with a focus on CPU virtualisation.

During this talk, we briefly explained various techniques used for CPU virtualisation such as dynamic translation (QEmu), VMware-style binary translation or paravirtualisation (Xen) and we went through bugs found by us and others:

- We released some details about MS09-33 (CVE-2009-1542), a bug we found in VirtualPC's instructions decoding
- We mentioned two of the awesome bugs found by Derek Soeder in VMware, CVE-2008-4915 and CVE-2008-4279
- We explained and demo-ed the exploitation of the mishandled exception on page fault bug in VMware that I previously blogged about.
- We released information on CVE-2009-3827, a bug we discovered in Virtual PC's hardware virtualisation.
A funny fact is that the exact same bug was independently uncovered and corrected in KVM later by Avi Kivity (CVE-2009-3722). The reason may be a not perfectly clear Intel documentation about the differences between MOV_DR and MOV_CR events in hardware virtualisation.
This bug has already been addressed by Microsoft in Windows 7 and will get corrected in the next service pack for Virtual PC and Virtual Server.

If you are interested, you can download the slides here.

Friday, October 30, 2009

CVE-2009-2267: Mishandled exception on page fault in VMware

Tavis Ormandy and myself have recently released an advisory for CVE-2009-2267.

This is a vulnerability in VMware's virtual CPU which can lead to privilege escalation in a guest. All VMware virtualisation products were affected, including in hardware virtualisation mode.

In a VMware guest, in the general case, unprivileged (Ring 3) code runs without VMM intervention until an exception or interrupt occurs. An exception to this is Virtual-8086 mode (VM86) where VMware will perform CPU emulation.

When VMware was emulating a far call instruction in VM86 mode, it was using supervisory access to push the CS and IP registers. Because of this, if this operation raisee a Page Fault (#PF) exception, the resulting exception code would be invalid and would have it's user/supervisor flag incorrectly set.

This can be used to confuse a Guest kernel. Moreover, VM86 mode can be used to further confuse the guest kernel because it allows an attacker to load an arbitrary value in the code segment (CS) register.

We wrote a reliable proof of concept to elevate privileges on Linux guests. It turned out to be very easy because of the PNP BIOS recovery code.

For further details, check our advisory, VMware's advisory and the non weaponized PoC (vmware86.c, vmware86.tar.gz), including Tavis' cool CODE32 macro.

Note that VMware silently patches their products until all all of them are updated and then releases an advisory. If you have updated VMware Workstation a few month ago, you were already protected against this vulnerability.

In theory, VMware's Virtual CPU flaws could be treated like Intel or AMD errata and worked around in operating systems. In practice, since VMware's software can be updated, this is unlikely to happen. Moreover, VMware doesn't release full details that could be used to produce work arounds.

If you like virtual CPU vulnerabilities, I suggest that you have a look at Derek Soeder's awesome advisory from last year.

Wednesday, October 14, 2009

Security in Depth for Linux Software

Chris Evans and myself have presented last week at Hack In The Box Malaysia about "Security in Depth for Linux software". You can find the slides here.

The talk was focused on writing good code and sandboxing.

The writing goode code part was using vsftpd as an example, since Chris has got this right for ten years now.

In the second part, we defined sandboxing, which we also call discretionary privilege dropping, as the ability to drop privileges programmatically and without administrative authority on the machine.

We explained some of the conceptual differences between sandboxing in this sense, where the application writer chooses to make part of his code run without certain privileges, and Mandatory Access Control systems, where the application itself doesn't make the policy.

From an application writer perspective, sandboxing facilities are desirable since they will allow your code to run with lower privileges on all machines. On the other hand, MAC is desirable from a system administrator or distribution maintainer perspective as it will allow one policy to rule over many applications and to enforce certain security properties on the system.

While Linux has a fair number of MAC systems available, sandboxing options are for now very limited. There is some hope that the ftrace framework or SELinux bounded types may allow this in the future (see also Adam Langley's post on LSMSB), but this will not be widely available anytime soon.

We demonstrated different ways of overcoming those limitations on readily available Linux kernels, focusing on three designs experimented or used in vsftpd and Chromium.
  • Using ptrace(), vsftpd experiment
  • The setuid sandbox design (Julien Tinnes, Tavis Ormandy), Chromium
  • The SECCOMP sandbox design (Markus Gutschke, Adam Langley), Chromium

Wednesday, September 16, 2009

CVE-2009-2793: Iret #GP on pre-commit handling failure: the NetBSD case

A few months ago, Tavis Ormandy and myself have used the fact that iret can fail with a General Protection (#GP) exception before the processor "commits" to user-mode (switches privileges by setting CS) on multiple occasions (more on this at upcoming PacSec)

It's not necessarily obvious that an inter-privilege iret (typically from kernel mode to user mode) can fail before the privilege switch occurs. It's however the case if the restored EIP is past the code segment limits: a #GP exception will be raised while in kernel mode.

When this occurs, an exception is raised from kernel mode with a handler in kernel mode: since there is no privilege level switch, no stack switch occurs and the trap frame will not contain saved stack information.

If an operating system's kernel does not expect this to happen, it may assume a full trap frame with saved stack registers. This is what happens in NetBSD.

An interesting point in the NetBSD case is that due to the lazy handling of the non executable stack emulation, a legitimate program could trigger the bug:
  1. The legitimate program has code on the stack. For instance due to a GCC-genereated trampoline for a nested function.
  2. The stack with be marked as executable but the code segment limit will not be raised yet: on stack execution, the kernel will handle the #GP exception and raise the limit (lazy handling).
  3. A signal handler gets set to this nested function
  4. The kernel delivers a signal to the process and iret to the code on the stack, such raising #GP pre-commit.
You can read our full NetBSD related advisory here (CVE-2009-2793).

Friday, August 28, 2009

CVE-2009-2698: udp_sendmsg() vulnerability

EDIT: p0c73n1 has posted an exploit for this to milw0rm as did andi@void.at, and spender wrote "the rebel"

Tavis Ormandy and myself have recently reported CVE-2009-2698 which has been disclosed at the beginning of the week.

This flaw affects at least Linux 2.6 with a version < 2.6.19.

When we ran into this, we realized the newest kernel versions were not affected by the PoC code we had. The reason for this was that Herbert Xu had found and corrected a closely related bug. Linux distributions running on 2.6.18 and earlier kernels did not realize the security impact of this fix and did not backport it.
This is a good example on how hard it is to backport relevant fixes to maintained stable versions of the kernel.

If you look at udp_sendmsg, you will see that the rt routing table is initialized as NULL and some code paths can lead to call ip_append_data with a NULL rt. ip_append_data() obviously doesn't handle this case properly and will cause a NULL pointer dereference.

Note that this is a data NULL pointer dereference and mapping code at page zero will not lead to immediate privileged code execution for a local attacker. However, controlling the rtable structure seems to give enough control to the attacker to elevate privileges.

Since it's hard to guarantee that ip_append_data will never be called with a NULL *rtp, we believe that this function should be made more robust by using this patch.

Here's one way to trigger this vulnerability locally:

$ cat croissant.c
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>

int main(int argc, char **argv)
{
int fd = socket(PF_INET, SOCK_DGRAM, 0);
char buf[1024] = {0};
struct sockaddr to = {
.sa_family = AF_UNSPEC,
.sa_data = "TavisIsAwesome",
};

sendto(fd, buf, 1024, MSG_PROXY | MSG_MORE, &to, sizeof(to));
sendto(fd, buf, 1024, 0, &to, sizeof(to));

return 0;
}

An effective implementation of mmap_min_addr or the UDEREF feature of PaX/GrSecurity would prevent local privilege escalation through this issue.

Thursday, August 13, 2009

Linux NULL pointer dereference due to incorrect proto_ops initializations (CVE-2009-2692)

EDIT2: Here is RedHat's official mitigation recommendation
EDIT3: Brad Spengler also wrote an exploit for this and published it. The bug triggering is based on our exploit which leaked to Brad though the private vendor-sec mailing list. He implements the personality trick Tavis and I published in June to bypass mmap_min_addr and also makes use of a feature that allows any unconfined user to gain the right to map at address zero in Redhat's default SELinux policy. He wrote a reliable shellcode for this one that should work pretty much anywhere on x86 and x86_64 machines.
EDIT4: if you use Debian or Ubuntu on your machine, I have specifically updated the kernelsec Debian/Ubuntu GrSecurity packages to protect against this bug and others.
EDIT5: Zinx wrote an ARM Android root exploit
EDIT6: Ramon de Carvalho Valle wrote a PPC/PPC64/x86_64/i386 exploit

Tavis Ormandy and myself have recently found and investigated a Linux kernel vulnerability (CVE-2009-2692). It affects all 2.4 and 2.6 kernels since 2001 on all architectures. We believe this is the public vulnerability affecting the greatest number of kernel versions.

The issue lies in how Linux deals with unavailable operations for some protocols. sock_sendpage and others don't check for NULL pointers before dereferencing operations in the ops structure. Instead the kernel relies on correct initialization of those proto_ops structures with stubs (such as sock_no_sendpage) instead of NULL pointers.

At first sight, the code in af_ipx.c looks correct and seems to initialize .sendpage properly. However, due to a bug in the SOCKOPS_WRAP macro, sock_sendpage will not be initialized. This code is very fragile and there are many other protocols where proto_ops are not correctly initialized at all (vulnerable even without the bug in SOCKOPS_WRAP), see bluetooth for instance.

So it was decided that instead of patching all those protocols and continue to rely on this very fragile code, sock_sendpage would get patched to check against NULL. This was already the way sock_splice_read and others were handled.

Since it leads to the kernel executing code at NULL, the vulnerability is as trivial as it can get to exploit (edit: that's for local privilege escalation and on Intel architectures): an attacker can just put code in the first page that will get executed with kernel privileges. Our exploit took a few minutes to adapt from a previous one:

$ ./leeches
// ------------------------------------------------------
// sendpage linux local ring0
// ---------------- taviso@sdf.lonestar.org, julien@cr0.org
// leeches.c:Aug 11 2009
// GreetZ: LiquidK, lcamtuf, Spoonm, novocainated, asiraP, ScaryBeasts, spender, pipacs, stealth, jagger, redpig, Neel and all the other leeches we forgot to mention!
Enjoy some photography while at ring0 @ http://flickr.com/meder
For our webapp friends, here is an XSS executing at ring 0: javascript:alert(1);
shellcode now executing chmod("/bin/sh", 04755), welcome to ring0
Killed
$ sh
# id
uid=1000(julien) gid=1000(julien) euid=0(root)

On x86/x86_64, this issue could be mitigated by three things:
  • the recent mmap_min_addr feature. Note that this feature has known issues until at least 2.6.30.2. See also this LWN article.
  • on IA32 with PaX/GrSecurity, the KERNEXEC feature (x86 only)
  • not implementing affected protocols (a.k.a., reducing your attack surface by disabling what you don't need): PF_APPLETALK, PF_IPX, PF_IRDA, PF_X25, PF_AX25, PF_BLUETOOTH, PF_IUCV, IPPROTO_SCTP/PF_INET6, PF_PPPOX, PF_ISDN, but there may be more. (Update: See RedHat's mitigation)
This patch should be applied to fix this issue.

You can read our advisory here.

Note: this has been featured on Slashdot, OSNews, TheRegister, ZDNet and others