Thursday, January 21, 2010

CVE-2010-0232: Microsoft Windows NT #GP Trap Handler Allows Users to Switch Kernel Stack

Two days ago, Tavis Ormandy has published one of the most interesting vulnerabilities I've seen so far.

It's one of those rare, but fascinating design-level errors dealing with low-level system internals. Its exploitation requires skills and ingenuity.

The vulnerability lies in Windows' support for Intel's hardware 8086 emulation support (virtual-8086, or VM86) and is believed to have been there since Windows NT 3.1 (1993!), making it 17 years old.

It uses two tricks that we have already published on this blog before, the #GP on pre-commit handling failure and the forging of cs:eip in VM86 mode.

This was intended to be mentioned in our talk at PacSec about virtualization this past November, but Tavis had agreed with Microsoft to postpone the release of this advisory.

Tavis was kind enough to write a blog post about it, you can read it below:

From Tavis Ormandy:

I've just published one of the most interesting bugs I've ever encountered, a simple authentication check in Windows NT that can incorrectly let users take control of the system. The bug exists in code hidden deep enough inside the kernel that it's gone unnoticed for as long as NT has existed.

If you've ever tried to run an MS-DOS or Win16 application on a modern NT machine, the chances are it worked. This is an impressive feat, these applications were written for a completely different execution environment and operating system, and yet still work today and run at almost native speed.

The secret that makes this possible behind the scenes is Virtual-8086 mode. Virtual-8086 mode is a hardware emulation facility built into all x86 processors since the i386, and allows modern operating systems to run 16-bit programs designed for real mode with very little overhead. These 16-bit programs run in a simulated real mode environment within a regular protected mode task, allowing them to co-exist in a modern multitasking environment.

Support for Virtual-8086 mode requires a monitor, the collective name for the software that handles any requests the program makes. These requests range from handling sensitive instructions to mapping low-level services onto system calls and are implemented partially in kernel mode and partially in user mode.

In Windows NT, the user mode component is called the NTVDM subsystem, and it interacts with the kernel via a native system service called NtVdmControl. NtVdmControl is unusual because it's authenticated, only authorised programs are permitted to access it, which is enforced using a special process flag called VdmAllowed which the kernel verifies is present before NtVdmControl will perform any action; if you don't have this flag, the kernel will always return STATUS_ACCESS_DENIED.

The bug we're talking about today involves how BIOS service calls are handled, which are a low level way of interacting with the system that's needed to support real-mode programs. The kernel implements BIOS service calls in two stages, the second stage begins when the interrupt handler for general protection faults (often shortened to #GP in technical documents) detects that the system has completed the first stage.

The details of how BIOS service calls are implemented are unimportant, what is important is that the two stages must be perfectly synchronised, if the kernel transitions to the second stage incorrectly, a hostile user can take advantage of this confusion to take control of the kernel and compromise the system. In theory, this shouldn't be a problem, Microsoft implemented a check that verifies that the trap occurred at a magic address (actually, a cs:eip pair) that unprivileged users can't reach.

The check seems reasonable at first, the hardware guarantees that unprivileged code can't arbitrarily make itself more privileged without a special request, and even if it could, only authorised programs are permitted to use NtVdmControl() anyway.

Unfortunately, it turns out these assumptions were wrong. The problem I noticed was that although unprivileged code cannot make itself more privileged arbitrarily, Virtual-8086 mode makes testing the privilege level of code more difficult because the segment registers lose their special meaning. This is because In protected mode, the segment registers (particularly ss and cs) can be used to test privilege level, however in Virtual-8086 mode they're used to create far pointers, which allow 16-bit programs to access the 20-bit real address space.

However, I still couldn't abuse this fact because NtVdmControl() can only be accessed by authorised programs, and there's no other way to request pathological operation on Virtual-8086 mode tasks. I was able to solve this problem by invoking the real NTVDM subsystem, and then loading my own code inside it using a combination of CreateRemoteThread(), VirtualAllocEx() and WriteProcessMemory().

Finally, I needed to find a way to force the kernel to transition to the vulnerable code while my process appeared to be privileged. My solution to this was to make the kernel fault when returning to user mode from kernel mode, thus creating the appearance of a legitimate trap for the fabricated execution context that I had installed. These steps all fit together perfectly, and can be used to convince the kernel to execute my code, giving me complete control of the system.

Conclusion

Could Microsoft have avoided this issue? It's difficult to imagine how, errors like this will generally elude fuzz testing (In order to observe any problem, a fuzzer would need to guess a 46-bit magic number, as well as setup an intricate process state, not to mention the VdmAllowed flag), and any static analysis would need an incredibly accurate model of the Intel architecture.

The code itself was probably resistant to manual audit, it's remained fairly static throughout the history of NT, and is likely considered forgotten lore even inside Microsoft. In cases like this, security researchers are sometimes in a better position than those with the benefit of documentation and source code, all abstraction is stripped away and we can study what remains without being tainted by how documentation claims something is supposed to work.

If you want to mitigate future problems like this, reducing attack surface is always the key to security. In this particular case, you can use group policy to disable support for Application Compatibility (see the Application Compatability policy template) which will prevent unprivileged users from accessing NtVdmControl(), certainly a wise move if your users don't need MS-DOS or Windows 3.1 applications.