Thursday, May 28, 2009

Time-stamp counter disabling oddities in the Linux kernel

The time-stamp counter (TSC) is part of the performance monitoring facilities provided on Intel processors. It is stored in a 64-bit MSR. Except for 64-bit wraparound (and, of course, reset), Intel guarantees that the TSC increases monotonically, but not necessarily at a constant rate.
Historically, the TSC increased with every internal processor clock cycle, but now the rate is usually constant (even if the processor changes frequency) and usually equals the maximum processor frequency.

There are multiple ways of reading the value of the TSC MSR; a popular one is the RDTSC instruction. This instruction loads the value into EDX:EAX and is not privileged unless the Time-Stamp Disable (TSD) bit is set in CR4. Most operating systems do not set CR4.TSD on any thread, so programmers are free to use RDTSC in their ring 3 code.
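As a minimal sketch of what reading the TSC from ring 3 looks like, the following assumes GCC or Clang inline assembly on x86. The helper names `tsc_combine` and `read_tsc` are made up for this illustration.

```c
#include <stdint.h>

/* Combine the two 32-bit halves that RDTSC leaves in EDX (high) and EAX (low). */
uint64_t tsc_combine(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

#if defined(__i386__) || defined(__x86_64__)
/*
 * Read the TSC with RDTSC. If CR4.TSD is set and we are not in ring 0,
 * the instruction faults instead of returning a value.
 */
uint64_t read_tsc(void)
{
    uint32_t eax, edx;

    __asm__ volatile ("rdtsc" : "=a"(eax), "=d"(edx));
    return tsc_combine(edx, eax);
}
#endif
```

Timing a piece of code then amounts to calling `read_tsc()` before and after it and subtracting the two values.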

The problem is that the TSC has been used in the past as a tool to mount side-channel attacks. Two examples are "Cache Attacks and Countermeasures: the Case of AES" by Osvik, Shamir and Tromer and "Cache missing for fun and profit" by Colin Percival. (Less importantly, it has also been used to build exploits for race conditions in the Linux kernel, such as this one.)

In an attempt to kill RDTSC as a tool for this kind of mischief, Andrea Arcangeli, author of the SECCOMP prctl (which lets a thread enter a sandboxed "computing mode" where only the read, write, exit and sigreturn syscalls are allowed), tried to disable RDTSC by setting CR4.TSD in any thread that runs under seccomp (in 2.6.12).

That's where the oddities begin: I was recently surprised to see that a process I ran under seccomp actually had access to rdtsc. A quick look at the source code of my kernel revealed this:

#ifdef TIF_NOTSC

Note that TIF_NOTSC is not a config option! So I took a look at both thread_info_64.h and thread_info_32.h and discovered that TIF_NOTSC was not defined in the 64-bit version. As a consequence, seccomp disables the TSC in seccomp threads on a 32-bit kernel but not on a 64-bit kernel (even for 32-bit processes). Chris Evans blogged previously about how a seemingly simple security technology such as seccomp could still have bugs. "Here's another one", I thought.

While tracking this bug, I found out that it wasn't a bug but a conscious decision by Andi Kleen not to disable the TSC, on x86_64 only, for performance reasons (patch applied in 2.6.14). I consider this a really odd decision: seccomp behaving differently on 64-bit and 32-bit kernels is nonsense! If you consider TSC disabling a security feature, it has to behave consistently, or you should just remove it altogether. Here is a thread, started in November 2005 by Andrea Arcangeli, who also regretted the lack of consistency.

But then, in Linux 2.6.23, this feature became impact-free, performance-wise. From that point on, I really consider its absence on x86_64 kernels a bug, not just a strange decision. As I mentioned previously, the bug is due to TIF_NOTSC not being defined for 64-bit kernels.
I wondered whether this bug would still be there in recent Linux kernels despite the ongoing i386 and x86_64 merge. It wasn't in 2.6.27, where thread_info_64.h and thread_info_32.h have been merged into a single thread_info.h file. But in fact, it had already been corrected in 2.6.26, at the same time as a new feature, prctl(PR_SET_TSC), was introduced.

PR_SET_TSC lets you control the CR4.TSD flag for your thread: you can make your thread receive SIGSEGV on rdtsc. And this feature is another big oddity to me: if you consider rdtsc harmful, it would make sense to let a process drop the privilege to use RDTSC, but the weird thing here is that nothing forbids you from calling prctl(PR_SET_TSC) again to clear the TSD flag and restore your privilege to use rdtsc! So I can't imagine what this is for; the only use case I can see would be in a ptrace sandbox.
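The round trip described above can be sketched as follows. This assumes a Linux kernel of at least 2.6.26 on x86 and the `PR_SET_TSC`, `PR_GET_TSC`, `PR_TSC_ENABLE` and `PR_TSC_SIGSEGV` constants from `<sys/prctl.h>`. The function names are invented for this example.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Human-readable name for a mode reported by PR_GET_TSC. */
const char *tsc_mode_name(int mode)
{
    switch (mode) {
    case PR_TSC_ENABLE:  return "enabled";
    case PR_TSC_SIGSEGV: return "sigsegv";
    default:             return "unknown";
    }
}

/* Current TSC mode of the calling thread, or -1 if prctl() fails
 * (for example on a non-x86 machine or an older kernel). */
int tsc_get_mode(void)
{
    int mode = 0;

    if (prctl(PR_GET_TSC, &mode, 0, 0, 0) != 0)
        return -1;
    return mode;
}

/* Drop RDTSC access, then simply take it back.
 * Returns 0 if the whole round trip worked, -1 otherwise. */
int tsc_round_trip(void)
{
    int while_disabled, after_restore;

    if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0) != 0)
        return -1;
    while_disabled = tsc_get_mode();   /* rdtsc would raise SIGSEGV here */

    /* Nothing stops the same thread from restoring its own access. */
    if (prctl(PR_SET_TSC, PR_TSC_ENABLE, 0, 0, 0) != 0)
        return -1;
    after_restore = tsc_get_mode();

    printf("while disabled: %s, after restore: %s\n",
           tsc_mode_name(while_disabled), tsc_mode_name(after_restore));
    return after_restore == PR_TSC_ENABLE ? 0 : -1;
}
```

The second `prctl()` call is the whole point: the restriction is only as strong as the thread's willingness to keep it.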

Another use case would have been SECCOMP, of course. With the automatic TSC disabling removed from seccomp, a thread could call PR_SET_TSC before PR_SET_SECCOMP if it wanted to disable rdtsc, thus making this behavior configurable. Since a thread under seccomp cannot call prctl(), the thread would not have been able to re-enable it. The problem would have been that existing code relying on SECCOMP might expect to drop TSC access without having to use PR_SET_TSC. But wait! This feature had never worked on x86_64 in the first place, so it was the perfect time to change the behavior and finally fix this bug. Another oddity!
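To make the ordering concrete, here is a sketch of the sequence this paragraph has in mind, run in a forked child so the caller survives. It illustrates the idea, not how any particular kernel version wires seccomp and the TSC together. `run_sandboxed_child` and the `SANDBOX_*` exit codes are hypothetical names, and strict seccomp mode is assumed to be available.

```c
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exit codes used by the child (made up for this sketch). */
enum { SANDBOX_OK = 0, SANDBOX_NO_TSC_CTL = 2, SANDBOX_NO_SECCOMP = 3 };

/* Returns the child's exit code, or -1 if it could not be run
 * or was killed by a signal. */
int run_sandboxed_child(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Step 1: give up RDTSC while prctl() is still allowed. */
        if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0) != 0)
            _exit(SANDBOX_NO_TSC_CTL);

        /* Step 2: enter strict seccomp. From here on only read, write,
         * exit and sigreturn are permitted, so PR_SET_TSC can no longer
         * be called to undo step 1. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
            _exit(SANDBOX_NO_SECCOMP);

        /* _exit() would use exit_group, which strict mode does not
         * allow, so invoke the plain exit syscall directly. */
        syscall(SYS_exit, SANDBOX_OK);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Swapping the two steps cannot work: once in strict mode, the `PR_SET_TSC` call itself would get the thread killed.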

I should also discuss the whole idea of forbidding access to the useful rdtsc instruction in the first place. Couldn't an attacker emulate it anyway with a thread on another processor incrementing a counter manually? Are the RTC, the HPET or the gtod_data counter in the vsyscall page usable? How realistic are those side-channel attacks in the first place? If they are realistic, when would they become the easiest attack you can perform on a system? That will be for another post.
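For the first question, a rough sketch of such a software counter could look like the following, assuming POSIX threads and C11 atomics. All `soft_counter_*` names are invented here, and whether its resolution is good enough for a real attack is exactly the open question.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_uint_fast64_t soft_counter;   /* the poor man's TSC */
static atomic_bool soft_counter_stop;

/* Spin on another CPU, incrementing the shared counter as fast as possible. */
static void *soft_counter_loop(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&soft_counter_stop, memory_order_relaxed))
        atomic_fetch_add_explicit(&soft_counter, 1, memory_order_relaxed);
    return NULL;
}

/* Start the counting thread; returns 0 on success like pthread_create(). */
int soft_counter_start(pthread_t *thread)
{
    atomic_store(&soft_counter_stop, false);
    return pthread_create(thread, NULL, soft_counter_loop, NULL);
}

/* Take a "timestamp" without executing rdtsc. */
uint64_t soft_counter_read(void)
{
    return atomic_load_explicit(&soft_counter, memory_order_relaxed);
}

/* Ask the counting thread to stop and wait for it. */
int soft_counter_finish(pthread_t thread)
{
    atomic_store(&soft_counter_stop, true);
    return pthread_join(thread, NULL);
}
```

The measuring thread would call `soft_counter_read()` before and after the operation of interest, just as it would with rdtsc.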


  1. Side channels are a real problem, but they can often be exploited remotely, which means that disabling TSC locally does not solve anything. For example, I released an advisory about a timing attack on Google Keyczar last week.

    Besides the RDTSC instruction, you can read the TSC value via MSR 10h. MSRs are privileged though.

  2. I agree.
    If you can already execute code on the host, RDTSC is not a flaw, it's a tool you might use to exploit flaws.

    Removing access to it would only make sense if it could not be easily replaced, which is very likely not the case: I expect real-life exploitable flaws to not need that kind of accuracy anyway.

    Still, I don't think disabling it in a sandbox is stupid: if you don't need it, it's an easier decision to take it away than to decide if you care about it. But removing a useful feature without a proven security benefit should be an option, not something mandatory.