Channel: Microsoft Security Response Center

Draft of Microsoft Security Servicing Commitments for Windows


Microsoft’s commitment to protecting customers from vulnerabilities in our products, services, and devices includes providing security updates that address these vulnerabilities when they are discovered. We understand that researchers have wanted better clarity around the security features, boundaries, and mitigations that exist in Windows and the servicing commitments that come with them. We have drafted a document that describes the criteria the Microsoft Security Response Center (MSRC) uses when determining whether a reported vulnerability will be addressed through servicing or in the next version of a product. We are sharing the draft with the research community and would like feedback before we make the final copy available online. We are primarily interested in feedback on our servicing policies and whether our criteria make sense to you, the researcher.

Microsoft Security Servicing Commitments.pdf

Please send feedback to switech@microsoft.com, thank you!

 

 


Announcing Changes to Microsoft’s Mitigation Bypass Bounty


Today we’re announcing a change to the Mitigation Bypass Bounty that removes Control Flow Guard (CFG) from the set of in-scope mitigations. In this blog, we’ll provide additional background and explain why we’re making this change.

Mitigation Bypass Bounty Background

Microsoft started the Mitigation Bypass Bounty in 2013 with the goal of helping us improve key defense-in-depth mitigation technologies by learning about bypasses. Since launching this program, we’ve awarded more than $1,000,000 in bounties and fixed numerous reported bypasses in our exploit mitigations, and we look forward to growing that number in the future.

One of the challenges we’ve faced with the Mitigation Bypass bounty program is providing clear guidance to researchers on what sorts of issues are in-scope vs. out-of-scope and what sort of cash reward can be expected. We’ve made several changes over the past few years to try to improve the situation here, such as:

  1. More clearly defining payout tiers for different types of mitigation bypasses (i.e. bugs vs. design problems).
  2. Being more transparent about the types of issues we are currently aware of so researchers know what types of bypasses are out of scope.

Even with these changes, we know we’re not perfect and we continue to listen to feedback and make changes to be more researcher friendly.

Impact of Exploit Mitigations on Exploitation

One datapoint Microsoft monitors is the occurrence of vulnerabilities being exploited in the wild. Microsoft has seen the number of vulnerabilities exploited in the wild decrease steadily over the past eight years.

We believe that part of the reason for the decline of known exploits in the wild is the increase in exploitation difficulty, which transitively affects the economics of vulnerability exploitation. We attribute a large part of the increased difficulty to Microsoft’s continued investment in exploit mitigation technologies such as CFG, Arbitrary Code Guard (ACG), Code Integrity Guard (CIG), MemGC, and so on.

Before we launched the Mitigation Bypass Bounty, we were more heavily reliant on analyzing exploits found in the wild to identify mitigation opportunities. This created lag between technique use in the wild and mitigation availability. To shorten this lag time, we launched the Mitigation Bypass Bounty to proactively learn about bypasses before they were used in the wild.

CFG has been a particularly popular Mitigation Bypass Bounty target for security researchers. Thanks to this research, we’ve learned a lot about a variety of bugs and design limitations affecting CFG. This has caused us to reevaluate the threat model that we need to defend against for more robust CFI. In order to do that, we know we will need to extend and improve the design of CFG, e.g. with finer-grained CFI, read-only memory protection, safe unwind/exception handling, and so on. We recently talked about the challenges with CFG and how our threat model has evolved (Video | Slides).

Microsoft has also received submissions and made fixes for other targets in the Mitigation Bypass Bounty, such as ACG. Researchers can expect that as we build new mitigations we will add them as bounty targets.

Changes to the Mitigation Bypass Bounty Scope

As of today, CFG has been removed from the set of in-scope mitigations for the Mitigation Bypass Bounty. We believe we now have a good understanding of the limitations of CFG and the threat model we need to adapt the design to. We do not believe that additional research into CFG bypasses will be valuable until we’ve addressed these limitations and we would rather that researchers focus their attention on the other in-scope mitigations for the bounty. Although we are removing CFG from the bounty scope, we have no intention to remove or deprecate the feature and we still believe it is a valuable defense-in-depth mitigation. We look forward to bringing it back in scope once we’ve made improvements to CFG.

As always, we’d appreciate feedback from the community on this or any related topics.

 

Joe Bialek

MSRC Vulnerabilities & Mitigations Team

Analysis and mitigation of L1 Terminal Fault (L1TF)


In January 2018, Microsoft released an advisory and security updates for a new class of hardware vulnerabilities involving speculative execution side channels (known as Spectre and Meltdown). In this blog post, we will provide a technical analysis of a new speculative execution side channel vulnerability known as L1 Terminal Fault (L1TF) which has been assigned CVE-2018-3615 (for SGX), CVE-2018-3620 (for operating systems and SMM), and CVE-2018-3646 (for virtualization). This vulnerability affects Intel Core processors and Intel Xeon processors.

This post is primarily geared toward security researchers and engineers who are interested in a technical analysis of L1TF and the mitigations that are relevant to it. If you are interested in more general guidance, please refer to Microsoft's security advisory for L1TF.

Please note that the information in this post is current as of the date of this post.

L1 Terminal Fault (L1TF) overview

We previously defined four categories of speculation primitives that can be used to create the conditions for speculative execution side channels. Each category provides a fundamental method for entering speculative execution along a non-architectural path, specifically: conditional branch misprediction, indirect branch misprediction, exception delivery or deferral, and memory access misprediction. L1TF belongs to the exception delivery or deferral category of speculation primitives (along with Meltdown and Lazy FP State Restore) as it deals with speculative (or out-of-order) execution related to logic that generates an architectural exception. In this post, we’ll provide a general summary of L1TF. For a more in-depth analysis, please refer to the advisory and whitepaper that Intel has published for this vulnerability.

L1TF arises due to a CPU optimization related to the handling of address translations when performing a page table walk. When translating a linear address, the CPU may encounter a terminal page fault, which occurs when the paging structure entry for a virtual address is not present (Present bit is 0) or otherwise invalid. This results in an exception (such as a page fault) or a TSX transaction abort along the architectural path. However, before either of these occur, a CPU that is vulnerable to L1TF may initiate a read from the L1 data cache for the linear address being translated. For this speculative-only read, the page frame bits of the terminal (not present) page table entry are treated as a system physical address, even for guest page table entries. If the cache line for the physical address is present in the L1 data cache, then the data for that line may be forwarded on to dependent operations that may execute speculatively before retirement of the instruction that led to the terminal page fault. The behavior related to L1TF can occur for page table walks involving both conventional and extended page tables (the latter of which is used for virtualization).

To illustrate how this might occur, it may help to consider the following simplified example. In this example, an attacker-controlled virtual machine (VM) has constructed a page table hierarchy within the VM with the goal of reading a desired system (host) physical address. The following diagram provides an example hierarchy for the virtual address 0x12345000 where the terminal page table entry is not present but contains a page frame of 0x9a0:

After setting up this hierarchy, the VM could then attempt to read from system physical addresses within [0x9a0000, 0x9a1000) through an instruction sequence such as the following:

01: 4C0FB600          movzx r8,byte [rax] ; rax = 0x12345040
02: 49C1E00C          shl r8,byte 0xc
03: 428B0402          mov eax,[rdx+r8]    ; rdx = address of signal array

By executing these instructions within a TSX transaction or by handling the architectural page fault, the VM could attempt to induce a speculative load from the L1 data cache line associated with the system physical address 0x9a0040 (if present in the L1) and have the first byte of that cache line forwarded to an out-of-order load that uses this byte as an offset into a signal array. This would create the conditions for observing the byte value using a disclosure primitive such as FLUSH+RELOAD, thereby leading to the disclosure of information across a security boundary in the case where this system physical address has not been allocated to the VM.

While the scenario described above illustrates how L1TF can apply to inferring physical memory across a virtual machine boundary (where the VM has full control of the guest page tables), it is also possible for L1TF to be exploited in other scenarios. For example, a user mode application could attempt to use L1TF to read from physical addresses referred to by not present terminal page table entries within their own address space. In practice, it is common for operating systems to make use of the software bits in the not present page table entry format for storing metadata which could equate to valid physical page frames. This could allow a process to read physical memory not assigned to the process (or VM, in a virtualized scenario) or that is not intended to be accessible within the process (e.g. PAGE_NOACCESS memory on Windows).

Mitigations for L1 Terminal Fault (L1TF)

There are multiple mitigations for L1TF and they vary based on the attack category being mitigated. To illustrate this, we’ll describe the software security models that are at risk for L1TF and the specific tactics that can be employed to mitigate it. We’ll reuse the mitigation taxonomy from our previous post on mitigating speculative execution side channels for this. In many cases, the mitigations described in this section need to be combined in order to provide a broad defense for L1TF.

Relevance to software security models

The following table summarizes the potential relevance of L1TF to various intra-device attack scenarios that software security models are typically concerned with. Unlike Meltdown (CVE-2017-5754), which only affected the kernel-to-user scenario, L1TF is applicable to all intra-device attack scenarios, as shown below. This is because L1TF can potentially provide the ability to read arbitrary system physical memory.

Attack Category  Attack Scenario      L1TF
Inter-VM         Hypervisor-to-guest  CVE-2018-3646
                 Host-to-guest        CVE-2018-3646
                 Guest-to-guest       CVE-2018-3646
Intra-OS         Kernel-to-user       CVE-2018-3620
                 Process-to-process   CVE-2018-3620
                 Intra-process        CVE-2018-3620
Enclave          Enclave-to-any       CVE-2018-3615
                 VSM-to-any           CVE-2018-3646

Preventing speculation techniques involving L1TF

As we’ve noted in the past, one of the best ways to mitigate a vulnerability is by addressing the issue as close to the root cause as possible. In the case of L1TF, there are multiple mitigations that can be used to prevent speculation techniques involving L1TF.

Safe page frame bits in not present page table entries

One of the requirements for an attack involving L1TF is that the page frame bits of a terminal page table entry must refer to a valid physical page that contains sensitive data from another security domain. This means a compliant hypervisor and operating system kernel can mitigate certain attack scenarios for L1TF by ensuring that either 1) the physical page referred to by the page frame bits of not present page table entries always contains benign data and/or 2) a high order bit is set in the page frame bits that does not correspond to accessible physical memory. In the case of #2, the Windows kernel will use a bit that is less than the implemented physical address bits supported by a given processor in order to avoid physical address truncation (e.g. dropping the high order bit).

Beginning with the August 2018 Windows security updates, all supported versions of the Windows kernel and the Hyper-V hypervisor ensure that #1 and #2 are automatically enforced on hardware that is vulnerable to L1TF. This is enforced both for conventional page table entries and extended page table entries that are not present. On Windows Server, this mitigation is disabled by default and must be enabled as described in our published guidance for Windows Server.

To illustrate how this works, consider the following example of a user mode virtual address that is not accessible and therefore has a not present PTE. In this example, the page frame bits still refer to what could be interpreted as a valid physical address in conjunction with L1TF:

26: kd> !pte 0x00000281`d84c0000
…   PTE at FFFFB30140EC2600
…   contains 0000000356CDEB00
…   not valid
…    Transition: 356cde
…    Protect: 18 - No Access

26: kd> dt nt!HARDWARE_PTE FFFFB30140EC2600
+0x000 Valid : 0y0
+0x000 Write : 0y0
+0x000 Owner : 0y0
+0x000 WriteThrough : 0y0
+0x000 CacheDisable : 0y0
+0x000 Accessed : 0y0
+0x000 Dirty : 0y0
+0x000 LargePage : 0y0
+0x000 Global : 0y1
+0x000 CopyOnWrite : 0y1
+0x000 Prototype : 0y0
+0x000 reserved0 : 0y1
+0x000 PageFrameNumber : 0y000000000000001101010110110011011110 (0x356cde)
+0x000 reserved1 : 0y0000
+0x000 SoftwareWsIndex : 0y00000000000 (0)
+0x000 NoExecute : 0y0

With the August 2018 Windows security updates applied, it’s possible to observe that a high order bit (in this case bit 45) is set in the not present page table entry, so that the entry refers to physical memory that is either inaccessible or guaranteed to be benign. Since this does not correspond to an accessible physical address, any attempt to read from it using L1TF will fail.

17: kd> !pte  0x00000196`04840000
…   PTE at FFFF8000CB024200
…   contains 0000200129CB2B00
…   not valid
…    Transition: 200129cb2
…    Protect: 18 - No Access

17: kd> dt nt!HARDWARE_PTE FFFF8000CB024200
+0x000 Valid            : 0y0
+0x000 Write            : 0y0
+0x000 Owner            : 0y0
+0x000 WriteThrough     : 0y0
+0x000 CacheDisable     : 0y0
+0x000 Accessed         : 0y0
+0x000 Dirty            : 0y0
+0x000 LargePage        : 0y0
+0x000 Global           : 0y1
+0x000 CopyOnWrite      : 0y1
+0x000 Prototype        : 0y0
+0x000 reserved0        : 0y1
+0x000 PageFrameNumber  : 0y001000000000000100101001110010110010 (0x200129cb2)
+0x000 reserved1        : 0y0000
+0x000 SoftwareWsIndex  : 0y00000000000 (0)
+0x000 NoExecute        : 0y0

In order to provide a portable method of allowing VMs to determine the implemented physical address bits supported on a system, the Hyper-V hypervisor Top-Level Functional Specification (TLFS) has been revised with a defined interface that can be used by a VM to query this information. This facilitates safe migration of virtual machines within a migration pool.

Flush L1 data cache on security domain transition

Disclosing information through the use of L1TF requires sensitive data from a victim security domain to be present in the L1 data cache (note, the L1D is shared by all LPs on the same physical core). This means disclosure can be prevented by flushing the L1 data cache when transitioning between security domains. To facilitate this, Intel has provided new capabilities through a microcode update that supports an architectural interface for flushing the L1 data cache.

Beginning with the August 2018 Windows security updates, the Hyper-V hypervisor now uses the new L1 data cache flush feature when present to ensure that VM data is removed from the L1 data cache at critical points. On Windows Server 2016+ and Windows 10 1607+, the flush occurs when switching virtual processor contexts between VMs. This helps reduce the performance impact of the flush by minimizing the number of times this needs to occur. On previous versions of Windows, the flush occurs prior to executing a VM (e.g. prior to VMENTRY).

For L1 data cache flushing in the Hyper-V hypervisor to be robust, the flush is performed in combination with safe use or disablement of HyperThreading and per-virtual-processor hypervisor address spaces.

For SGX enclave scenarios, the microcode update provided by Intel ensures that the L1 data cache is flushed any time the logical processor exits enclave execution mode. The microcode update also supports attestation of whether HT has been enabled by the BIOS. When HT is enabled, there is a possibility of L1TF attacks from a sibling logical processor before enclave secrets in the L1 data cache are flushed or cleared. The entity verifying the attestation may reject attestations from a HT-enabled system if it deems the risk of L1TF attacks from the sibling logical processor to not be acceptable.

Safe scheduling of sibling logical processors

Intel’s HyperThreading (HT) technology, also known as simultaneous multithreading (SMT), allows multiple logical processors (LPs) to execute simultaneously on a physical core. Each sibling LP can be simultaneously executing code in different security domains and privilege modes. For example, one LP could be executing in the hypervisor while another is executing code within a VM. This has implications for the L1 data cache flush because it may be possible for sensitive data to reenter the L1 data cache via a sibling LP after the L1 data cache flush occurs.

In order to prevent this from happening, the execution of code on sibling LPs must be safely scheduled or HT must be disabled. Both of these approaches ensure that the L1 data cache for a core does not become polluted with data from another security domain after a flush occurs.

The Hyper-V hypervisor on Windows Server 2016 and above supports a feature known as the core scheduler which ensures that virtual processors executing on a physical core always belong to the same VM and are described to the VM as sibling hyperthreads. This feature requires administrator opt-in for Windows Server 2016 and is enabled by default starting with Windows Server 2019. This, in combination with per-virtual-processor hypervisor address spaces, is what makes it possible to defer the L1 data cache flush to the point at which a core begins executing a virtual processor from a different VM rather than needing to perform the flush on every VMENTRY. For more details on how this is implemented in Hyper-V, please refer to the in-depth Hyper-V technical blog on this topic.

The following diagram illustrates the differences in virtual processor scheduling policies for a scenario with two different VMs (VM 1 and VM 2). As the diagram shows, without core scheduling enabled it is possible for code from two different VMs to execute simultaneously on a core (in this case core 2), whereas this is not possible with core scheduling enabled.

On versions of Windows prior to Windows Server 2016 and for all versions of Windows Client with virtualization enabled, the core scheduler feature is not supported and it may therefore be necessary to disable HT in order to ensure the robustness of the L1 data cache flush for inter-VM isolation. This is also currently necessary on Windows Server 2016+ for scenarios that make use of Virtual Secure Mode (VSM) for isolation of secrets. When HT is disabled, it becomes impossible for sibling logical processors to execute simultaneously on the same physical core. For guidance on how to disable HT on Windows, please refer to our advisory.

Removing sensitive content from memory

Another tactic for mitigating speculative execution side channels is to remove sensitive content from the address space such that it cannot be disclosed through speculative execution.

Per-virtual-processor address spaces

Until the emergence of speculative execution side channels, there was not a strong need for hypervisors to partition their virtual address space on a per-VM basis. As a result, it has been common practice for hypervisors to maintain a virtual mapping of all physical memory to simplify memory accesses. The existence of L1TF and other speculative execution side channels has made it desirable to eliminate cross-VM secrets from the virtual address space of the hypervisor when it is acting on behalf of a VM.

Beginning with the August 2018 security update, the Hyper-V hypervisor in Windows Server 2016+ and Windows 10 1607+ now uses per-virtual-processor (and hence per-VM) address spaces and also no longer maps all of physical memory into the virtual address space of the hypervisor. This ensures that only memory that is allocated to the VM and the hypervisor on behalf of the VM is potentially accessible during speculation for a given virtual processor. In the case of L1TF, this mitigation works in combination with the L1 data cache flush and safe use or disablement of HT to ensure that no sensitive cross-VM information becomes available in the L1.

Mitigation applicability

The mitigations that were described in the previous sections work in combination to provide broad protection for L1TF. The following tables provide a summary of the attack scenarios and the relevant mitigations and default settings for different versions of Windows Server and Windows Client:

Inter-VM:
  Windows Server 2016+
    Enabled: per-virtual-processor address spaces, safe page frame bits
    Opt-in: L1 data cache flush, enable core scheduler or disable HT
  Pre-Windows Server 2016
    Enabled: safe page frame bits
    Opt-in: L1 data cache flush, disable HT
  Windows 10 1607+
    Enabled: per-virtual-processor address spaces, safe page frame bits
    Opt-in: L1 data cache flush, disable HT
  Pre-Windows 10 1607
    Enabled: safe page frame bits
    Opt-in: L1 data cache flush, disable HT

Intra-OS:
  Windows Server (all versions)
    Opt-in: safe page frame bits
  Windows Client (all versions)
    Enabled: safe page frame bits

Enclave:
  All versions
    Enabled (SGX): L1 data cache flush
    Opt-in (SGX/VSM): disable HT

More concisely, the relationship between attack scenarios and mitigations for L1TF is summarized below:

Prevent speculation techniques:
  Flush L1 data cache on security domain transition (Inter-VM, Enclave)
  Safe scheduling of sibling logical processors (Inter-VM, Enclave)
  Safe page frame bits in not present page table entries (Inter-VM, Intra-OS)
Remove sensitive content from memory:
  Per-virtual-processor address spaces (Inter-VM)

Wrapping up

In this post, we analyzed a new speculative execution side channel vulnerability known as L1 Terminal Fault (L1TF). This vulnerability affects a broad range of attack scenarios and the relevant mitigations require a combination of software and firmware (microcode) updates for systems with affected Intel processors. The discovery of L1TF demonstrates that research into speculative execution side channels is ongoing and we will continue to evolve our response and mitigation strategy accordingly. We continue to encourage researchers to report new discoveries through our Speculative Execution Side Channel bounty program.

Matt Miller
Microsoft Security Response Center (MSRC)

Vulnerability hunting with Semmle QL, part 1


Previously on this blog, we’ve talked about how MSRC automates the root cause analysis of vulnerabilities, both those reported to us and those we find ourselves. After doing this, our next step is variant analysis: finding and investigating any variants of the vulnerability. It’s important that we find all such variants and patch them simultaneously; otherwise, we risk them being exploited in the wild. In this post, I’d like to explain the automation we use for variant finding.

For the past year or so, we’ve been augmenting our manual code review processes with Semmle, a third-party static analysis environment. It compiles code to a relational database (the snapshot database – a combination of database and source code), which is queried using Semmle QL, a declarative, object-oriented query language designed for program analysis.

The basic workflow is that, after root cause analysis, we write queries to find code patterns that are semantically similar to the original vulnerability. Any results are triaged as usual and provided to our engineering teams for a fix to be developed. Also, the queries are placed in a central repository to be re-run periodically by MSRC and other security teams. This way, we can scale our variant finding over time and across multiple codebases.

In addition to variant analysis, we’ve been using QL proactively, in our security reviews of source code. This will be the topic of a future blog post. For now, let’s look at some real-world examples inspired by MSRC cases.

Incorrect integer overflow checks

This first case is a bug that’s straightforward to define, but would be tedious to find variants of in a large codebase.

The code below shows a common idiom for detecting overflow on the addition of unsigned integers:

if (x + y < x) {
    // handle integer overflow
}

Unfortunately, this does not work properly when the width of the integer type is small enough to be subject to integral promotion. For example, if x and y were both unsigned short, when compiled, the above code would end up being equivalent to (unsigned int)x + y < x, making this overflow check ineffective.

Here’s a QL query that matches this code pattern:

import cpp
 
from AddExpr a, Variable v, RelationalOperation r
where a.getAnOperand() = v.getAnAccess()
  and r.getAnOperand() = v.getAnAccess()
  and r.getAnOperand() = a
  and forall(Expr op | op = a.getAnOperand() | op.getType().getSize() < 4)
  and not a.getExplicitlyConverted().getType().getSize() < 4
select r, "Useless overflow check due to integral promotion"

In the from clause, we define the variables, and their types, to be used in the rest of the query. AddExpr, Variable, and RelationalOperation are QL classes representing various sets of entities in the snapshot database; e.g. RelationalOperation covers every expression with a relational operation (less than, greater than, etc.).

The where clause is the meat of the query, using logical connectives and quantifiers to define how to relate the variables. Breaking it down, this means that the addition expression and the relational operation need the same variable as one of their operands (x in the example code above):

a.getAnOperand() = v.getAnAccess() and r.getAnOperand() = v.getAnAccess()

The other operand of the relational operation must be the addition:

r.getAnOperand() = a

Both operands of the addition must have a width less than 32 bits (4 bytes):

forall(Expr op | op = a.getAnOperand() | op.getType().getSize() < 4)

But if there is an explicit cast on the addition expression, we’re not interested if it’s less than 32 bits:

not a.getExplicitlyConverted().getType().getSize() < 4

(This is so a check like (unsigned short)(x + y) < x doesn’t get flagged by the query.)

Finally, the select clause defines the result set.

This vulnerability was originally reported in Chakra (the JavaScript engine of Edge), where the consequence of that particular ineffective overflow check was memory corruption. The query matched the original vulnerability but no additional variants in Chakra. However, we discovered several when applying this exact query to other Windows components.

Unsafe use of SafeInt

An alternative to rolling your own integer overflow checks is to use a library with such checks built in. SafeInt is a C++ template class that overrides arithmetic operators to throw an exception where overflow is detected.

Here’s an example of how to use it correctly:

int x, y, z;
 
// ...
 
z = SafeInt<int>(x) + y;

But here is an example of how it can be unintentionally misused – the expression passed to the constructor may already have overflowed:

int x, y, z;
 
// ...
 
z = SafeInt<int>(x + y);

How to write a query to detect this? In the previous example, our query only used built-in QL classes. For this one, let’s start by defining our own. For this, we choose one or more QL classes to subclass from (with extends), and define a characteristic predicate which specifies those entities in the snapshot database that are matched by the class:

class SafeInt extends Type {
  SafeInt() {
    this.getUnspecifiedType().getName().matches("SafeInt<%")
  }
}

The QL class Type represents the set of all types in the snapshot database. For the QL class SafeInt, we subset this to just types with a name that begins with “SafeInt<”, thus indicating that they are instantiations of the SafeInt template class. The getUnspecifiedType() predicate is used to disregard typedefs and type specifiers such as const.

Next, we define the subset of expressions that could potentially overflow. Most arithmetic operations can overflow, but not all; this uses QL’s instanceof operator to define which ones. We use a recursive definition because we need expressions such as (x+1)/y to be included, even though x/y is not.

class PotentialOverflow extends Expr {
  PotentialOverflow() {
    (this instanceof BinaryArithmeticOperation    // match   x+y x-y x*y
       and not this instanceof DivExpr            // but not x/y
       and not this instanceof RemExpr)           //      or x%y
 
    or (this instanceof UnaryArithmeticOperation  // match   x++ x-- ++x --x -x
          and not this instanceof UnaryPlusExpr)  // but not +x
 
    // recursive definitions to capture potential overflow in
    // operands of the operations excluded above

    or this.(BinaryArithmeticOperation).getAnOperand() instanceof PotentialOverflow
    or this.(UnaryPlusExpr).getOperand() instanceof PotentialOverflow
  }
}

Finally, we relate these two classes in a query:

from PotentialOverflow po, SafeInt si
where po.getParent().(Call).getTarget().(Constructor).getDeclaringType() = si
select
    po,
    po + " may overflow before being converted to " + si

.(Call) and .(Constructor) are examples of an inline cast, which, similar to instanceof, is another way of restricting which QL classes match. In this case we are saying that, given an expression that may overflow, we’re only interested if its parent expression is some sort of call. Furthermore, we only want to know if the target of that call is a constructor, and if it’s a constructor for some SafeInt.

Like the previous example, this was a query that provided a number of actionable results across multiple codebases.

JavaScript re-entrancy to use-after-free

This next example was a vulnerability in Edge caused by re-entrancy into JavaScript code.

Edge defines many functions that can be called from JavaScript. This model function illustrates the essence of the vulnerability:

HRESULT SomeClass::vulnerableFunction(Var* args, UINT argCount, Var* retVal)
{
    // get first argument -
    // from Chakra, acquire pointer to array
    BYTE* pBuffer;
    UINT bufferSize;
    HRESULT hr = Jscript::GetTypedArrayBuffer(args[1], &pBuffer, &bufferSize);
 
    // get second argument –
    // from Chakra, obtain an integer value
    int someValue;
    hr = Jscript::VarToInt(args[2], &someValue);
 
    // perform some operation on the array acquired previously
    doSomething(pBuffer, bufferSize);
 

The problem was that when Edge calls back into Chakra, e.g. during VarToInt, arbitrary JavaScript code may be executed. The above function could be exploited by passing it a JavaScript object that overrides valueOf to free the buffer, so when VarToInt returns, pBuffer is a dangling pointer:

var buf = new ArrayBuffer(length);
var arr = new Uint8Array(buf);
 
var param = {}
param.valueOf = function() {
    /* free `buf`
       (code to actually do this would be defined elsewhere) */

    neuter(buf);  // neuter `buf` by e.g. posting it to a web worker
    gc();         // trigger garbage collection
    return 0;
};
 
vulnerableFunction(arr, param);
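The same hazard can be sketched outside the browser. Below is a hedged Python analogue (all names are hypothetical; Python has no dangling pointers, so a `freed` flag stands in for the use-after-free) in which a conversion callback re-enters user code and frees the buffer the caller already holds:

```python
class Buffer:
    """Stand-in for the typed array's backing store."""
    def __init__(self, size):
        self.data = bytearray(size)
        self.freed = False
    def free(self):
        self.freed = True
        self.data = None

def var_to_int(value, buffer):
    """Stand-in for VarToInt: converting may run arbitrary guest code."""
    if callable(value):          # like an object overriding valueOf
        return value(buffer)     # re-entrancy happens here
    return int(value)

def vulnerable_function(buffer, value):
    data = buffer                      # "pointer" acquired first...
    n = var_to_int(value, buffer)      # ...conversion may free the buffer...
    return data.freed                  # ...so this use is now dangerous

evil = lambda buf: (buf.free(), 0)[1]  # a "valueOf" that frees the buffer
print(vulnerable_function(Buffer(16), evil))  # True: used after free
```

The fix in real code is to re-validate (or copy) the buffer after any call that can re-enter script.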

The specific pattern we’re looking for with QL is therefore: acquisition of a pointer from GetTypedArrayBuffer, followed by a call to some Chakra function that may execute JavaScript, followed by some use of the pointer.

For the array buffer pointer, we match on the calls to GetTypedArrayBuffer, where the second argument (getArgument of Call is zero-indexed) is an address-of expression (&), and take its operand:

class TypedArrayBufferPointer extends Expr {
    TypedArrayBufferPointer() {
        exists(Call c | c.getTarget().getName() = "GetTypedArrayBuffer" and
               this = c.getArgument(1).(AddressOfExpr).getOperand())
    }
}

The exists logical quantifier is used here to introduce a new variable (c) into the scope.

There are several Chakra API functions that could be used for JavaScript re-entrancy. Rather than defining them by name, we can identify the internal Chakra function that performs this task, and use QL to figure this out from the call graph:

// examine call graph to match any function that may eventually call MethodCallToPrimitive
predicate mayExecJsFunction(Function f) {
    exists(Function g | f.calls+(g) and g.hasName("MethodCallToPrimitive"))
}
 
// this defines a call to any of the above functions
class MayExecJsCall extends FunctionCall {
    MayExecJsCall() {
        mayExecJsFunction(this.getTarget())
    }
}

The “+” suffix of the calls predicate specifies a transitive closure – it applies the predicate to itself until there is a match. This permits a concisely defined exploration of the function call graph.
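As a rough illustration of what calls+ computes, here is a Python sketch of transitive closure over a toy call graph (the graph and function names are made up for the example):

```python
# Toy call graph: function name -> set of direct callees
call_graph = {
    "vulnerableFunction": {"VarToInt"},
    "VarToInt": {"ToNumber"},
    "ToNumber": {"MethodCallToPrimitive"},
    "memcpy": set(),
}

def may_exec_js(f, graph, target="MethodCallToPrimitive"):
    """True if f may (transitively) call target -- the analogue of calls+."""
    seen, work = set(), [f]
    while work:
        g = work.pop()
        for callee in graph.get(g, ()):
            if callee == target:
                return True
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return False

print(may_exec_js("vulnerableFunction", call_graph))  # True
print(may_exec_js("memcpy", call_graph))              # False
```

QL evaluates the closure declaratively over the whole codebase, but the reachability question it answers is the same.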

Finally, this query brings these QL class definitions together, relating them by control flow:

from TypedArrayBufferPointer def, MayExecJsCall call, VariableAccess use, Variable v
where v.getAnAccess() = def
  and v.getAnAccess() = use
  and def.getASuccessor+() = call
  and call.getASuccessor+() = use
select use,
    "Call to " + call + " between definition " + def + " and use " + use

The predicate getASuccessor() specifies the next statement or expression in the control flow. Therefore, using e.g. call.getASuccessor+() = use will follow the control flow graph from call until there is a match to use. The diagram below illustrates this:

This query uncovered four variants in addition to the originally reported vulnerability, all of them assessed as critical severity.

That’s all for now. The next instalment will cover using QL for data flow analysis and taint tracking, with examples from our security review of an Azure firmware component.

 

Steven Hunter, MSRC Vulnerabilities & Mitigations team

Microsoft Security Servicing Criteria for Windows


One of our goals in the Microsoft Security Response Center (MSRC) is to be more transparent with security researchers and our customers on the criteria we use for determining when we intend to address a reported vulnerability through a security update. Our belief is that improving transparency on this topic helps provide clarity on how we assess risk, sets expectations for the types of vulnerabilities that we intend to service, and facilitates constructive dialogue as the threat landscape evolves over time. Ultimately, we believe this enables us all to work together to better protect Microsoft’s customers.

Toward this end, we released a draft version of the security servicing criteria for Windows in June 2018. We received some great feedback from the research community and the broader security industry that we used to improve the clarity of this criteria. Today, we are happy to announce the publication of the first version of the security servicing criteria for Windows. We expect this to be a living document that evolves over time, and we look forward to continuing the dialogue with the community on this topic.

Microsoft Security Servicing Criteria for Windows
Microsoft Vulnerability Severity Classification for Windows

Please reach out to us at switech@microsoft.com or @msftsecresponse on Twitter to continue the discussion.

We’d like to acknowledge all of our partner teams from across Microsoft who helped to create and improve the clarity of this criteria.

Nate Warfield - Microsoft Security Response Center (MSRC)

First Steps in Hyper-V Research


Microsoft has put a lot of effort in Hyper-V security. Hyper-V, and the whole virtualization stack, runs at the core of many of our products: cloud computing, Windows Defender Application Guard, and technology built on top of Virtualization Based Security (VBS). Because Hyper-V is critical to so much of what we do, we want to encourage researchers to study it and tell us about the vulnerabilities they find: we even offer a $250K bounty for those who do. In this blog series, we’ll explore what’s needed to start looking under the hood and hunt for Hyper-V vulnerabilities.

One possible approach to finding bugs and vulnerabilities in virtualization platforms is static analysis. We recently released public symbols for the storage components, which, together with the symbols that were already available, means that most symbols of the virtualization stack are now public. This makes static analysis much easier.

In addition to static analysis, when auditing code, another useful tool is live debugging. It helps us to inspect code paths and data as they are in actual runtime. It also makes our lives easier when trying to trigger interesting code paths, breaking on a piece of code, or when we need to inspect the memory layout or registers at a certain point. While this practice is widely used when researching the Windows kernel or userspace, debugging the hypervisor and virtualization stack is sparsely documented. In this blog post, we will change this.

Brief Intro to the Virtualization Stack

When we talk about “partitions”, we mean the different VMs running on top of the hypervisor. We differentiate between three types of partitions: the root partition (also known as the parent partition), enlightened guest partitions, and unenlightened guest partitions. Unlike the other guest VMs, the “root partition” is our host OS. While it is a fully-fledged Windows VM where we can run regular programs like a web browser, parts of the virtualization stack itself run in the root partition’s kernel and userspace. We can think of the root partition as a special, privileged guest VM that works together with the hypervisor.

For example, the Hyper-V management services run in the root partition and can create new guest partitions. These will in turn communicate with the hypervisor using hypercalls. There are higher-level APIs for communication, the most important one being VMBus. VMBus is a cross-partition IPC component that is heavily used by the virtualization stack.

Virtual Devices for the guest partitions can also take advantage of a feature named Enlightened I/O, for storage, networking, graphics, and input subsystems. Enlightened I/O is a specialized virtualization-aware implementation of communication protocols (SCSI, for example), which runs on top of VMBus. This bypasses the device emulation layer which gives better performance and features since the guest is already aware of the virtualization stack (hence “enlightened”). However, this requires special drivers for the guest VM that make use of this capability.

Hyper-V enlightened I/O and a hypervisor aware kernel are available when installing the Hyper-V integration services. Integration components, which include virtual server client (VSC) drivers, are available for Windows and for some of the more common Linux distributions.

When creating guest partitions in Hyper-V, we can choose a “Generation 1” or a “Generation 2” guest. We will not go into the differences between the two types of VMs here, but it is worth mentioning that a Generation 1 VM supports most guest operating systems, while a Generation 2 VM supports a more limited set of operating systems (mostly 64-bit). In this post, we will use only Generation 1 VMs. A detailed explanation of the differences is available here.

To recap, a usual virtualization setup will look like this:

 Root partition  Windows
 Enlightened child partitions  Windows or Linux
 Unenlightened child partitions  Other OSs like FreeBSD, or legacy Linux

That’s everything you need to know for this blog post. For more information, see this presentation or the TLFS.

 

Debugging Environment

The setup we’re going to talk about is a debugging environment for both the hypervisor and the root partition kernel of a nested Windows 10 Hyper-V guest. We can’t debug the kernel and the hypervisor of our own host, for obvious reasons. For that, we’ll need to create a guest, enable Hyper-V inside it, and configure everything so it will be ready for debugging connection. Fortunately, Hyper-V supports nested virtualization, which we are going to use here. The debugging environment will look like this:

Since we want to debug a hypervisor, but not the one running the VM that we debug, we will use nested virtualization. We will run the hypervisor we want to debug as a guest of another hypervisor.

To clarify, let's introduce some basic nested virtualization terminology:

L0 = Code that runs on a physical host. Runs a hypervisor.

L1 = L0's hypervisor guest. Runs the hypervisor we want to debug.

L2 = L1's hypervisor guest.

In short, we will debug an L1 hypervisor and root partition’s kernel from the L0 root partition’s userspace.

More about working with nested virtualization over Hyper-V can be found here.

There’s debug support built into the hypervisor, which lets us connect to Hyper-V with a debugger. To enable it we’ll need to configure a few settings in BCD (Boot Configuration Data) variables.

Let’s start with setting up the VM we’re going to debug:

  1. If Hyper-V is not running already on your host:
    1. Enable Hyper-V, as documented here.
    2. Reboot the host.
  2. Set up a new guest VM for debugging:
    1. Create a Generation 1 Windows 10 guest, as documented here. This process would also work for a Generation 2 guest, but you would have to disable Secure Boot first.
    2. Enable VT-x in the guest’s processor. Without this, the Hyper-V Platform will fail to run inside the guest. Note that we can do it only when the guest is powered off. We can do that from an elevated PowerShell prompt on the host:
      Set-VMProcessor -VMName <VMName> -ExposeVirtualizationExtensions $true
    3. Since we’ll debug the hypervisor inside the guest, we need to enable Hyper-V inside the guest too (documented here). Reboot the guest VM afterwards.
    4. Now we need to set the BCD variables to enable debugging. Inside the L1 host (which is the inner OS we just set up), run the following from an elevated cmd.exe:
      bcdedit /hypervisorsettings serial DEBUGPORT:1 BAUDRATE:115200
      bcdedit /set hypervisordebug on
      bcdedit /set hypervisorlaunchtype auto
      bcdedit /dbgsettings serial DEBUGPORT:2 BAUDRATE:115200
      bcdedit /debug on

      This will:

      • Enable Hyper-V debugging through serial port
      • Enable Hyper-V to launch on boot, and load the kernel
      • Enable kernel debug through a different serial port

      These changes will take effect after the next boot.

  3. To communicate with these serial ports, we need to expose them to the L0 host, which is the OS we will work from. We can’t set hardware configuration when the guest is up, so:
    1. Shut down the guest.
    2. From Hyper-V manager, right click on the VM, click “Settings”, then in the left pane, under “Hardware”, select COM1. From the options pick a Named Pipe and set it to “debug_hv”.
    3. Do the same for COM2 and set its name to “debug_kernel”.
  4. Another way of doing that is with the Set-VMComPort cmdlet.

    This will create two named pipes on the host, “\\.\pipe\debug_hv” and “\\.\pipe\debug_kernel”, used as a serial debug link for the hypervisor and the root partition kernel respectively. This is just an example, as you can name the pipes however you’d like.

  5. Launch two elevated WinDbg instances, and set their input to be the two named pipes. To do this, click: Start Debugging -> Attach to kernel (ctrl+k).
  6. Boot the guest. Now we can debug both the root partition kernel and the hypervisor!

There you have it. Two WinDbg instances running, for both the kernel and Hyper-V (hvix64.exe, the hypervisor itself). There are other ways to debug Hyper-V, but this is the one I’ve been using for over a year now. Other people may prefer different setups; for example, see @gerhart_x’s blog post about setting up a similar environment with VMware and IDA.

Another option is to debug over the network instead of serial ports, using a component called kdnet. This setup is documented here.

Static Analysis

Before we get our hands dirty with debugging, it could be useful to have a basic idb for the hypervisor, hvix64.exe. Here are a few tips to help set it up.

  1. Download vmx.h and hvgdk.h (from the WDK), and load them. These headers include some important structures, enums, constants, etc. To load the header files, go to File -> Load File -> Parse C header file (ctrl+F9). With this, we can then define immediate values as their vmx names, so functions handling VT-x logic will be much prettier when you reach them. For example:
  2. Define the hypercalls table, hypercall structure and hypercall group enum. There is a public repo that is a bit outdated, but it’ll certainly help you start. Also, the TLFS contains detailed documentation over most of the hypercalls.
  3. Find the important entry point functions that are accessible from partitions. The basic ones being the read/write MSR handlers, vmexit handler, etc. I highly recommend defining all the read/write VMCS functions, as these are used frequently in the binary.
  4. Other standard library functions like memcmp, memcpy, strstr can easily be revealed by diffing hvix64.exe against ntoskrnl.exe. This is a common trick used by many Hyper-V researchers (for example, here and here).
  5. You will probably notice accesses to different structures pointed by the primary gs structure. Those structures signify the current state (e.g. the current running partition, the current virtual processor, etc.). For instance, most hypercalls check if the caller has permissions to perform the hypercall, by testing permissions flags in the gs:CurrentPartition structure. Another example can be seen in the previous screenshot: we can see the use of CurrentVMCS structure (taken from gs as well), when the processor doesn’t support the relevant enlightenment features. There are offsets for other important structures, such as CurrentPartition, CpuIndex and CurrentVP (Virtual Processor). Those offsets change between different builds, but it’s very easy to find them.

For unknown hypercalls, it’s probably easier to find the callers in ntos and start from there. In ntos there is a wrapper function for each hypercall, which sets up the environment and calls a generic hypercall wrapper (nt!HvcallCodeVa), which simply issues a vmcall instruction.

Attack surface

Now that we have the foundations for research, we can consider the relevant attack surface for a guest-to-host escape. There are many interfaces to access the hypervisor from the root partition:

  • Hypercalls handlers
  • Faults (triple fault, EPT page faults, etc.)
  • Instruction Emulation
  • Intercepts
  • Register Access (control registers, MSRs)

Let’s see how we can set breakpoints on some of these interfaces.

Hyper-V hypercalls

Hyper-V’s hypercalls are organized in a table in the CONST section, using the hypercall structure (see this). Let’s hit a breakpoint in one of the hypercall handlers. For that we need to execute a vmcall in the root partition. Note that even though VT-x lets us execute vmcall from CPL3, Hyper-V forbids it and only allows vmcall from CPL0. The classical option would be to compile a driver and issue the hypercall ourselves. An easier (though a bit hacky) option is to break on nt!HvcallInitiateHypercall, which wraps the hypercall:


Here rcx is the hypercall ID, and rdx and r8 are the input and output parameters. The register mapping differs between the fast and non-fast calling conventions, as documented in the TLFS:

Register mapping for hypercall inputs when the Fast flag is zero:


Register mapping for hypercall inputs when the Fast flag is one:


Let’s see an example of that. We will break on a hypercall handler after triggering it manually from the root partition. To keep things simple, we’ll pick a hypercall which usually doesn’t get called in normal operation: hv!HvInstallIntercept. First, we need to find its hypercall ID. There’s a convenient reference for documented hypercalls in Appendix A of the TLFS. Specifically, hv!HvInstallIntercept has hypercall ID 0x4d.

Assuming we already have the two debuggers running and an IDA instance we use as a reference, we should first rebase our binary (Edit -> Segments -> Rebase Program) to the running hypervisor’s base address. This step is optional but saves us the need to calculate VAs manually due to ASLR. To get the Hyper-V base, run “lm m hv” in the debugger that’s connected to the hypervisor, and use that address to rebase the idb:


Great! Now, on the second debugger, which is connected to the kernel, bp on nt!HvcallInitiateHypercall.

When this breakpoint hits (which happens all the time), ntos is just about to issue a hypercall to Hyper-V. The “hypercall input value”, which holds some flags and the hypercall call code, is in rcx (the first argument to nt!HvcallInitiateHypercall). By modifying rcx, we can change the hypercall that will get called.

The hypercall input value isn’t simply the id: it’s the id encoded with some flags, which indicate what type of hypercall we are issuing, and in what “fashion” we want this hypercall to be called. There are different ways to call a hypercall in Hyper-V:

  • Simple – performs a single operation and has a fixed-size set of input and output parameters.
  • Rep (repeat) – performs multiple operations, with a list of fixed-size input and/or output elements.

The encoding for the hypercall input value is available in the TLFS. The lower bits represent the call code for the hypercall:


And the meaning of the fields can be seen here (source: TLFS):


When we break on nt!HvcallInitiateHypercall, we might end up hitting a rep or fast hypercall. Some hypercalls must be called “fast” (as indicated by bit 16) or as a rep; you can see these requirements in the same hypercall table in Appendix A of the TLFS. For our example, we call it with the register-based (fast) calling convention, so set rcx = 1<<16 | 0x4d (in other words, hypercall ID 0x4d with the “fast” bit on).
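As a sketch of this encoding (bit positions taken from the TLFS hypercall input value layout referenced above; illustrative, not production code), the value to put in rcx can be packed like this:

```python
def hypercall_input_value(call_code, fast=False, rep_count=0, rep_start=0):
    """Pack a hypercall input value (layout per the TLFS)."""
    value = call_code & 0xFFFF          # bits 0-15: call code
    if fast:
        value |= 1 << 16                # bit 16: fast flag (register-based)
    value |= (rep_count & 0xFFF) << 32  # bits 32-43: rep count
    value |= (rep_start & 0xFFF) << 48  # bits 48-59: rep start index
    return value

# HvInstallIntercept (0x4d) as a fast call -- matches rcx = 1<<16 | 0x4d:
print(hex(hypercall_input_value(0x4d, fast=True)))  # 0x1004d
```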

Now, let’s talk about how we pass arguments to hypercall handlers when we call them from the root partition. The arguments are set in the rdx and r8 registers. Note that there is a difference between fast and non-fast calls (in a non-fast call, r8 is the output parameter). In the upcoming example we’ll use a fast call, so r8 will be the second argument.

Inside the hypercall handler, which resides in the hypervisor, the registers are managed differently. The first argument to the hypercall handler, rcx, is a pointer to a structure which holds all the hypercall’s arguments (so, rdx and r8 will be the first 2 qwords in it). The rest of the arguments, rdx, r8, will be set by the hypervisor, and we don’t have direct control over them from a guest partition.

Let’s check it out on hv!HvInstallIntercept. On the left we see our kernel debugger where we break on nt!HvcallInitiateHypercall and set the registers. Then we continue execution and see that the hypercall handler is hit in hvix64.exe. There, *rcx will hold the arguments we passed on registers.


Success!

If we keep stepping, we will return from the function relatively fast, since the first thing this hypercall handler does is fetch the relevant partition by PartitionID, which is its first argument. Simply set rdx=ffffffffffffffff, which is HV_PARTITION_ID_SELF (defined, among other definitions, in the TLFS):

Then the flow can continue. This is a pretty common pattern found in lots of different hypercall handlers, so it’s worth keeping in mind.

MSRs

Hyper-V handles MSR access (both read and write) in its VMEXIT loop handler. It’s easy to see in IDA: a large switch case over all the supported MSR values, with a default case that falls back to rdmsr/wrmsr if an MSR doesn’t get special treatment from the hypervisor. Note that there are authentication checks in the MSR read/write handlers, checking the current partition’s permissions. From there we can find the different MSRs Hyper-V supports, and the functions that handle read and write.

In this example, we will read MSR 0x40000200, which returns the hypervisor lowstub address, as documented by some researchers (e.g., here and here). We can read the MSR value directly from the kernel debugger, using the rdmsr command:


And we can see the relevant code in the MSR read handler function (in the hypervisor), which handles this specific MSR:

Instead of just breaking on MSR access, let’s trace all the MSR accesses in the hypervisor debugger:

bp <msr_read_handler_addr> ".printf \"msr_handler called, MSR == 0x%x\\r\\n\", @rdx;g"

This will set a breakpoint over the MSR read function. The function address changes between builds. One way to find it is to look for constants in the binary that are unique to MSRs, and go up the caller functions until you reach the MSR read function. A different way is to follow the flow top-down, from the vmexit loop handler: in the switch case for the VMExit reason (VM_EXIT_REASON in the VMCS structure), you will find EXIT_REASON_MSR_READ which eventually gets to this function.

With this breakpoint set, we can run rdmsr from the kernel debugger and see this breakpoint hit, with the MSR value we read:


You can also set up a slightly more complicated breakpoint to find the return value of rdmsr, and trace MSR writes in a similar way.

Attack surface outside the hypervisor

To keep the hypervisor minimal, many components of the virtualization stack run in the root partition. This reduces the attack surface and complexity of the hypervisor. Things like memory management and process management are handled in the root partition’s kernel, without elevated permissions (e.g., SLAT management rights). Moreover, additional components that provide services to the guests, like networking or devices, are implemented in the root partition. A code execution vulnerability in any of these privileged components would allow an unprivileged guest to execute code in the root partition’s kernel or userspace (depending on the component). For the virtualization stack, this is equivalent to a guest-to-host escape, as the root partition is trusted by the hypervisor and manages all the other guests. In other words, a guest-to-host vulnerability can be exploited without compromising the hypervisor.


(Source and more terminology can be found here).

As @JosephBialek and @n_joly stated in their talk at Blackhat 2018, the most interesting attack surface when trying to break Hyper-V security, is actually not in the hypervisor itself, but in the components that are in the root partition kernel and userspace. As a matter of fact, @smealum presented in the same conference a full guest-to-host exploit, targeting vmswitch, which runs in the root partition’s kernel.

Kernel components

As we can see in the high-level architecture figure, the virtual switch, the partition IPC mechanisms, storage, and virtual PCI are implemented in the root partition’s kernelspace. These components follow a “provider/consumer” design: the consumers are drivers on the guest side, consuming services and facilities from the providers, which reside in the root partition’s kernel. They are easy to spot, since providers usually carry a VSP (Virtualization Service Provider) suffix, while consumers carry a VSC (Virtualization Service Client) suffix. We therefore have:

 ROOT PARTITION KERNEL  GUEST PARTITION KERNEL
 Storage  storvsp.sys  storvsc.sys
 Networking  vmswitch.sys  netvsc.sys
 Synth 3D Video  synth3dVsp.sys  synth3dVsc.sys
 PCI  vpcivsp.sys  vpci.sys

And since those are simply drivers running in kernel mode, they are very easy to debug with a standard kernel debugger.

It’s worth noting that some of the components have two drivers: one runs inside the guest’s kernel, and the other runs on the host (root partition). This design replaced the old virtualization stack design from Windows 7, where the logic for the root partition and for the child partitions was combined. It was later split into two different drivers to separate the functionality of the root from the children. The “r” suffix stands for “root”. For example:

 Root Partition  Guest
 vmbusr.sys  vmbus.sys
 vmbkmclr.sys  vmbkmcl.sys
 winhvr.sys  winhv.sys

Quick tip:

The Linux guest-side OS components are integrated into the mainline kernel tree and are also available separately. This is quite useful when building PoCs from a Linux guest (as done here, for instance).

User components

Many interesting components reside in the VM worker process, which executes in the root partition’s userspace, so debugging it is just like debugging any other userspace process (though you might need to run your debugger elevated on the root partition). There’s a separate Virtual Machine Worker Process (vmwp.exe) for each guest, launched when the guest is powered on.

Here are a few examples of components that reside in the VMWP, which runs in the root partition’s userspace:

  • vSMB Server
  • Plan9FS
  • IC (Integration components)
  • Virtual Devices (Emulators, non-emulated devices)

Also, keep in mind that the instruction emulator and the devices are highly complex components, which makes them a pretty interesting attack surface.

It’s just the start

In this post we covered what we feel is needed to start looking into Hyper-V. It isn’t much, but it is pretty much everything I knew and the environment I used to find my first Hyper-V bug, so it should be good enough to get you going. Having a hypervisor debugger made everything much clearer, as opposed to static analysis of the binary by itself. With this setup, I managed to trigger the bug directly from the kernel debugger, without the hurdles of building a driver.

But that’s only the tip of the iceberg. We are planning to release more information in upcoming blog posts. The next one will feature our friends from the virtualization security team, who will talk about VMBus internals and vPCI, and demo a few examples. If you can’t wait, here are some resources that are already available online and cover the internals of Hyper-V.

  1. Battle of SKM and IUM
  2. Ring 0 to Ring -1 Attacks
  3. Virtualization Based Security - Part 1: The boot process
  4. Virtualization Based Security - Part 2: kernel communications
  5. VBS Internals and Attack Surface
  6. HyperV and its Memory Manager

Here are some examples of previous issues in the virtualization stack:

  1. A Dive in to Hyper-V Architecture and Vulnerabilities
  2. Hardening Hyper-V through offensive security research
  3. Hyper-V vmswitch.sys VmsMpCommonPvtHandleMulticastOids Guest to Host Kernel-Pool Overflow

If you have any questions, feel free to comment on this post or DM me on Twitter @AmarSaar. I’d love to get feedback and ideas for other topics you would want us to cover in the next blog posts in this series.

 

Happy hunting!

Saar Amar, MSRC-IL

Fuzzing para-virtualized devices in Hyper-V


Introduction

Hyper-V is the backbone of Azure, running on its hosts to provide efficient and fair sharing of resources, but also isolation. That’s why we in the Windows vulnerability research team have been working in the background for years to help secure Hyper-V, and why Microsoft invites security researchers across the globe to submit their vulnerabilities through the Hyper-V Bounty Program, with payments of up to $250,000 USD.

To help engage people in the Hyper-V security space, last year internal teams from Microsoft published some of their work.

At Black Hat USA 2018, Joe Bialek and Nicolas Joly presented "A Dive in to Hyper-V Architecture and Vulnerabilities". They covered an architecture overview of Hyper-V oriented to security researchers and discussed some interesting vulnerabilities seen in Hyper-V.

In the same conference, Jordan Rabet presented "Hardening Hyper-V through offensive security research", where he discussed in great detail the exploitation process for CVE-2017-0075 in VMSwitch, a Hyper-V component.

Last December, Saar Amar published a detailed blog post with the fundamentals for getting started in Hyper-V security research.

Following their work, we’d like to share a new story related to Hyper-V security for anyone interested in getting introduced to Hyper-V security or learning more. Recently we have been working on Virtual PCI (VPCI), one of the para-virtualized devices available in Hyper-V, used to expose hardware to virtual machines. Like other para-virtualized devices, it uses VMBus for inter-partition communication.

In this blog post we would like to share some of our learnings, introduce both VMBus and VPCI, present one strategy to fuzz the VMBus channel used by VPCI, and discuss one of our findings. Some of the concepts and strategies here can be applied when working with other virtual devices that use VMBus in Hyper-V.

VMBus overview

VMBus is one of the mechanisms used by Hyper-V to offer para-virtualization. In short, it is a virtual bus device that sets up channels between the guest and the host. These channels provide the capability to share data between partitions and setup synthetic devices.

In this section we’ll introduce the VMBus architecture, learn how channels are offered to partitions, and see how synthetic devices are set up.

The root partition (or host) hosts Virtualization Service Providers (VSPs) that communicate over VMBus to handle device access requests from child partitions. On the other hand, child partitions (or guests) use Virtualization Service Consumers (VSCs) to redirect device requests to the VSP over VMBus. Child partitions require VMBus and VSC drivers to use the para-virtualized device stacks.

VMBus channels allow VSCs and VSPs to transfer data primarily through two ring buffers: upstream and downstream. These ring buffers are mapped into both partitions by the hypervisor, which also provides synthetic interrupts to notify a partition when there is data available.
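To illustrate the idea (this is not the actual VMBus ring layout, which also carries control fields and interrupt masks), here is a minimal single-producer/single-consumer ring buffer sketch in Python:

```python
class RingBuffer:
    """Minimal SPSC ring, sketching the upstream/downstream VMBus
    ring buffer idea (not the real on-disk/in-memory layout)."""
    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.read_idx = 0    # advanced by the consumer
        self.write_idx = 0   # advanced by the producer

    def _used(self):
        return (self.write_idx - self.read_idx) % self.size

    def write(self, data):
        if len(data) >= self.size - self._used():  # keep one slot free
            raise BufferError("ring full")
        for b in data:
            self.buf[self.write_idx] = b
            self.write_idx = (self.write_idx + 1) % self.size
        # real VMBus would now signal the peer via a synthetic interrupt

    def read(self, n):
        n = min(n, self._used())
        out = bytearray()
        for _ in range(n):
            out.append(self.buf[self.read_idx])
            self.read_idx = (self.read_idx + 1) % self.size
        return bytes(out)

ring = RingBuffer(16)
ring.write(b"hello")
print(ring.read(16))  # b'hello'
```

Each side owns one index, which is what lets the two partitions share the ring without locks; the synthetic interrupt replaces polling.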

The architecture can be summarized in the next diagram:

A more detailed introduction to VMBus can be found in the presentations linked above.

Since VMBus allows I/O-related data transmission between a potentially malicious guest and the VSP drivers in the host, the latter are prime candidates for vulnerability hunting and fuzzing. A general approach to fuzzing virtual devices is to find the VMBus channel available to a VSC and use it to send malformed data to the VSP.
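As a sketch of that general idea (the transport stub and the seed message bytes are hypothetical; a real fuzzer would send over the VSC’s VMBus channel), a trivial mutation loop might look like:

```python
import random

def mutate(packet: bytes, rate=0.05) -> bytes:
    """Replace random bytes in a well-formed packet -- a trivial mutation step."""
    out = bytearray(packet)
    for i in range(len(out)):
        if random.random() < rate:
            out[i] = random.randrange(256)
    return bytes(out)

def fuzz_loop(seed_packet, send, iterations=1000):
    """Send mutated packets over a channel; `send` is the transport stub."""
    for _ in range(iterations):
        send(mutate(seed_packet))

# Usage with a stand-in transport that just records packets:
sent = []
fuzz_loop(b"\x01\x00SAMPLE_VPCI_MSG", sent.append, iterations=10)
print(len(sent))  # 10
```

In practice the seed would be a valid protocol message captured from a benign VSC/VSP exchange, and a smarter fuzzer would mutate structure-aware fields rather than raw bytes.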

To do so, we need to understand broadly how VMBus channels are made available to VSCs. Let’s start with how the VMBus device is exposed to the guest. From a practical point of view, if you deploy a Windows Generation 2 Virtual Machine (enlightened guest) you can find the exposed VMBus device in the Device Manager:

The connection view in Device Manager also reveals that VMBus is exposed to the guest via ACPI. Indeed, its description can be found in the Differentiated System Description Table (DSDT):

Device(\_SB.VMOD.VMBS)
{
    Name(STA, 0x0F)
    Name(_ADR, Zero)
    Name(_DDN, "VMBUS")
    Name(_HID, "VMBus")
    Name(_UID, Zero)
    Method(_DIS, 0, NotSerialized)
    {
        And(STA, 0x0D, STA)
    }
    Method(_PS0, 0, NotSerialized)
    {
        Or(STA, 0x0F, STA)
    }
    Method(_STA, 0, NotSerialized)
    {
        Return(STA)
    }
    Name(_PS3, Zero)
    Name(_CRS, ResourceTemplate()
    {
        IRQ(Edge, ActiveHigh, Exclusive) {5}
    })
}

Once VMBus is ready, for every channel offered by the root partition, the guest will build a new node in the device tree. The summarized (and generic) flow is:

  1. The root partition offers a channel.
  2. The offer is delivered to the guest through a synthetic interrupt.
  3. In the guest, because of the interrupt, a bus relation query is injected in the PnP system.
  4. In the guest, the VMBus driver creates a new Physical Device Object (PDO) for the device stack. The information of the offer is saved in the PDO context.
  5. The device driver (for example VPCI) creates a new Functional Device Object (FDO) for the device stack. The routine used to create the FDO, for example AddDevice in the case of a Plug and Play driver, is a good place to find the code that allocates and opens the new VMBus channel.

A kernel debugger and the command “!devnode” can be used to list the devices available on top of VMBus inside a guest:

0: kd> !devnode 0 1
Dumping IopRootDeviceNode (= 0xffffe28c76fbd9e0)
DevNode 0xffffe28c76fbd9e0 for PDO 0xffffe28c76e6b830
  InstancePath is "HTREE\ROOT\0"
  State = DeviceNodeStarted (0x308)
  Previous State = DeviceNodeEnumerateCompletion (0x30d)
  .
  .
  .
  DevNode 0xffffe28c76ed19b0 for PDO 0xffffe28c76ecfd80
    InstancePath is "ROOT\ACPI_HAL\0000"
    State = DeviceNodeStarted (0x308)
    Previous State = DeviceNodeEnumerateCompletion (0x30d)
    DevNode 0xffffe28c76f17c00 for PDO 0xffffe28c76eeed30
      InstancePath is "ACPI_HAL\PNP0C08\0"
      ServiceName is "ACPI"
      State = DeviceNodeStarted (0x308)
      Previous State = DeviceNodeEnumerateCompletion (0x30d)
      DevNode 0xffffe28c76e9e8b0 for PDO 0xffffe28c76f52900
        InstancePath is "ACPI\ACPI0004\0"
        State = DeviceNodeStarted (0x308)
        Previous State = DeviceNodeEnumerateCompletion (0x30d)
        DevNode 0xffffe28c76f5b8b0 for PDO 0xffffe28c76f54d60
          InstancePath is "ACPI\PNP0003\3&fdac00f&0"
          State = DeviceNodeInitialized (0x302)
          Previous State = DeviceNodeUninitialized (0x301)
        DevNode 0xffffe28c76f5bbe0 for PDO 0xffffe28c76f59c30
          InstancePath is "ACPI\VMBus\0"
          ServiceName is "vmbus"
          State = DeviceNodeStarted (0x308)
          Previous State = DeviceNodeEnumerateCompletion (0x30d)
          .
          .
          .
          DevNode 0xffffe28c78629340 for PDO 0xffffe28c78625c90
            InstancePath is "VMBUS\{44c4f61d-4444-4400-9d52-802e27ede19f}\{7f7e8f36-7342-4531-a380-d3a9911f80bf}"
            ServiceName is "vpci"
            State = DeviceNodeStarted (0x308)
            Previous State = DeviceNodeEnumerateCompletion (0x30d)
            .
            .

Now that we’ve established VMBus as an interesting attack vector and learned how to use it, we can discuss one of the virtual devices making use of it: VPCI.

Use case: VPCI

VPCI is a virtualized bus driver used to expose hardware to virtual machines. Scenarios using VPCI include SR-IOV and DDA. It’s important to point out that VPCI will be exposed to the guest only if there is a virtual device requiring it (and this must be configured by the host).

In this section we’ll learn how to find the VMBus channel used by VPCI, and how to use it to send arbitrary data to the VSP. We also provide the skeleton of a Windows driver to illustrate the idea.

As previously explained, every para-virtualized device will require a VSC and VSP pair. In the case of VPCI we’ll identify the VSC component as VPCI and the VSP component as VPCIVSP. The VPCI component is managed by the vpci.sys driver in the guest. On the other side, vpcivsp.sys manages the VPCIVSP component in the host. For the current analysis we are using vpci.sys version 10.0.17134.228.

Finding the VMBus channel

As introduced above, the initialization of a new FDO is a good place to start searching for VMBus channel allocation.

Since VPCI is a Kernel-Mode Driver Framework (KMDF) driver, we are interested in the call to WdfDriverCreate, and specifically in the DriverConfig parameter:

NTSTATUS WdfDriverCreate(
  PDRIVER_OBJECT         DriverObject,
  PCUNICODE_STRING       RegistryPath,
  PWDF_OBJECT_ATTRIBUTES DriverAttributes, 
  PWDF_DRIVER_CONFIG     DriverConfig,
  WDFDRIVER              *Driver
);

The DriverConfig parameter is interesting because it’s a pointer to a WDF_DRIVER_CONFIG structure, where we can find the EvtDriverDeviceAdd callback function:

typedef struct _WDF_DRIVER_CONFIG {
  ULONG                     Size;
  PFN_WDF_DRIVER_DEVICE_ADD EvtDriverDeviceAdd;
  PFN_WDF_DRIVER_UNLOAD     EvtDriverUnload;
  ULONG                     DriverInitFlags;
  ULONG                     DriverPoolTag;
} WDF_DRIVER_CONFIG, *PWDF_DRIVER_CONFIG;

EvtDriverDeviceAdd is called by the PnP manager to perform device initialization when a new device is found.

In the VPCI case it is FdoDeviceAdd.

During FdoDeviceAdd, VPCI will allocate the new VMBus channel with a call to VmbChannelAllocate.
The VmbChannelAllocate prototype can be found in the vmbuskernelmodeclientlibapi.h public header. The pointer to the allocated channel is returned within the third parameter:

/// \page VmbChannelAllocate VmbChannelAllocate
/// Allocates a new VMBus channel with default parameters and callbacks. The
/// channel may be further initialized using the VmbChannelInit* routines before
/// being enabled with VmbChannelEnable. The channel must be freed with
/// VmbChannelCleanup.
///
/// \param ParentDeviceObject A pointer to the parent device.
/// \param IsServer Whether the new channel should be a server endpoint.
/// \param Channel Returns a pointer to an allocated channel.
_IRQL_requires_(PASSIVE_LEVEL)
NTSTATUS
VmbChannelAllocate(
    _In_ PDEVICE_OBJECT ParentDeviceObject,
    _In_ BOOLEAN IsServer,
    _Out_ _At_(*Channel, __drv_allocatesMem(Mem)) VMBCHANNEL *Channel
    );

To better understand how the channel is allocated and where the reference is stored, let’s first review the call to FdoCreateVmbusChannel from FdoDeviceAdd:

__int64 __fastcall FdoDeviceAdd(__int64 a1, __int64 a2)
{
  __int64 v5; // rbx
  signed int v6; // esi
  .
  .
  .
  // WdfObjectGetTypedContextWorker, similar to WdfObjectGetTypedContext
  v5 = (*(__int64 (__fastcall **)(__int64))(WdfFunctions_01015 + 1616))(WdfDriverGlobals); 
  .
  .
  .
  v6 = FdoCreateVmbusChannel((_QWORD *)v5);
  .
  .
  .
 }

The first argument to FdoCreateVmbusChannel is the context of the FDO device. FdoCreateVmbusChannel will call VmbChannelAllocate and save the reference to the allocated VMBCHANNEL on the stack (a local variable):

__int64 __fastcall FdoCreateVmbusChannel(_QWORD *FdoContext)
{
  v1 = FdoContext;
.
.
.
  __int64 vpciChannel; // [rsp+70h] [rbp+10h]
.
.
.
  v5 = VmbChannelAllocate(v3, 0i64, &vpciChannel);

At this point the channel has been allocated but still cannot be used as it must be opened first. A client VSC opens an offered channel with a call to VmbChannelEnable.

The function prototype is also included in the vmbuskernelmodeclientlibapi.h header:

/// \page VmbChannelEnable VmbChannelEnable
/// Enables a channel that is in the disabled state by connecting to vmbus and
/// offering or opening a channel (whichever is appropriate for the endpoint
/// type).
///
/// See \ref state_model.
///
/// \param Channel A handle for the channel.  Allocated by \ref VmbChannelAllocate.
_Must_inspect_result_
NTSTATUS
VmbChannelEnable(
    _In_    VMBCHANNEL  Channel
    );

In Windows 10 Redstone 4 (1803) the call to VmbChannelEnable also happens in FdoCreateVmbusChannel. After that, the reference to the channel is saved in the FDO context:

  v5 = VmbChannelEnable(vpciChannel);
  if ( v5 >= 0 )
  {
    v1[3] = vpciChannel;
    return 0i64;
  }

Sending data through the VMBus Channel

Now that we understand how VPCI sets up its VMBus channel, a simple strategy to obtain a reference and use it for fuzzing is to install an upper filter driver for VPCI.

When the VPCI FDO device stack is created, our driver will be called by the PnP manager. At that point, the VMBus channel has already been allocated and enabled by FdoDeviceAdd and we can access it through the VPCI FDO context.

Let’s see how to do it with a driver. The first step is to provide an INF file to install our filter driver for the VPCI device. The important parts of the INF are described below. Take into account that:

  • wvpci.inf is the INF for the VPCI driver.
  • The VPCI hardware id is VMBUS\{44C4F61D-4444-4400-9D52-802E27EDE19F}
;
; BlogDriver.inf
;

[Version]
Signature="$WINDOWS NT$"
Class=System
ClassGuid={4d36e97d-e325-11ce-bfc1-08002be10318}
Provider=%ManufacturerName%
DriverVer=
CatalogFile=BlogDriver.cat

[DestinationDirs]
DefaultDestDir = 12

[SourceDisksNames]
1 = %DiskName%,,,""

[SourceDisksFiles]
BlogDriver.sys  = 1

[Manufacturer]
%ManufacturerName%=Standard,NT$ARCH$

[Standard.NT$ARCH$]
%BlogDriver.DeviceDesc%=Install_Section, VMBUS\{44C4F61D-4444-4400-9D52-802E27EDE19F}

[Install_Section.NT]
Include=wvpci.inf
Needs=Vpci_Device_Child.NT
CopyFiles=BlogDriver_Files

[BlogDriver_Files]
BlogDriver.sys

[Install_Section.NT.HW]
Include=wvpci.inf
Needs=Vpci_Device_Child.NT.HW
AddReg=BlogDriver_AddReg

[BlogDriver_AddReg]
HKR,,"UpperFilters",0x00010000,"BlogDriver"

[Install_Section.NT.Services]
Include=wvpci.inf
Needs=Vpci_Device_Child.NT.Services
AddService=BlogDriver,,BlogDriver_Service_Child

[BlogDriver_Service_Child]
DisplayName    = %BlogDriver.SvcDesc%
ServiceType    = 1               ; SERVICE_KERNEL_DRIVER
StartType      = 3               ; SERVICE_DEMAND_START
ErrorControl   = 1               ; SERVICE_ERROR_NORMAL
ServiceBinary  = %12%\BlogDriver.sys

[Strings]
ManufacturerName="TestManufacturer"
ClassName=""
DiskName="BlogDriver Source Disk"
BlogDriver.DeviceDesc="Microsoft Hyper-V Virtual PCI Bus (With Filter)"
BlogDriver.SvcDesc="Microsoft Hyper-V Virtual PCI Bus (With Filter)"

Now let’s see the initial skeleton for the filter driver. Some clarifications first:

  • The AddDevice routine creates the filter device object and attaches it to the VPCI FDO. A reference to the VPCI VMBus channel is saved in the device extension to make access easier.
  • In this skeleton all the IRPs are just passed down through the device stack; we do not want to modify VPCI behavior, only access its VMBus channel.

The full skeleton ready to build and play can be found in this repo.
After installing the driver in the guest, the VPCI stack shows our filter driver:

0: kd> !devstack ffff8407f64cbad0
  !DevObj           !DrvObj            !DevExt           ObjectName
  ffff8407f2379de0  \Driver\BlogDriver ffff8407f2379f30  
> ffff8407f64cbad0  \Driver\vpci       ffff8407fa4e42f0  
  ffff8407f62e1c90  \Driver\vmbus      ffff8407f62e2310  00000024
!DevNode ffff8407f2fe26b0 :
  DeviceInst is "VMBUS\{44c4f61d-4444-4400-9d52-802e27ede19f}\{7f7e8f36-7342-4531-a380-d3a9911f80bf}"
  ServiceName is "vpci"

At this point we are ready to send data and fuzz through the channel. There are several public APIs available for sending packets through a VMBus channel. One of them is VmbChannelSendSynchronousRequest; it is one of the APIs used by VPCI and only requires a reference to the VMBCHANNEL to start working. The declaration is available in the vmbuskernelmodeclientlibapi.h header; note the Channel parameter, which receives the VMBCHANNEL:

/// \page VmbChannelSendSynchronousRequest VmbChannelSendSynchronousRequest
/// Sends a packet to the opposite endpoint and waits for a response.
///
/// Clients may call with any combination of parameters. The root may only call
/// this if *Timeout == 0 and the \ref VMBUS_CHANNEL_FORMAT_FLAG_WAIT_FOR_COMPLETION
/// flag is not set.
///
/// \param Channel A handle for the channel.  Allocated by \ref VmbChannelAllocate.
/// \param Buffer Data to send.
/// \param BufferSize Size of Buffer in bytes.
/// \param ExternalDataMdl Optionally, a MDL describing an additional buffer to
///     send.
/// \param Flags Standard flags.
/// \param CompletionBuffer Buffer to store completion packet results in.
/// \param CompletionBufferSize Size of CompletionBuffer in bytes. Must be
///     rounded up to nearest 8 bytes, or else call will fail. On success,
///     returns the number of bytes written into CompletionBuffer.
/// \param Timeout Optionally, a timeout in the style of KeWaitForSingleObject.
///     After this time elapses, the packet will be cancelled. If set to a
///     timeout of 0, this packet will not be queued if it does not fit in the
///     ring buffer.
///
/// \returns STATUS_SUCCESS
/// \returns STATUS_BUFFER_OVERFLOW - The packet did not fit in the buffer and
///     was not queued.
/// \returns STATUS_CANCELLED - The packet was canceled.
/// \returns STATUS_DEVICE_REMOVED - The channel is being shut down.
_When_(Timeout == NULL || Timeout->QuadPart != 0 ||
       (Flags & VMBUS_CHANNEL_FORMAT_FLAG_WAIT_FOR_COMPLETION) != 0,
       _IRQL_requires_(PASSIVE_LEVEL))
_When_(Timeout != NULL && Timeout->QuadPart == 0 &&
       (Flags & VMBUS_CHANNEL_FORMAT_FLAG_WAIT_FOR_COMPLETION) == 0,
        _IRQL_requires_max_(DISPATCH_LEVEL))
NTSTATUS
VmbChannelSendSynchronousRequest(
    _In_                            VMBCHANNEL      Channel,
    _In_reads_bytes_(BufferSize)    PVOID           Buffer,
    _In_                            UINT32          BufferSize,
    _In_opt_                        PMDL            ExternalDataMdl,
    _In_                            UINT32          Flags,
    _Out_writes_bytes_to_opt_(*CompletionBufferSize, *CompletionBufferSize)
                                    PVOID           CompletionBuffer,
    _Inout_opt_ _Pre_satisfies_(*_Curr_ % 8 == 0)
                                    PUINT32         CompletionBufferSize,
    _In_opt_                        PLARGE_INTEGER  Timeout
    );

There are other APIs publicly available and documented in vmbuskernelmodeclientlibapi.h:

  • VmbPacketSend
  • VmbPacketSendWithExternalMdl
  • VmbPacketSendWithExternalPfns

Before using any of these methods in your driver, remember to link against vmbkmcl.lib.

Searching for references to these methods in VPCI can help you analyze and better understand the interactions with the VSP. Another helpful resource for understanding the communication is the Linux Integration Services code. The client (VSC) implementation for Linux can be found in pci-hyperv.c.

Finding the entry point of untrusted data in the VSP

In this section we’ll introduce packet processing on the VSP side. We’ll use VPCI as an example to learn how to locate the entry point for handling incoming VMBus packets. We will not discuss the details of the Virtual PCI protocol itself, though, as it is out of scope for this blog. For this analysis we are using vpcivsp.sys 10.0.17134.228.

For any VMBus endpoint, incoming packets from a channel will trigger the EvtVmbChannelProcessPacket callback, as explained in the documentation available in the vmbuskernelmodeclientlibapi.h header:

/// \page EvtVmbChannelProcessPacket EvtVmbChannelProcessPacket
/// \b EvtVmbChannelProcessPacket
/// \param Channel A handle for the channel.  Allocated by \ref VmbChannelAllocate.
/// \param Packet This completion context will be used to identify this packet to KMCL when the transaction can be retired.
/// \param Buffer This contains the packet which was sent by the opposite endpoint.  It does not contain the VMBus and KMCL headers.
/// \param BufferLength The length of Buffer in bytes.
/// \param Flags See VMBUS_CHANNEL_PROCESS_PACKET_FLAGS.
/// 
/// This callback is invoked when a packet has arrived in the incoming ring buffer.
/// For every invocation of this function, the implementer must eventually call
/// \ref VmbChannelPacketComplete.
///
/// This callback can be invoked at DISPATCH_LEVEL or lower, unless the channel
/// has been configured to defer packet processing to a worker thread.  See
/// \ref VmbChannelSetIncomingProcessingAtPassive for more information.
///\code
typedef
_Function_class_(EVT_VMB_CHANNEL_PROCESS_PACKET)
_IRQL_requires_max_(DISPATCH_LEVEL)
VOID
EVT_VMB_CHANNEL_PROCESS_PACKET(
    _In_ VMBCHANNEL Channel,
    _In_ VMBPACKETCOMPLETION Packet,
    _In_reads_bytes_(BufferLength) PVOID Buffer,
    _In_ UINT32 BufferLength,
    _In_ UINT32 Flags
    );

The callback for packet processing is set with a call to VmbChannelInitSetProcessPacketCallbacks. It’s also declared in vmbuskernelmodeclientlibapi.h:

/// \page VmbChannelInitSetProcessPacketCallbacks VmbChannelInitSetProcessPacketCallbacks
/// Sets callbacks for packet processing. Only meaningful if KMCL queue
/// management is not suppressed.  TODO:  Make previous sentence more precise.
///
/// Note that ProcessPacketCallback will be invoked for every packet that
/// is received.  ProcessingCompleteCallback will be invoked every time the
/// ring buffer containing incoming packets transitions from non-empty to empty,
/// after the last invocation of ProcessPacketCallback in a single batch.
///
/// \param Channel A handle for the channel.  Allocated by \ref VmbChannelAllocate.
/// \param ProcessPacketCallback A callback that will be called when a packet is
///     ready for processing.
/// \param ProcessingCompleteCallback Optionally, a callback that will be called
///     when processing of a batch of packets has been completed.
///
/// \return STATUS_SUCCESS - function completed successfully
/// \return STATUS_INVALID_PARAMETER_1 - channel parameter was invalid or in an invalid state(Disabled)
NTSTATUS
VmbChannelInitSetProcessPacketCallbacks(
    _In_ VMBCHANNEL Channel,
    _In_ PFN_VMB_CHANNEL_PROCESS_PACKET ProcessPacketCallback,
    _In_opt_ PFN_VMB_CHANNEL_PROCESSING_COMPLETE ProcessingCompleteCallback
    );

With the above information, the packet processing method for the VPCI VSP can be found easily: in vpcivsp.sys, just search for references to VmbChannelInitSetProcessPacketCallbacks. The processing method is VirtualBusChannelProcessPacket.


A full analysis of the packet processing is out of scope for this blog, but hopefully these initial hints are useful for researchers willing to invest in this area.

Fuzzing results. One example: CVE-2018-0965

With the approach explained above we developed a fuzzer to target the packet processing in VPCI. In this section we’ll analyze one of the bugs hit by the fuzzer that has recently been patched, illustrating the kind of problems that can be found in inter-partition communication through VMBus channels.

CVE-2018-0965 is an RCE classified as Tier 1 in the Hyper-V Bounty Program; see the official advisory for details.

The bug lived in the packet processing method for the VPCI VSP. By diffing (using Diaphora) against the patched vpcivsp.sys (10.0.17134.285), the method VirtualBusChannelProcessPacket can be identified as modified.


By looking at the changes in VirtualBusChannelProcessPacket, the interesting one is found.



In the patched version, the call to VirtualBusLookupDevice has been moved from before a condition into the branch that passes it. Let’s review the vulnerable code with more context. First, the interesting code:

void __fastcall VirtualBusChannelProcessPacket(__int64 a1, __int64 a2, __int64 a3, unsigned int a4)
{
  unsigned int v4; // er15
  __int64 v5; // rsi
  __int64 v7; // rax
  struct _KEVENT *v11; // rbx
  int v12; // edi
  unsigned int v13; // ecx
  .
  .
  .
  v4 = a4;
  v5 = a3;
  v13 = *(_DWORD *)v5;
  v7 = VmbChannelGetPointer(a1);
  v11 = (struct _KEVENT *)v7;
  .
  .
  .
  if ( v13 == 1112080407 )
  {
    if ( v11[3].Header.SignalState < 0x10002u )
    {
      v36 = 54;
    }
    else
    {
      if ( v4 < 0x50 )
      {
        v12 = -1073741789;
        v14 = 53;
        goto LABEL_26;
      }
      v45 = VirtualBusLookupDevice(v11, *(_DWORD *)(v5 + 4));
      v46 = (volatile signed __int32 *)v45;
      if ( !v45 )
      {
        v41 = 57;
        goto LABEL_71;
      }
      if ( *(_WORD *)(v5 + 12) <= 0x20u )
      {
        v47 = VirtualDeviceCreateSingleInterrupt(v45, v5, &v69);
        memset(&v73, 0, 0x50ui64);
        ...
        v73 = v47;
        ...
        VmbChannelPacketComplete(v6, &v73, 80i64);
        v34 = v46;
        goto LABEL_50;
      }
      v36 = 56;
    }
  }
.
.
.
  return;

LABEL_50:
  VirtualDeviceDereference(v34, v32, v33);
  return;
}

Now let’s recover the definition of the packet processing callback (EvtVmbChannelProcessPacket) from the public header and rewrite the code above with named arguments:

void __fastcall VirtualBusChannelProcessPacket(VMBCHANNEL Channel, VMBPACKETCOMPLETION Packet, PVOID Buffer,
                                               UINT32 BufferLength, UINT32 Flags)
{
  unsigned int v4; // er15
  __int64 v5; // rsi
  __int64 v7; // rax
  struct _KEVENT *v11; // rbx
  int v12; // edi
  unsigned int v13; // ecx
.
.
.
  v4 = BufferLength;
  v5 = Buffer;
  v13 = *(_DWORD *)v5;
  v7 = VmbChannelGetPointer(Channel);
  v11 = (struct _KEVENT *)v7;
.
.
.
  if ( v13 == 1112080407 )
  {
    if ( v11[3].Header.SignalState < 0x10002u )
    {
      v36 = 54;
    }
    else
    {
      if ( v4 < 0x50 )
      {
        v12 = -1073741789;
        v14 = 53;
        goto LABEL_26;
      }
      v45 = VirtualBusLookupDevice(v11, *(_DWORD *)(v5 + 4));
      v46 = (volatile signed __int32 *)v45;
      if ( !v45 )
      {
        v41 = 57;
        goto LABEL_71;
      }
      if ( *(_WORD *)(v5 + 12) <= 0x20u )
      {
        v47 = VirtualDeviceCreateSingleInterrupt(v45, v5, &v69);
        memset(&v73, 0, 0x50ui64);
        ...
        v73 = v47;
        ...
        VmbChannelPacketComplete(v6, &v73, 80i64);
        v34 = v46;
        goto LABEL_50;
      }
      v36 = 56;
    }
  }
.
.
.
  return;
.
.
.
LABEL_50:
  VirtualDeviceDereference(v34, v32, v33);
  return;
}

It’s worth clarifying that the third parameter, Buffer, points to the attacker-controlled data coming from the VPCI channel. The fourth parameter, BufferLength, is the size of Buffer in bytes.

The local variable identified as v13 is assigned from the first DWORD of the packet buffer and later compared against the constant 1112080407 (0x42490017). By looking at the Linux Integration Services code the constant can easily be identified as PCI_CREATE_INTERRUPT_MESSAGE2. This means Buffer in this case points to a pci_create_interrupt2 struct:

struct pci_message {
  u32 type;
} __packed;

/*
 * Function numbers are 8-bits wide on Express, as interpreted through ARI,
 * which is all this driver does.  This representation is the one used in
 * Windows, which is what is expected when sending this back and forth with
 * the Hyper-V parent partition.
 */
union win_slot_encoding {
  struct {
    u32 dev:5;
    u32 func:3;
    u32 reserved:24;
  } bits;
  u32 slot;
} __packed;

/**
 * struct hv_msi_desc2 - 1.2 version of hv_msi_desc
 * @vector:   IDT entry
 * @delivery_mode:  As defined in Intel's Programmer's
 *      Reference Manual, Volume 3, Chapter 8.
 * @vector_count: Number of contiguous entries in the
 *      Interrupt Descriptor Table that are
 *      occupied by this Message-Signaled
 *      Interrupt. For "MSI", as first defined
 *      in PCI 2.2, this can be between 1 and
 *      32. For "MSI-X," as first defined in PCI
 *      3.0, this must be 1, as each MSI-X table
 *      entry would have its own descriptor.
 * @processor_count:  number of bits enabled in array.
 * @processor_array:  All the target virtual processors.
 */
struct hv_msi_desc2 {
  u8  vector;
  u8  delivery_mode;
  u16 vector_count;
  u16 processor_count;
  u16 processor_array[32];
} __packed;

struct pci_create_interrupt2 {
  struct pci_message message_type;
  union win_slot_encoding wslot;
  struct hv_msi_desc2 int_desc;
} __packed;

This allows us to rewrite the vulnerable code with more information:

void __fastcall VirtualBusChannelProcessPacket(VMBCHANNEL Channel, VMBPACKETCOMPLETION Packet, PVOID Buffer,
                                               UINT32 BufferLength, UINT32 Flags)
{
  unsigned int v4; // er15
  pci_create_interrupt2 *createInterrupt; // rsi
  __int64 v7; // rax
  struct _KEVENT *v11; // rbx
  int v12; // edi
  unsigned int messageType; // ecx
.
.
.
  v4 = BufferLength;
  createInterrupt = Buffer;
  messageType = createInterrupt->message_type.type;
  v7 = VmbChannelGetPointer(Channel);
  v11 = (struct _KEVENT *)v7; // Looks like IDA analysis has misunderstood v7.
.
.
.
  if (messageType == PCI_CREATE_INTERRUPT_MESSAGE2)
  {
    if ( v11[3].Header.SignalState < 0x10002u ) // Looks like IDA analysis has misunderstood v7/v11.
    {
      v36 = 54;
    }
    else
    {
      if ( v4 < 0x50 )
      {
        v12 = -1073741789;
        v14 = 53;
        goto LABEL_26;
      }
      v45 = VirtualBusLookupDevice(v11, createInterrupt->wslot.slot);
      v46 = (volatile signed __int32 *)v45;
      if ( !v45 )
      {
        v41 = 57;
        goto LABEL_71;
      }
      if (createInterrupt->int_desc.processor_count <= 0x20u )
      {
        v47 = VirtualDeviceCreateSingleInterrupt(v45, createInterrupt, &v69);
        memset(&v73, 0, 0x50ui64);
        ...
        v73 = v47;
        ...
        VmbChannelPacketComplete(v6, &v73, 80i64);
        v34 = v46;
        goto LABEL_50;
      }
      v36 = 56;
    }
  }
.
.
.
  return;
.
.
.
LABEL_50:
  VirtualDeviceDereference(v34, v32, v33);
  return;
}

As a summary, in the vulnerable version, a PCI_CREATE_INTERRUPT_MESSAGE2 packet with a processor_count bigger than 0x20 can force a flow where VirtualBusLookupDevice is called but, after failing to pass the condition, returns without calling VirtualDeviceDereference.

Let’s check both VirtualBusLookupDevice and VirtualDeviceDereference in the vulnerable version of vpcivsp.sys. Starting with VirtualBusLookupDevice:

signed __int64 __fastcall VirtualBusLookupDevice(struct _KEVENT *a1, int a2)
{
  struct _KEVENT *v2; // rsi
  int v3; // ebp
  struct _KEVENT *v4; // rbx
  char v5; // di
  signed __int64 v6; // rcx
  _LIST_ENTRY *i; // rax
  signed __int64 v8; // rbx

  v2 = a1 + 2;
  v3 = a2;
  v4 = a1;
  v5 = 0;
  KeWaitForSingleObject(&a1[2], 0, 0, 0, 0i64);
  v6 = (signed __int64)&v4[1].Header.WaitListHead;
  for ( i = v4[1].Header.WaitListHead.Flink; ; i = i->Flink )
  {
    v8 = (signed __int64)&i[-12].Blink;
    if ( i == (_LIST_ENTRY *)v6 )
      break;
    if ( *(_DWORD *)(v8 + 408) == v3 && (*(_DWORD *)(v8 + 1820) & 0x80u) != 0 )
    {
      _InterlockedIncrement((volatile signed __int32 *)(v8 + 200));
      v5 = 1;
      break;
    }
  }
  KeSetEvent(v2, 0, 0);
  return v8 & -(signed __int64)(v5 != 0);
}

We know, from the previous analysis, that:

  • The second argument is the device slot.
  • The first argument has been misinterpreted as a _KEVENT. It points to an object saved in the channel context, most likely a more complex structure that contains a _KEVENT as a field.

Let’s analyze the code again after some renaming:

signed __int64 __fastcall VirtualBusLookupDevice(__int64 a1, int slot)
{
  struct _KEVENT *v2; // rsi
  int v3; // ebp
  __int64 v4; // rbx
  char v5; // di
  signed __int64 v6; // rcx
  _QWORD *i; // rax
  signed __int64 v8; // rbx

  v2 = (struct _KEVENT *)(a1 + 48);
  v3 = slot;
  v4 = a1;
  v5 = 0;
  KeWaitForSingleObject((PVOID)(a1 + 48), 0, 0, 0, 0i64);
  v6 = v4 + 32;
  for ( i = *(_QWORD **)(v4 + 32); ; i = (_QWORD *)*i )
  {
    v8 = (signed __int64)(i - 23);
    if ( i == (_QWORD *)v6 )
      break;
    if ( *(_DWORD *)(v8 + 408) == v3 && (*(_DWORD *)(v8 + 1820) & 0x80u) != 0 )
    {
      _InterlockedIncrement((volatile signed __int32 *)(v8 + 200));
      v5 = 1;
      break;
    }
  }
  KeSetEvent(v2, 0, 0);
  return v8 & -(signed __int64)(v5 != 0);
}

  • The method works with the object pointed to by the first argument. Given the name of the method, VirtualBusLookupDevice, we can guess it is the virtual bus.
  • A _KEVENT within the virtual bus is used for synchronization.
  • A container is stored at offset 32 of the virtual bus object.
  • The main loop is iterating over the container, most likely a list.
  • Within the loop v8 holds the reference to every object within the container.
  • The field at offset 408 is compared against the slot id. The guess is that we are iterating over a list of devices.
  • If a matching device is found, its field at offset 200 is incremented and a reference is returned. The field at offset 200 looks like a 32-bit reference count.

Let’s go to VirtualDeviceDereference now. As a reminder, the first argument is the pointer returned by VirtualBusLookupDevice (most likely a device):

VirtualDeviceDereference decrements the field at offset 200 (identified above as a potential reference count). If the reference count reaches 0, VirtualDeviceDestroy is called, where the device is freed:

void __fastcall VirtualDeviceDestroy(PVOID P, __int64 a2, __int64 a3)
{
  char *v3; // rbx


  v3 = (char *)P;
  //
  // Lots of things...
  //
  ExFreePoolWithTag(v3, 0x49435056u);
}

To summarize: by sending PCI_CREATE_INTERRUPT_MESSAGE2 packets with a processor_count bigger than 0x20, the device reference count can be overflowed and the device object unexpectedly freed, leading to a dangerous situation if pending references to the device remain… but that is a story for another blog 😊

Closure

We have learned the basics of VMBus, the main component providing para-virtualized devices in Hyper-V. We have also shown a generic approach to fuzzing VMBus channels, using VPCI as an example. Finally, we took a deep dive into one of the bugs recently found using this approach.

We hope the information here will be useful for security researchers interested in Hyper-V, and we encourage bug hunting from the security community.

PS: We are always looking for vulnerability researchers and security engineers to come help make Windows, Hyper-V, Azure and Linux more secure. If interested, please reach out at wdgsarecruitment@microsoft.com!

Virtualization Security Team.

Local privilege escalation via the Windows I/O Manager: a variant finding collaboration


The Microsoft Security Response Center (MSRC) investigates all reports of security vulnerabilities affecting Microsoft products and services to help make our customers and the global online community more secure. We appreciate the excellent vulnerability research reported to us regularly from the security community, and we consider it a privilege to work with these researchers.

One researcher who consistently reports high-quality, interesting vulnerabilities to us is James Forshaw of Google Project Zero. Most of James’ work focuses on complex logic bugs in Windows internals, particularly in the area of privilege escalation and sandbox escapes.

This blog post covers a collaboration between James and the MSRC team on a novel bug class he discovered in the Windows kernel and some of its drivers, how Microsoft’s engineering teams fixed these bugs, and how third-party driver developers can avoid introducing similar bugs.

Background

In Windows, when a system call is made from a user mode thread, the system call handler records this in the thread object by setting its PreviousMode field to UserMode. If instead the system call is made from kernel mode using a Zw-prefixed function, or from a system thread, the PreviousMode of the thread will be set to KernelMode. This method of distinguishing between user mode and kernel mode callers is used to help determine if the arguments of the call are from a trusted or untrusted source, and therefore to what extent they need to be validated by the kernel.

When a user mode application creates or opens a file, this causes a system call to be made to NtCreateFile or NtOpenFile. Kernel mode code has a broader set of API functions to choose from: NtCreateFile/NtOpenFile and their Zw-prefixed equivalents, the IoCreateFile* functions from the I/O Manager, and the FltCreateFile* functions from the Filter Manager.

 

 

As illustrated in the diagram above, all of these end up at the I/O Manager internal function IopCreateFile. The thread’s PreviousMode is assigned to a variable AccessMode, which in IopCreateFile is used to decide whether or not to check for valid parameters and buffers, before being passed to the Object Manager in a call to ObOpenObjectByNameEx. Later, in IopParseDevice, the AccessMode is used in access checking – if it is UserMode, then a privilege check is performed on the device object. Next, IopParseDevice constructs an I/O Request Packet (IRP), sets its RequestorMode field to the AccessMode, and uses IofCallDriver to pass control to the IRP_MJ_CREATE dispatch function of the device.

IopCreateFile has an Options parameter which is not exposed to callers of NtCreateFile and NtOpenFile, but is available to the API functions that are only reachable from kernel mode. If the IO_NO_PARAMETER_CHECKING flag is set, it overrides the AccessMode so that it’s set to KernelMode rather than the thread’s previous mode, and thus bypasses parameter validation. This also causes the privilege checks later on in IopParseDevice to be waived.

Note that IoCreateFileEx always sets the IO_NO_PARAMETER_CHECKING flag. As FltCreateFile, FltCreateFileEx and FltCreateFile2 call into the I/O Manager via this function, these in turn also always have IO_NO_PARAMETER_CHECKING set.

However, sometimes it is essential to override this behaviour and force the access checks to occur: for example, when a kernel mode driver (perhaps via an IOCTL) opens an object name specified by a user mode application.

If the Options parameter of IopCreateFile has the IO_FORCE_ACCESS_CHECK flag set, this has two effects: firstly, it causes the I/O Manager, in IopParseDevice, to perform the access checks as if the AccessMode was UserMode (but without setting it to UserMode). Secondly, in the IRP’s stack location for IRP_MJ_CREATE, it causes the SL_FORCE_ACCESS_CHECK flag to be set in the Flags field. Handlers of IRP_MJ_CREATE requests are expected to use this flag in their own access checks, to override the IRP’s RequestorMode.

During the development of Windows XP, it became apparent that other API functions operating in the object namespace (e.g. ZwOpenKey for \Registry) needed some method of forcing an access check, so a new flag OBJ_FORCE_ACCESS_CHECK was introduced. This is set on the attributes of the object being requested and causes the Object Manager (rather than the I/O Manager) to set the requestor’s access mode to UserMode. This takes precedence over any access mode set already – in particular, it will override the effect of IO_NO_PARAMETER_CHECKING in setting KernelMode, back in IopCreateFile.

 

 

To summarise the above:

  • in deciding whether to perform an access check, an IRP_MJ_CREATE handler must not only check whether the IRP’s RequestorMode is UserMode, but also whether the SL_FORCE_ACCESS_CHECK flag is set
  • a kernel mode caller to the IoCreateFile* or FltCreateFile* API functions has two possible methods of specifying that an access check should be performed:
    • via the I/O Manager, by setting the IO_FORCE_ACCESS_CHECK Options flag, which in turn sets the SL_FORCE_ACCESS_CHECK flag in the IRP stack location Flags
    • via the Object Manager, by setting the OBJ_FORCE_ACCESS_CHECK OptionAttributes->Attributes flag, which causes the IRP’s RequestorMode to be set to UserMode

Vulnerability

In his research, James found that there were various kernel mode drivers shipped with Windows that, when handling IRP_MJ_CREATE requests, check the IRP’s RequestorMode, but do not check for SL_FORCE_ACCESS_CHECK. Furthermore, these are potentially exploitable via kernel mode code that, on the face of it, appears to be doing the correct thing in setting IO_FORCE_ACCESS_CHECK when creating or opening a file. An attacker obtaining sufficient control of the arguments of a file create/open call, via some request originating from user mode, could use this to send an IRP_MJ_CREATE request where the RequestorMode is KernelMode. If the RequestorMode check is used in a security decision, this may lead to a local privilege escalation vulnerability.

Further details, including how James discovered this vulnerability class and examples of where such code occurs in the Windows kernel and drivers, can be found in his post on the Google Project Zero blog.

 

 

James specified two kernel mode code patterns – the ‘initiator’, which makes a file create/open call, and the ‘receiver’, which handles IRP_MJ_CREATE requests. These are defined as follows:

  1. The ‘initiator’ consists of:
    • A call to a file open API function (IoCreateFile* or FltCreateFile*) where:
      • the IO_NO_PARAMETER_CHECKING flag in Options is set (or alternatively, where the call is being made from a system thread)
        • this will set the IRP’s RequestorMode to KernelMode
        • for IoCreateFileEx and FltCreateFile*, IO_NO_PARAMETER_CHECKING is set implicitly
    • the IO_FORCE_ACCESS_CHECK flag is set in Options, indicating that an access check is intended
    • the OBJ_FORCE_ACCESS_CHECK attribute in ObjectAttributes is not set
      • if this were set, it would override the IRP’s RequestorMode, setting it to UserMode
  2. an attacker has some measure of control over this call
  3. The ‘receiver’ consists of:
    • A handler for an IRP_MJ_CREATE request where:
      • the IRP’s RequestorMode is used to make a security decision
      • in doing this, the Flags from its location in the IRP’s stack are not tested for SL_FORCE_ACCESS_CHECK

An attacker would need to be able to direct the initiator to open a device object that is handled by the receiver. The security check in the receiver is bypassed because the Irp->RequestorMode will be KernelMode, but the SL_FORCE_ACCESS_CHECK flag is not examined.

In his investigations, James had found instances of both initiators and receivers, but none that when chained together would directly lead to privilege escalation. We opted to partner with him on further research and see what we could find together.

Variant finding

For first-party drivers shipped with Windows (drivers written by Microsoft) and the Windows kernel itself, we used Semmle QL (previously discussed on this blog here) to search the source code for the vulnerability code patterns described above.

To find initiator code patterns, we used a custom data flow analysis to track combinations of flags to Options and ObjectAttributes->Attributes when passed to the internal function IopCreateFile. As mentioned above, this is the point at which the various file open API functions eventually converge. This result set was filtered to show only the calls where IO_FORCE_ACCESS_CHECK and IO_NO_PARAMETER_CHECKING were set, but OBJ_FORCE_ACCESS_CHECK was not. We rejected initiators which offered an attacker no control of the object name.

To discover receiver code patterns, we examined controlling expressions (that is, expressions used in control flow statements such as if and switch) that were influenced by the RequestorMode field of an IRP object, and were reachable from either an IRP_MJ_CREATE dispatch or filter function. These were filtered to exclude expressions that involved both the SL_FORCE_ACCESS_CHECK macro and some access to the Flags field of an IO_STACK_LOCATION object. A small number of RequestorMode checks were rejected in manual follow-up as having no security impact (for example, where they were being used to exclude kernel mode callers, rather than permit them).

This initial analysis found a total of 11 potential initiators and 16 potential receivers in the Windows source code, including those James had reported to us.

Windows also ships with many “inbox drivers” – third-party drivers that are critical for booting certain devices or that enable a fully functional install out of the box. We filtered on the import table of each driver binary to obtain a subset for further analysis. For the initiators these were imports of IoCreateFile* or FltCreateFile*, and for receivers this was IoCreateDevice or FltRegisterFilter, as we were only interested in code that is reachable via a device object or its filters. The remaining set of driver binaries was examined using IDA Pro. This analysis found no additional initiators or receivers.

Exploiting these potential vulnerabilities requires compatible initiators and receivers. In particular, the initiator must offer sufficient control to an attacker of the eventual IopCreateFile call, so that they can exploit the receiver.

We found that the receivers fell into two categories:

  • requiring specific extended attributes to be supplied, either to reach a RequestorMode check, or to do something useful after bypassing it in terms of exploitation
  • requiring the file handle to be passed back to the attacker to reach code in its other IRP dispatch functions that may be exploitable

Fortunately, none of the initiators detected in our analysis gave an attacker sufficient capability to do either of these.

In the next step of our analysis, we performed a broader search encompassing all calls to kernel mode file create/open APIs, including calls to ZwCreateFile and ZwOpenFile, and calls to IoCreateFile* and FltCreateFile* where IO_NO_PARAMETER_CHECKING is set (irrespective of whether or not IO_FORCE_ACCESS_CHECK was set). After excluding all calls where OBJ_FORCE_ACCESS_CHECK was set, there were still hundreds of results in kernel and driver code, so we filtered these down by focusing on the two receiver categories.

Firstly, we filtered for calls where the EaBuffer parameter was non-NULL, to show places where extended attributes could be passed in. Secondly, we filtered for calls where OBJ_KERNEL_HANDLE was not set, to see where it may be possible for a usable object handle to be passed back to user mode. This brought the results down to a manageable number for manual analysis. However, we did not find any code that could be used as a compatible initiator within this result set.

Defence in depth security measures

To summarize James’ and MSRC’s combined investigations, there appeared to be no combination of initiator and receiver present in currently supported versions of Windows that could be used for local privilege escalation out of the box.

Nevertheless, we chose to address these in future versions of Windows as a defence-in-depth measure. Most of these fixes are on track for release in Windows 10 19H1, with a few held back for further compatibility testing and/or because the component they exist in is deprecated and disabled by default.

We did consider a broad fix to prevent instances of an initiator from occurring, in making an API change so that if IO_FORCE_ACCESS_CHECK is set in Options, the IRP’s RequestorMode is automatically set to UserMode, as if the OBJ_FORCE_ACCESS_CHECK attribute was set. However, the compatibility risk of breaking functionality of third-party drivers that may rely on the existing behaviour was deemed to be too high.

Information for driver developers

There exists some risk of third-party drivers being susceptible to this vulnerability class, and we urge all kernel driver developers to review their code to ensure correct processing of IRP requests and defensive use of the file open APIs.

The recommended changes should be relatively simple.

In IRP_MJ_CREATE dispatch handlers, don’t rely on the value of the IRP’s RequestorMode without also checking for the SL_FORCE_ACCESS_CHECK flag. For example, instead of:

    if (Irp->RequestorMode != KernelMode)
    {
        // reject user mode requestors
        Status = STATUS_ACCESS_DENIED;
    }

use something like this:

    PIO_STACK_LOCATION IrpSp = IoGetCurrentIrpStackLocation(Irp);

    if ((Irp->RequestorMode != KernelMode) || (IrpSp->Flags & SL_FORCE_ACCESS_CHECK))
    {
        // reject user mode requestors
        Status = STATUS_ACCESS_DENIED;
    }

Secondly, where the IO_FORCE_ACCESS_CHECK flag is already set in Options, we strongly recommend also setting the OBJ_FORCE_ACCESS_CHECK flag in ObjectAttributes. For example:

    InitializeObjectAttributes(
        &ObjectAttributes,
        FileName,
        (OBJ_CASE_INSENSITIVE | OBJ_FORCE_ACCESS_CHECK),
        NULL,
        NULL);

    Status = IoCreateFileEx(
        &ObjectHandle,
        GENERIC_READ | SYNCHRONIZE,
        &ObjectAttributes,
        &IoStatusBlock,
        NULL,
        0,
        0,
        FILE_OPEN,
        0,
        NULL,
        0,
        CreateFileTypeNone,
        NULL,
        IO_FORCE_ACCESS_CHECK);

More generally, where a file create/open call may be made on behalf of a user-mode request, do not assume that the thread’s previous mode is UserMode or that this will be carried forward to the IRP’s requestor mode – set the OBJ_FORCE_ACCESS_CHECK flag in ObjectAttributes to make this explicit.

Acknowledgements

We’d like to thank James Forshaw for partnering with us on this vulnerability investigation, and for the many other high-quality vulnerability reports he has shared with the MSRC.

Thanks also to Paul Brookes, Dileepa Kidambi Sudarsana, and Michelle Chen for their assistance in scaling the static analysis to the entire Windows codebase.

 

Steven Hunter, MSRC Vulnerabilities & Mitigations team


Vulnerability hunting with Semmle QL, part 2


The first part of this series introduced Semmle QL, and how the Microsoft Security Response Center (MSRC) are using it to investigate variants of vulnerabilities reported to us. This post discusses an example of how we’ve been using it proactively, covering a security audit of an Azure firmware component.

This was part of a wider defense in depth security review of Azure services, exploring attack vectors from the point of view of a hypothetical adversary who has already penetrated at least one security boundary, and now sits in the operating environment of a service backend (marked with * on the diagram below).

 

 

One of the targets of this review was a Linux-based embedded device that interfaces both with a service backend and a management backend, passing operational data between the two. The main attack surface of this device is a management protocol used on both interfaces.

An initial manual review of its firmware indicated that this management protocol is message-based, and there are over four hundred different message types, each with their own handler function. Manually auditing every single function would have been tedious and error-prone, so using Semmle to scale up our code review capabilities was an easy choice. We found 33 vulnerable message handler functions in total, using the static analysis techniques discussed in this post.

Defining the attack surface

Our first step was to write some QL to model data that would be sourced from an attacker. The management protocol works on a request-response basis, where every message request type is identified with a category number and a command number. This is defined in the source code using arrays of structures such as this:

MessageCategoryTable g_MessageCategoryTable[] =
{
    { CMD_CATEGORY_BASE,  g_CommandHandlers_Base },
    { CMD_CATEGORY_APP0,  g_CommandHandlers_App0 },
    
    { NULL,               NULL                   }
};

CommandHandlerTable g_CommandHandlers_Base [] =
{
    { CMD_GET_COMPONENT_VER,  sizeof(ComponentVerReq),  GetComponentVer,   },
    { CMD_GET_GLOBAL_CONFIG,  -1,                       GetGlobalConfig,   },    
    
    { NULL,                   NULL,                     NULL,              }
};

In the example above, a message with category type CMD_CATEGORY_BASE and command type CMD_GET_COMPONENT_VER would be routed to the GetComponentVer function. The command handler table also has information on the expected size of the request message, which is validated in the message dispatch routines prior to calling the handler function.

We defined the message handler table with the following QL:

class CommandHandlerTable extends Variable { 
  CommandHandlerTable() { 
    exists(Variable v | v.hasName("g_MessageCategoryTable")
      and this.getAnAccess() = v.getInitializer().getExpr().getAChild().getChild(1)
    ) 
  }
}

This takes a variable named g_MessageCategoryTable, finds its initializing expression, and matches all children of this expression – each child expression corresponds to a row of the message category table. For each row, it takes the second column (this is getChild(1) because the parameter of the getChild predicate is zero-indexed), each of which is a reference to a command handler table, and matches on the variable referenced. In the example above, these would be g_CommandHandlers_Base and g_CommandHandlers_App0.

We defined the set of message handler functions using a similar approach:

class MessageHandlerFunction extends Function { 
  Expr tableEntry; 
   
  MessageHandlerFunction() { 
    exists(CommandHandlerTable table |
      tableEntry = table.getInitializer().getExpr().getAChild()
      )
    and this = tableEntry.getChild(2).(FunctionAccess).getTarget()
  }
 
  int getExpectedRequestLength() { 
    result = tableEntry.getChild(1).getValue().toInt() 
  } 

  
}

This QL class uses a member variable tableEntry to hold the set of all rows in all command handler tables. This is so it can be referenced in both the characteristic predicate (MessageHandlerFunction() { }) and getExpectedRequestLength(), without repeating the definition.

All of this maps to the code structure above as follows:

Each message handler function has the same signature:

typedef unsigned char UINT8;

int ExampleMessageHandler(UINT8 *pRequest, int RequestLength, UINT8 *pResponse);

And follows a general pattern where the request data is cast to a struct type representing the message layout, and accessed via its fields:

int ExampleMessageHandler(UINT8 *pRequest, int RequestLength, UINT8 *pResponse)
{
    ExampleMessageRequest* pMsgReq = (ExampleMessageRequest *)pRequest;

    

    someFunction(pMsgReq->aaa.bbb);

    
}

In this analysis, we were only interested in the request data. We defined two additional predicates in the MessageHandlerFunction QL class to model the request data and its length:

class MessageHandlerFunction extends Function {
  Expr tableEntry;

  

  Parameter getRequestDataPointer() {
    result = this.getParameter(0)
  }

  Parameter getRequestLength() {
    result = this.getParameter(1)
  }
}

Having abstracted away the definition of a message handler function, it can be used as we would any other QL class. For example, this query lists all message handler functions in descending order of their cyclomatic complexity:

from MessageHandlerFunction mhf
select
  mhf,
  mhf.getADeclarationEntry().getCyclomaticComplexity() as cc
order by cc desc

Analyzing data flow

Now that we’d defined a set of entry points for untrusted data, the next step was to find where it may be used in an unsafe manner. To do this, we needed to follow the flow of such data through the codebase. QL provides a powerful global data flow library which abstracts away most of the tricky language-specific detail involved in this.

The DataFlow library is brought into the scope of the query with:

import semmle.code.cpp.dataflow.DataFlow

It is used by subclassing DataFlow::Configuration and overriding its predicates to define the data flow as it applies to DataFlow::Node, a QL class representing any program artefact that data can flow through:

Configuration predicate               Description
isSource(source)                      data must flow from source
isSink(sink)                          data must flow to sink
isAdditionalFlowStep(node1, node2)    data can also flow between node1 and node2
isBarrier(node)                       data cannot flow through node

Most data flow queries will look something like this:

class RequestDataFlowConfiguration extends DataFlow::Configuration { 
  RequestDataFlowConfiguration() { this = "RequestDataFlowConfiguration" } 
 
  override predicate isSource(DataFlow::Node source) { 
    
  }
 
  override predicate isSink(DataFlow::Node sink) { 
    
  }
 
  override predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) { 
    
  }
 
  override predicate isBarrier(DataFlow::Node node) { 
    
  }
 
}

from DataFlow::Node source, DataFlow::Node sink 
where any(RequestDataFlowConfiguration c).hasFlow(source, sink) 
select source, "Data flow from $@ to $@",
  source, source.toString(),
  sink, sink.toString()

Note that the QL data flow library performs an interprocedural analysis – in addition to examining data flows local to a function, it will include data flowing through function call arguments. This was an essential feature for our security review, as although the vulnerable code patterns discussed below are shown in simple example functions for ease of demonstration, in the actual source code for our target, most of the results had data flows spanning multiple complex functions.

Finding memory safety vulnerabilities

As this firmware component was a pure C codebase, we first decided to search for code patterns relating to memory safety.

One common source of such bugs is array indexing without performing a bounds check. Searching for this pattern in isolation would provide a large proportion of results that are most likely not security vulnerabilities, as what we are really interested in is where the attacker has some control over the index value. So in this case, we are looking for data flows where the sink is an array indexing expression, the source is the request data of a message handler function, and there is a barrier on any data flow node guarded by a relevant bounds check.

For example, we want to find data flows matching code like this:

int ExampleMessageHandler(UINT8 *pRequest(1:source), int RequestLength, UINT8 *pResponse)
{
    ExampleMessageRequest* pMsgReq(3) = (ExampleMessageRequest *) pRequest(2);
    int index1(6) = pMsgReq(4)->index1(5);

    pTable1[index1(7:sink)].field1 = pMsgReq->value1;
}

But we also want to exclude data flows for code like this:

int ExampleMessageHandler(UINT8 *pRequest(1:source), int RequestLength, UINT8 *pResponse)
{
    ExampleMessageRequest* pMsgReq(3) = (ExampleMessageRequest *) pRequest(2);
    int index2(6) = pMsgReq(4)->index2(5);

    if (index2 >= 0 && index2 < PTABLE_SIZE)
    {
        pTable2[index2].field1 = pMsgReq->value2;
    }
}

The source is defined using the MessageHandlerFunction class discussed earlier, and we can use the getArrayOffset predicate of an ArrayExpr to define a suitable sink:

  override predicate isSource(DataFlow::Node source) {
    any(MessageHandlerFunction mhf).getRequestDataPointer() = source.asParameter()
  }
  
  override predicate isSink(DataFlow::Node sink) { 
    exists(ArrayExpr ae | ae.getArrayOffset() = sink.asExpr())  
  }

By default, the DataFlow library only includes flows that preserve the value at each node, such as function call parameters, assignment expressions, and the like. But we also need data to flow from the request data pointer to the fields of the structure it was cast to. We’ll do that like this:

  override predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2)
  {
    // any terminal field access on request packet
    //   e.g. in expression a->b.c the data flows from a to c
    exists(Expr e, FieldAccess fa |  
      node1.asExpr() = e and node2.asExpr() = fa |  
      fa.getQualifier*() = e and not (fa.getParent() instanceof FieldAccess)
    )
  }

To exclude flows with a bounds check, we place a barrier on any node with a variable or field that is used in some conditional statement earlier on in the control flow graph (for now, we make the assumption that any such bounds check is done correctly):

  override predicate isBarrier(DataFlow::Node node) { 
    exists(ConditionalStmt condstmt |  
      // dataflow node variable is used in expression of conditional statement
      //   this includes fields (because FieldAccess extends VariableAccess)
      node.asExpr().(VariableAccess).getTarget().getAnAccess()
                                          = condstmt.getControllingExpr().getAChild*()
      // and that statement precedes the dataflow node in the control flow graph
      and condstmt.getASuccessor+() = node.asExpr()
      // and the dataflow node itself not part of the conditional statement expression
      and not (node.asExpr() = condstmt.getControllingExpr().getAChild*())
    ) 
  }

Applying this to the two examples above, the data flow through each node would be:

 

 

In our firmware codebase, this query located a total of 18 vulnerabilities across 15 message handler functions, a mix of attacker-controlled out of bounds reads and writes.

We applied a similar analysis to find where arguments of function calls were taken from the message request data without first being validated. Firstly, we defined a QL class to describe the function calls and arguments of interest, including the size argument of calls to memcpy and a similar function _fmemcpy, and the length argument of CalculateChecksum. CalculateChecksum is a function specific to this codebase that returns the CRC32 of a buffer, and could potentially be used as an information disclosure primitive where the message handler function copied this value into its response buffer.

class ArgumentMustBeCheckedFunctionCall extends FunctionCall {
  int argToCheck;
 
  ArgumentMustBeCheckedFunctionCall() {
    ( this.getTarget().hasName("memcpy")            and argToCheck = 2 ) or
    ( this.getTarget().hasName("_fmemcpy")          and argToCheck = 2 ) or
    ( this.getTarget().hasName("CalculateChecksum") and argToCheck = 1 )
  }

  Expr getArgumentToCheck() { result = this.getArgument(argToCheck) }
}

Next, we modified the sink of the previous query to match on ArgumentMustBeCheckedFunctionCall instead of an array index:

  override predicate isSink(DataFlow::Node sink) {
    // sink node is an argument to a function call that must be checked first
    exists (ArgumentMustBeCheckedFunctionCall fc | 
              fc.getArgumentToCheck() = sink.asExpr())
  }

This query revealed a further 17 vulnerabilities in 13 message handlers, mostly attacker-controlled out of bounds reads (which we later confirmed were disclosed in a response message), with one out of bounds write.

Taint tracking

In the above queries, we overrode the DataFlow library’s isAdditionalFlowStep predicate to ensure that where data flowed to a pointer to a structure, the fields of that structure would be added as nodes in the data flow graph. We did this because by default, the data flow analysis only includes paths where the value of the data remains unmodified, but we wanted to keep track of a particular set of expressions that it may have affected too. That is, we defined a particular set of expressions that were tainted by untrusted data.

QL contains a built-in library to apply a more general approach to taint tracking. Developed on top of the DataFlow library, it overrides isAdditionalFlowStep with a much richer set of rules for value-modifying expressions. This is the TaintTracking library, and it is imported in a similar manner to DataFlow:

import semmle.code.cpp.dataflow.TaintTracking

It is used in almost the same way as the data flow library, except that the QL class to extend is TaintTracking::Configuration, with these configuration predicates:

Configuration predicate               Description
isSource(source)                      data must flow from source
isSink(sink)                          data must flow to sink
isAdditionalTaintStep(node1, node2)   data at node1 will also taint node2
isSanitizer(node)                     data cannot flow through node

We re-ran the earlier queries with isAdditionalFlowStep removed (as we no longer need to define it) and isBarrier renamed to isSanitizer. As expected, it returned all the results mentioned above, but also uncovered some additional integer underflow flaws in array indexing. For example:

int ExampleMessageHandler(UINT8 *pRequest(1:source), int RequestLength, UINT8 *pResponse)
{
    ExampleMessageRequest* pMsgReq(3) = (ExampleMessageRequest *) pRequest(2);
    int index1(6) = pMsgReq(4)->index1(5);

    pTable1[(index1(7) - 2)(8:sink)].field1 = pMsgReq->value1;
}

For our internal reporting of each vulnerability type, we were interested in classifying these separately from the earlier query results. This involved a simple modification to the sink, using the SubExpr QL class:

  override predicate isSink(DataFlow::Node sink) {
    // this sink is the left operand of a subtraction expression,
    //   which is part of an array offset expression, e.g. x in a[x - 1]
    exists(ArrayExpr ae, SubExpr s | sink.asExpr() instanceof FieldAccess
      and ae.getArrayOffset().getAChild*() = s
      and s.getLeftOperand().getAChild*() = sink.asExpr())
  }

This gave us an additional 3 vulnerabilities in 2 message handler functions.

Finding path traversal vulnerabilities

With the intent of finding potential path traversal vulnerabilities, we used QL to attempt to identify message handler functions which used an attacker-controlled filename in a file open function.

We used a slightly different approach to taint tracking this time, defining some additional taint steps that would flow through various string-processing C library functions:

predicate isTaintedString(Expr expSrc, Expr expDest) {
  exists(FunctionCall fc, Function f |
    expSrc = fc.getArgument(1) and 
    expDest = fc.getArgument(0) and
    f = fc.getTarget() and (
      f.hasName("memcpy") or 
      f.hasName("_fmemcpy") or 
      f.hasName("memmove") or 
      f.hasName("strcpy") or 
      f.hasName("strncpy") or
      f.hasName("strcat") or
      f.hasName("strncat")
      )
  )
  or exists(FunctionCall fc, Function f, int n |
    expSrc = fc.getArgument(n) and 
    expDest = fc.getArgument(0) and
    f = fc.getTarget() and (
      (f.hasName("sprintf") and n >= 1) or 
      (f.hasName("snprintf") and n >= 2)
    )
  )
}

  override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
    isTaintedString(node1.asExpr(), node2.asExpr())
  }

And defined the sink as the path argument to a file open function:

class FileOpenFunction extends Function {
  FileOpenFunction() {
    this.hasName("fopen") or this.hasName("open")
  }

  int getPathParameter() { result = 0 } // filename parameter index
}

  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall fc, FileOpenFunction fof |
      fc.getTarget() = fof and fc.getArgument(fof.getPathParameter()) = sink.asExpr())
  }

With some foreknowledge of how our target device worked, observed from an initial review, we expected at least some results before we tackled the next problem of excluding flows where the data was validated, as with the earlier queries. However, the query returned nothing at all.

With no data flow paths to examine, we fell back on querying the function call graph to search for any path between the message handler functions and a call to a file open function, excluding calls where the path argument was a constant:

// this recursive predicate defines a function call graph
predicate mayCallFunction(Function caller, FunctionCall fc) {
  fc.getEnclosingFunction() = caller or mayCallFunction(fc.getTarget(), fc)
}
 
from MessageHandlerFunction mhf, FunctionCall fc, FileOpenFunction fof
where mayCallFunction(mhf, fc)
  and fc.getTarget() = fof
  and not fc.getArgument(fof.getPathParameter()).isConstant()
select 
  mhf, "$@ may have a path to $@",
  mhf, mhf.toString(),
  fc, fc.toString()

This query provided 5 results – sufficiently few to examine manually – and from this we uncovered 2 path traversal vulnerabilities, one in writing to a file and one in reading from a file, both with an attacker-supplied path. It turned out that the taint tracking didn’t flag these because it required two separate message types to be sent: the first to set the filename, and the second to read or write data to the file with that name. Fortunately, QL was flexible enough to permit an alternative route of exploration.

Conclusions

At Microsoft, we take a defense in depth approach to securing the cloud and keeping our customers’ data safe. An important part of this is performing comprehensive security reviews of Azure internal attack surfaces. In this source code review of an embedded device, we applied the advanced static analysis techniques of Semmle QL to finding vulnerabilities in a message-based management protocol. This uncovered a total of 33 vulnerable message handlers, within a variety of bug classes. Using QL enabled us to automate the repetitive parts of what would otherwise be an entirely manual code review, while still applying an explorative approach.

 
Steven Hunter and Christopher Ertl, MSRC Vulnerabilities & Mitigations team

Time travel debugging: It’s a blast! (from the past)


The Microsoft Security Response Center (MSRC) works to assess externally reported vulnerabilities as quickly as possible, but time can be lost if we have to confirm details of the repro steps or environment with the researcher to reproduce the vulnerability. Microsoft has made our “Time Travel Debugging” (TTD) tool publicly available to make it easy for security researchers to provide a full repro, shortening investigations and potentially contributing to higher bounties (see “Report quality definitions for Microsoft’s Bug Bounty programs”). We use it internally, too; it has allowed us to find the root cause of complex software issues in half the time it would take with a regular debugger.

If you’re wondering where you can get the TTD tool and how to use it, this blogpost is for you.

Understanding time travel debugging

Whether you call it “timeless debugging”, “record-replay debugging”, “reverse debugging”, or “time travel debugging”, it’s the same idea: the ability to record the execution of a program. Once you have this recording, you can navigate forward or backward through it, and you can share it with colleagues. Even better, an execution trace is a deterministic recording; everybody looking at it sees the same behavior at the same time. When a developer receives a TTD trace, they don’t even need to reproduce the issue to travel through the execution; they can just navigate the trace file.

There are usually three key components associated with time travel debugging:

  1. A recorder that you can picture as a video camera,
  2. A trace file that you can picture as the recording file generated by the camera,
  3. A replayer that you can picture as a movie player.
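In rough terms, the three components fit together like this toy record/replay loop (a simplified model, not TTD’s actual implementation): the recorder logs every nondeterministic input, and the replayer feeds those logged values back so the re-execution is deterministic.

```javascript
// Toy record/replay model: the recorder captures every nondeterministic
// input (here, a fake changing value); the replayer feeds the same values
// back, so re-execution produces an identical result.
function run(getInput, onInput) {
  let acc = 0;
  for (let i = 0; i < 3; i++) {
    const v = getInput();
    if (onInput) onInput(v);  // recorder hook: write v to the "trace file"
    acc += v;
  }
  return acc;
}

// Record: capture the inputs into a trace.
const trace = [];
let seed = 7;
const recorded = run(() => (seed = (seed * 31) % 100), v => trace.push(v));

// Replay: consume the trace instead of re-running the real input source.
let pos = 0;
const replayed = run(() => trace[pos++]);
// recorded === replayed: the replay is deterministic
```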

Good ol’ debuggers

Debuggers aren’t new, and the process of debugging an issue has not drastically changed for decades. The process typically works like this:

  1. Observing the behavior under a debugger. In this step, you recreate an environment like that of the finder of the bug. It can be as easy as running a simple proof-of-concept program on your machine and observing a bug-check, or it can be as complex as setting up an entire infrastructure with specific software configurations just to be able to exercise the code at fault. And that’s if the bug report is accurate and detailed enough to properly set up the environment.
  2. Understanding why the issue happened. This is where the debugger comes in. What you expect of a debugger, regardless of architecture or platform, is the ability to precisely control the execution of your target (stepping over and stepping in at various granularity levels: instruction, source-code line), set breakpoints, and edit memory as well as the processor context. This basic set of features gets the job done, but the cost is usually high: a lot of reproducing the issue over and over, a lot of stepping in, and a lot of “Oops... I should not have stepped over, let’s restart”. Wasteful and inefficient.

Whether you’re the researcher reporting a vulnerability or a member of the team confirming it, Time Travel Debugging can help the investigation to go quickly and with minimal back and forth to confirm details.

High-level overview

The technology that Microsoft has developed is called “TTD”, for time travel debugging. Born out of Microsoft Research around 2006 (cf. “Framework for Instruction-level Tracing and Analysis of Program Executions”), it was later improved and productized by Microsoft’s debugging team. The project relies on code emulation to record every event that the replayer will need to reproduce the exact same execution: the exact same sequence of instructions with the exact same inputs and outputs. The data the emulator tracks includes memory reads, register values, thread creation, module loads, etc.

Recording / Replaying

The recording software CPU, TTDRecordCPU.dll, is injected into the target process and hijacks the control flow of its threads. The emulator decodes native instructions into an internal custom intermediate language (modeled after simple RISC instructions), caches blocks, and executes them. From then on, it carries the execution of those threads forward and dispatches callbacks whenever an event happens, such as when an instruction has been translated. Those callbacks allow the trace file writer component to collect the information the software CPU will need to replay the execution from the trace file.

The replay software CPU, TTDReplayCPU.dll, shares most of its codebase with the record CPU, except that instead of reading the target’s memory it loads data directly from the trace file. This lets you replay the execution of a program with full fidelity without needing to run the program itself.

The trace file

The trace file is a regular file on your file system that ends with the ‘run’ extension. The file uses a custom format and compression to optimize its size. You can also view this file as a database filled with rich information. To give the debugger fast access to the information it needs, “WinDbg Preview” creates an index file the first time you open a trace file. The index usually takes a few minutes to create and is usually about one to two times as large as the original trace file. As an example, tracing ping.exe on my machine generates a trace file of 37MB and an index file of 41MB. There are about 1,973,647 instructions in the trace (about 157 bits per instruction). Note that, in this instance, the trace is so small that the internal structures of the trace file account for most of the space overhead. A larger execution trace usually costs about 1 to 2 bits per instruction.
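The per-instruction cost is simple division over the example figures (a rough sanity check, treating MB as MiB):

```javascript
// Back-of-the-envelope check of the per-instruction cost for the ping.exe
// example quoted above (37 MB trace file, ~1,973,647 instructions).
const traceBytes = 37 * 1024 * 1024;             // 37 MB trace file
const instructions = 1973647;
const bitsPerInstruction = (traceBytes * 8) / instructions;  // ≈ 157 bits
```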

Recording a trace with WinDbg Preview

Now that you’re familiar with the pieces of TTD, here’s how to use them.

Get TTD: TTD is currently available on Windows 10 through the “WinDbg Preview” app that you can find in the Microsoft store: https://www.microsoft.com/en-us/p/windbg-preview/9pgjgd53tn86?activetab=pivot:overviewtab.

Once you install the application, the “Time Travel Debugging - Record a trace” tutorial will walk you through recording your first execution trace.

Building automations with TTD

A recent improvement to the Windows debugger is the addition of the debugger data model and the ability to interact with it via JavaScript (as well as C++). The details of the data model are out of scope for this blog, but you can think of it as a way to both consume and expose structured data to the user and debugger extensions. TTD extends the data model by introducing very powerful and unique features available under both the @$cursession.TTD and @$curprocess.TTD nodes.

TTD.Calls is a function that lets you answer questions like “Give me every position where foo!bar has been invoked” or “Is there a call to foo!bar that returned 10 in the trace?”. Better yet, like every collection in the data model, you can query the results with LINQ operators. Here is what a TTD.Calls object looks like:


0:000> dx @$cursession.TTD.Calls("msvcrt!write").First()
@$cursession.TTD.Calls("msvcrt!write").First()
    EventType        : Call
    ThreadId         : 0x194
    UniqueThreadId   : 0x2
    TimeStart        : 1310:A81 [Time Travel]
    TimeEnd          : 1345:14 [Time Travel]
    Function         : msvcrt!_write
    FunctionAddress  : 0x7ffec9bbfb50
    ReturnAddress    : 0x7ffec9be74a2
    ReturnValue      : 401
    Parameters

The API completely hides away ISA specific details, so you can build queries that are architecture independent.

TTD.Calls: Reconstructing stdout

To demonstrate how powerful and easy these features are to use, let’s record the execution of “ping.exe 127.0.0.1” and rebuild the console output from the recording.

Building this in JavaScript is very easy:

  1. Iterate over every call to msvcrt!write ordered by the time position,
  2. Read the number of bytes given by the third argument from the address pointed to by the second argument,
  3. Display the accumulated results.

'use strict';
function initializeScript() {
    return [new host.apiVersionSupport(1, 3)];
}
function invokeScript() {
    const logln = p => host.diagnostics.debugLog(p + '\n');
    const CurrentSession = host.currentSession;
    const Memory = host.memory;
    const Bytes = [];
    for(const Call of CurrentSession.TTD.Calls('msvcrt!write').OrderBy(p => p.TimeStart)) {
        Call.TimeStart.SeekTo();
        const [_, Address, Count] = Call.Parameters;
        Bytes.push(...Memory.readMemoryValues(Address, Count, 1));
    }
    logln(Bytes.filter(p => p != 0).map(
        p => String.fromCharCode(p)
    ).join(''));
}

TTD.Memory: Finding every thread that touched the LastErrorValue

TTD.Memory is a powerful API that allows you to query the trace file for certain types (read, write, execute) of memory access over a range of memory. Every resulting object of a memory query looks like the sample below:


0:000> dx @$cursession.TTD.Memory(0x000007fffffde068, 0x000007fffffde070, "w").First()
@$cursession.TTD.Memory(0x000007fffffde068, 0x000007fffffde070, "w").First()
    EventType        : MemoryAccess
    ThreadId         : 0xb10
    UniqueThreadId   : 0x2
    TimeStart        : 215:27 [Time Travel]
    TimeEnd          : 215:27 [Time Travel]
    AccessType       : Write
    IP               : 0x76e6c8be
    Address          : 0x7fffffde068
    Size             : 0x4
    Value            : 0x0

Each result identifies the type of memory access, the start and end time positions, the thread that performed the access, the address accessed, the instruction pointer at which the access happened, and the value that was read, written, or executed.

To demonstrate its power, let’s create another script that collects the call-stack every time the application writes to the LastErrorValue in the current thread’s environment block:

  1. Iterate over every memory write access to &@$teb->LastErrorValue,
  2. Travel to the destination, dump the current call-stack,
  3. Display the results.

'use strict';
function initializeScript() {
    return [new host.apiVersionSupport(1, 3)];
}
function invokeScript() {
    const logln = p => host.diagnostics.debugLog(p + '\n');
    const CurrentThread = host.currentThread;
    const CurrentSession = host.currentSession;
    const Teb = CurrentThread.Environment.EnvironmentBlock;
    const LastErrorValueOffset = Teb.targetType.fields.LastErrorValue.offset;
    const LastErrorValueAddress = Teb.address.add(LastErrorValueOffset);
    const Callstacks = new Set();
    for(const Access of CurrentSession.TTD.Memory(
        LastErrorValueAddress, LastErrorValueAddress.add(8), 'w'
    )) {
        Access.TimeStart.SeekTo();
        const Callstack = Array.from(CurrentThread.Stack.Frames);
        Callstacks.add(Callstack);
    }
    for(const Callstack of Callstacks) {
        for(const [Idx, Frame] of Callstack.entries()) {
            logln(Idx + ': ' + Frame);
        }
        logln('----');
    }
}

Note that there are more TTD-specific objects you can use to get information about events that happened in a trace, the lifetime of threads, and so on. All of these are documented on the “Introduction to Time Travel Debugging objects” page.


0:000> dx @$curprocess.TTD.Lifetime
@$curprocess.TTD.Lifetime                 : [F:0, 1F4B:0]
    MinPosition      : F:0 [Time Travel]
    MaxPosition      : 1F4B:0 [Time Travel]
0:000> dx @$curprocess.Threads.Select(p => p.TTD.Position)
@$curprocess.Threads.Select(p => p.TTD.Position)
    [0x194]          : 1E21:104 [Time Travel]
    [0x7e88]         : 717:1 [Time Travel]
    [0x5fa4]         : 723:1 [Time Travel]
    [0x176c]         : B58:1 [Time Travel]
    [0x76a0]         : 1938:1 [Time Travel]

Wrapping up

Time Travel Debugging is a powerful tool for security software engineers and can also be beneficial for malware analysis, vulnerability hunting, and performance analysis. We hope you found this introduction to TTD useful and encourage you to use it to create execution traces for the security issues you find. The trace files generated by TTD compress very well; we recommend compressing them with 7-Zip (which usually shrinks a file to about 10% of its original size) before uploading to your favorite file storage service.

Axel Souchet

Microsoft Security Response Center (MSRC)

FAQ

Can I edit memory during replay time?

No. The recorder only saves what is needed to replay a particular execution path in your program, so it doesn’t save enough information to re-simulate a different execution.

Why don’t I see the bytes when a file is read?

The recorder knows only what it has emulated, which means that if another entity (the NT kernel here, but it could also be another process writing into a shared memory section) writes data to memory, there is no way for the emulator to know about it. As a result, if the target program never reads those values back, they will never appear in the trace file. If they are read later, their values will be available at that point, when the emulator fetches the memory again. This is an area the team is planning to improve soon, so watch this space 😊.
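This fetch-time behavior can be sketched as a toy model (an illustration of the explanation above, not TTD’s real recorder):

```javascript
// Sketch of why externally-written bytes only appear in the trace once the
// traced code reads them: the recorder logs memory values at fetch time,
// not at the moment an external writer (e.g. the kernel) changes them.
const realMemory = { 0x1000: 0 };
const trace = [];

function emulatedRead(addr) {
  const v = realMemory[addr];       // fetched by the emulator...
  trace.push({ addr, value: v });   // ...and only then logged to the trace
  return v;
}

realMemory[0x1000] = 0x41;          // external write behind the emulator's back
// nothing in `trace` yet: the write itself was never observed
const seen = emulatedRead(0x1000);  // now the value enters the trace
```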

Do I need private symbols or source code?

You don’t need source code or private symbols to use TTD. The recorder consumes native code and doesn’t need anything extra to do its job. If private symbols and source code are available, the debugger will consume them and provide the same experience as regular debugging with source and symbols.

Can I record kernel-mode execution?

TTD is for user-mode execution only.

Does the recorder support self-modifying code?

Yes, it does!

Are there any known incompatibilities?

There are some and you can read about them in “Things to look out for”.

Do I need WinDbg Preview to record traces?

Yes. As of today, the TTD recorder is shipping only as part of “WinDbg Preview” which is only downloadable from the Microsoft Store.

References

Time travel debugging

  1. Time Travel Debugging - Overview - https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/time-travel-debugging-overview
  2. Time Travel Debugging: Root Causing Bugs in Commercial Scale Software - https://www.youtube.com/watch?v=l1YJTg_A914
  3. Defrag Tools #185 - Time Travel Debugging – Introduction - https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-185-Time-Travel-Debugging-Introduction
  4. Defrag Tools #186 - Time Travel Debugging – Advanced - https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-186-Time-Travel-Debugging-Advanced
  5. Time Travel Debugging and Queries – https://github.com/Microsoft/WinDbg-Samples/blob/master/TTDQueries/tutorial-instructions.md
  6. Framework for Instruction-level Tracing and Analysis of Program Executions - https://www.usenix.org/legacy/events/vee06/full_papers/p154-bhansali.pdf
  7. VulnScan – Automated Triage and Root Cause Analysis of Memory Corruption Issues - https://blogs.technet.microsoft.com/srd/2017/10/03/vulnscan-automated-triage-and-root-cause-analysis-of-memory-corruption-issues/
  8. What’s new in WinDbg Preview - https://mybuild.techcommunity.microsoft.com/sessions/77266

Javascript / WinDbg / Data model

  1. WinDbg Javascript examples - https://github.com/Microsoft/WinDbg-Samples
  2. Introduction to Time Travel Debugging objects - https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/time-travel-debugging-object-model
  3. WinDbg Preview - Data Model - https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/windbg-data-model-preview

 

We need a safer systems programming language

In our first post in this series, we discussed the need for proactively addressing memory safety issues. Tools and guidance are demonstrably not preventing this class of vulnerabilities; memory safety issues have represented almost the same proportion of vulnerabilities assigned a CVE for over a decade. We feel that using memory-safe languages will mitigate this …

We need a safer systems programming language Read More »

Why Rust for safe systems programming

In this series, we have explored the need for proactive measures to eliminate a class of vulnerabilities and walked through some examples of memory safety issues we’ve found in Microsoft code that could have been avoided with a different language. Now we’ll peek at why we think that Rust represents the best alternative to C …

Why Rust for safe systems programming Read More »

Microsoft Announces Top Contributing Partners in the Microsoft Active Protections Program (MAPP)

Today we announce the top organizational candidates for Vulnerability Top Contributors, Threat Indicator Top Submitters, and Zero-Day Top Reporting for the period of July 1, 2018 – June 30, 2019. The Microsoft Active Protections Program provides security and protection to customers through cooperation and collaboration with industry leading partners. This bi-directional sharing program of threat …

Microsoft Announces Top Contributing Partners in the Microsoft Active Protections Program (MAPP) Read More »

It’s Official – The Way We Recognize Our Security Researchers

We deeply appreciate the partnership of the many talented security researchers who report vulnerabilities to Microsoft through Coordinated Vulnerability Disclosure. We pay bounties for research in key areas, and each year at Black Hat USA, we’ve recognized the most impactful researchers helping to protect the ecosystem. That’s not changing; we’re continuing to expand our bounty …

It’s Official – The Way We Recognize Our Security Researchers Read More »

Meet the MSRC at Black Hat 2019

We’re getting close to Black Hat, and we hope to see you there. Here’s where you can find members of the Microsoft Security Response Center if you’d like to say hello, ask a question about a report you made, discuss a recent blog article, or just show us pictures of your dog. Wednesday, August 7 …

Meet the MSRC at Black Hat 2019 Read More »


Recognizing Security Researchers in 2019

Who’s going to be on the Most Valuable Security Researcher list at Black Hat USA 2019? We’re not announcing the names—yet—but this is how we’ll determine who’s there. How do we define the Most Valuable Security Researchers?   The list at Black Hat will be the top tier of researchers based on not just the volume …

Recognizing Security Researchers in 2019 Read More »

Azure Security Lab: a new space for Azure research and collaboration

Corporate IoT – a path to intrusion

Several sources estimate that by the year 2020 some 50 billion IoT devices will be deployed worldwide. IoT devices are purposefully designed to connect to a network and many are simply connected to the internet with little management or oversight. Such devices still must be identifiable, maintained, and monitored by security teams, especially in large …

Corporate IoT – a path to intrusion Read More »

Announcing 2019 MSRC Most Valuable Security Researchers

Earlier today we announced MSRC’s 2018-2019 Most Valuable Security Researchers at Black Hat. The following 75 researchers hail from all corners of the world and possess varied experience and skills, yet all of them have contributed to securing Microsoft’s customers and the broader ecosystem. For over a decade, one of Microsoft’s partners in vulnerability …

Announcing 2019 MSRC Most Valuable Security Researchers Read More »

Microsoft Announces Top Three Contributing Partners in the Microsoft Active Protections Program (MAPP)

Today Microsoft announced the MAPP program Top Vulnerability Contributors, Top Threat Indicator Submitters, and Top Zero-Day Reporting for the period of July 1, 2018 – June 30, 2019. The Microsoft Active Protections Program provides security and protection to customers through cooperation and collaboration with industry leading partners. While all MAPP partners have made a significant …

Microsoft Announces Top Three Contributing Partners in the Microsoft Active Protections Program (MAPP) Read More »
