The project we present here encompasses many months of dedicated research by diversenok_zero. It covers a wide range of concealed code execution techniques and investigates the related internal mechanisms that make them possible on Windows systems in the first place. The activities throughout this research included documenting the technique's functioning, preparing sample implementations, and observing their side effects.
The sample implementations we provide can serve as a testing toolkit that security engineers can use to challenge existing detection logic. These tools fall into two categories: code injection that attempts to mask additional payloads within legitimate running applications and process tampering that starts arbitrary programs while borrowing the identity of trusted components.
Furthermore, the third portion of the project focuses exclusively on discussing detection mechanisms that can identify the use of concealed code execution techniques. It gradually follows through three subsequent stages of detection: static, runtime, and forensic. The material offers various suggestions, provides YARA rules, and demonstrates successful detection using custom experimental tooling (also included).
In short, the following values are present throughout the research:
You can find the code and the detailed explanations at our GitHub repository:
Code available at:
Process Tampering techniques exploit various intricate aspects of operating system's functioning to violate assumptions made by security software and conceal code execution on the scale of an entire process. The famous examples of such techniques include Process Hollowing, Process Doppelganging, Process Herpaderping, and more.
In the broader sense, code injection allows performing operations from the context of another component, typically without interfering with its functionality. Such techniques are interesting from the security perspective because they allow attackers to blend-in additional payloads within existing logic, making it harder to distinguish between them. In this write-up, we analyze the fundamental mechanisms used during code injection and try to understand the underlying decision-making that might accompany the development of future techniques.
The techniques for concealing code execution discussed in the offensive parts of the project have a shared primary objective: they attempt to fool security software and bypass security policies. Identifying something that actively hides its presence can prove challenging due to the inherent limitations it imposes on the detection methods. Luckily for defenders, most evasion and concealment techniques share common flaws that allow hunting for their entire families. This section describes the following topics: static detection, runtime detection, and forensic detection.
The primary focus of the research project was on making the nitty gritty details behind these techniques easy to understand & reproduce to enable more people to develop robust detections.
Process Tampering is a set of techniques that exploit various intricate aspects of the operating system's functioning to violate assumptions made by security software and conceal code execution on the scale of an entire process. The famous examples of such techniques include Process Hollowing, Process Doppelgänging, Process Herpaderping, and more.
Before diving into the technical details, let's familiarize ourselves with the main terms and concepts. First, we have programs - binary files that include the code and describe the dependencies it requires, such as the OS functions it needs to use. Secondly, we have processes, which represent running instances of applications. Despite commonly referring to them as running on the system, processes never execute any code; they are merely containers that encapsulate runtime resources such as the memory and references to actively used OS primitives. Code execution is the responsibility of one or more threads that belong to a process and share all its resources. Technically, it is also possible to have a process with no threads under some circumstances; we will see how it comes in handy for some techniques later.
Every process includes a private virtual address space that encompasses all addressable memory. It contains regions with different memory protection (read/write/execute) and purposes (storing data or code). We will discuss the initial memory layout for newly created processes shortly.
Windows includes multiple ways to start processes, ranging from high-level Shell API that only asks for a filename to low-level Native API that might require manually parsing the binary and creating the initial thread. But despite this diversity, everything boils down to two system calls.
All officially documented functions eventually end up either in
CreateProcessAsUser. Some higher-level APIs call them directly; others require COM/RPC to forward the request to a dedicated OS component (such as AppInfo, WMI, Secondary Logon services, or even Windows Explorer) that invokes these functions on the caller's behalf.
CreateProcess is quite a complex function by itself, most of the logic we are interested in is implemented inside the underlying
NtCreateUserProcess syscall. Nonetheless, aside from relying on the Native API structures and being designed accordingly,
NtCreateUserProcess is similar to
CreateProcess. The function takes a filename, some parameters, and an array of attributes on input and produces a ready-to-run process with the initial thread created for us on output. Let's take a closer look at the main structures we can find in the address space of a newly created process:
Initially, every process has only two executable images mapped into its address space: the program itself and ntdll.dll (the base library that exposes Native API to user-mode code).
Technically, there is an exception to this rule: 32-bit applications running under WoW64 emulation on 64-bit systems additionally get a 32-bit ntdll.
Other dependencies (kernel32.dll, shell32.dll, etc.) get loaded later on demand by the initial thread when the module loader (implemented in ntdll) starts resolving the program's dependencies. Aside from the executable code, you will also find these several fundamental structures in every process:
PEB, or Process Environment Block is a memory region that stores and cross-references internal process-wide information, including process parameters, heaps, and the list of loaded modules.
RTL_USER_PROCESS_PARAMETERS- a structure that controls the command-line arguments, environment variables, appearance settings, and other configuration options.
CreateProcessconstructs most of its fields from
STARTUPINFO, hence the similarity.
TEB, or Thread Environment Block is another essential data structure that stores virtually all per-thread configuration. Several well-known functions like
GetLastErrorretrieve information from the corresponding fields in TEB.
Processes created via
NtCreateUserProcess (and, thus,
CreateProcess, plus all higher-level functions) always include at least one thread. This thread will have a TEB and a stack somewhere within the address space.
Process creation we've seen so far isn't inherently complex (from the programmer's perspective) because
NtCreateUserProcess is a relatively recent addition introduced to simplify it. Before Windows Vista, however,
CreateProcess used to rely on a different syscall -
NtCreateProcessEx. This legacy function is still available in modern versions of Windows, so let's take a look at a small comparison of their functionality:
|Means to specify the executable:||Filename (in NT format)||Handle to an image section|
|Creates the initial thread?||Yes||No|
|Immediately notifies drivers about process creation?||Yes||No|
The first and foremost difference between these syscalls is that
NtCreateProcessEx does the bare minimum compared to
NtCreateUserProcess. Not only does it create a process with no threads, but it also doesn't even allocate
RTL_USER_PROCESS_PARAMETERS nor understand filenames.
As a result, dealing with this function is significantly more cumbersome because of all the additional operations it requires. Here is a rough recipe of how to re-implement the modern
NtCreateUserProcess using the legacy
Open the executable file for the new process.
NtCreateProcessEx doesn't understand filenames, so the first step in the preparation is to call
Create the process object using
NtCreateProcessEx. The new process will still get a mapped copy of the executable, ntdll, and PEB in its address space, but that's about it.
Allocate and copy process parameters block into the target. The simplest way is to use the
RtlCreateProcessParameters helper locally and then transfer the resulting structure into the new process via
NtWriteVirtualMemory. Remember that since process parameters contain pointers, they need to be adjusted correspondingly to be valid within the target.
Adjust the corresponding process parameters pointer in the target's PEB using
Map the section locally for parsing; determine its entry point address and stack parameters for the initial thread. It might be tempting to query
NtQueryInformationProcess instead, but, unfortunately, this approach does not account for mandatory ASLR under some conditions. The most reliable way is to manually parse the PE headers.
Create the initial thread. Previously, this step would require manually preparing TEB and allocating the stack, but luckily, nowadays, we can let
NtCreateThreadEx do everything for us.
Anyway, as for the technical details, a carefully written implementation is always available for review in this repository. Still, you might wonder why anybody would use this overly-engineered approach. Here are a few key points:
Mechanisms that enforce security policy always need some contextual information about the caller. Windows kernel usually relies on access tokens in its security-related decision-making because these objects allow attributing activity to a particular user and determining their privileges. In some cases, however, various components also heavily depend on recognizing the identity of the process that attempts to perform a specific operation. For instance, per-application firewall rules and application control policies like AppLocker and Software Restriction Policies cannot function without the underlying mechanisms for process identification. But probably the most impactful example is anti-malware detection logic used by AV and EDR software, which often behaves differently based on the process to which it's applied. These differences can take a form ranging from as simple as explicit exceptions from the rules to reputation-based multipliers. Faking process identity can be a powerful tool in the hands of an attacker, allowing to defeat entire families of security mechanisms at once.
The name and path to the executable are the primary pieces of identifying information about a process. The filenames are easy to work with; plus, knowing them allows further inspecting the application's files, computing hashes, verifying digital signatures, and much more. On the low level, however, the connection between the process and the file on the disk is not as straightforward as it might seem. There are several ways of retrieving the filename which might yield contradicting results. Some methods might provide outdated information, others can fail to query it after targeted tampering, and a few can even show arbitrary values of the attacker's choice. You can find a detailed overview of these methods in the detection write-up, where we will hunt for a mismatch between these values to uncover the use of process tampering.
Here are a few tricks and techniques that can introduce a name mismatch or otherwise confuse software that relies on retrieving filenames for processes, all without changing the content of the files:
Renaming. It might sound silly, but in some overly-simplified attack scenarios, it might be enough to rename the file before and after spawning the process from it. Of course, it won't fool security software that scans the executable synchronously with process creation, but it might circumvent mechanisms that check the filename later on demand. Such checks include looking up the short name from the process snapshot, querying the Native process image name, or reading it from the target's memory. Additionally to renaming, the attacker might also plant a different file under the previously used path, further increasing confusion when a human inspection is involved. Aside from using APIs that show outdated information, many tools (Task Manager, for example) cache filenames and, therefore, become unable to navigate to the correct executable after renaming.
Deleting. Deleting running executables is often considered impossible because the system locks files mapped into memory. There is, however, an edge case in the rules (discovered by Jonas Lyk) that allows doing so on NTFS volumes due to the logic behind alternative stream handling. While the system disallows deletion and modification of the locked primary stream, it doesn't prevent its renaming to an alternative stream. Because files must always have the primary (
::$DATA) stream, moving the content into an alternative stream creates an unlocked primary stream. Deleting this new stream, by definition, deletes the entire file and bypasses the lock on the alternative stream. At this point, every API that tracks renames starts returning errors because the active link between the process and the file is gone. The attacker might want to re-create the file and exploit the filename caching issues discussed above, but it won't restore the connection to the original process.
Unlocking the executable. The previous item prevents security software from using rename-tracking APIs to locate the executable later during the process's lifetime, but it sacrifices the file we might want to keep. Instead of deleting it, we can duplicate the original stream, move the locked stream out of the way, delete it, and then replace the newly unlocked empty primary stream with the duplicate. As a result, the file effectively remains unchanged but becomes available for modification. Of course, the subsequent changes to the content do not affect the existing memory mappings that become disassociated from the underlying file. The functions that retrieve the identity of the process and track renames also start to fail for a similar reason. Note that this stream rotation trick is superior to deleting and re-creating the file because it preserves all of its attributes, the file ID, hard links, and other properties. The UnlockExe tool included with this repository implements this original technique.
Process Ghosting. The previous techniques might bypass some functions, but they fail to completely hide the filename because of the information that the system captures during process creation. Additionally, many security products opt-in to verifying the file synchronously, i.e., before we can tamper with its name. That's where the architectural decisions of
NtCreateProcess come into play and benefit attackers. This syscall uses section objects (aka. memory mappings) which, as we know from the discussions above, can lose their association with the files on disk, effectively anonymizing it. When it happens, the system cannot query the name and establish the identity of the new process from the start of its existence, leading to some peculiar results. The most noticeable effect is the empty string that the system uses for the short image name in the process snapshot. Other identification methods either fail or return NULL. Notably, the caller still needs to provide some filename in the
ImagePathName field of
RTL_USER_PROCESS_PARAMETERS, but this string can point to an arbitrary file of choice.
The classical implementation of Process Ghosting works as follows: it creates a temporary file, marks it for pending deletion, creates an image section from it, closes the file (initiating deletion), and then creates a process from it. The demo provided with this repository follows the same logic. It is also possible to avoid using a temporary file and implement this technique utilizing the filename dissociation trick from the previous example (the UnlockExe demo); we leave this task as an exercise to the reader.
Process Hollowing. The process's identity is tight to the executable file used during its creation, not the content of the memory. And while there is a connection between those two concepts at the beginning of the process's lifetime, subsequent changes might invalidate this correlation. Process Hollowing exploits this subtle difference: it converts a trusted process into an empty husk and then uses it to execute a different binary. At the bare minimum, this technique spawns a new suspended process, maps the payload in its address space, adjusts the image base address in the Process Environment Block, changes the start address of the initial thread, and then resumes the target's execution. After the execution starts, the code in ntdll proceeds with the same initialization sequence as usual: it retrieves the image base address from PEB, resolves the imports for the module located there, and jumps to the provided entry point address. Because this type of tampering preserves the connection with the original file and happens after the system notifies the security software, products that don't inspect the memory (or at least its layout and properties) have a hard time detecting this technique. Due to its simplicity, the absence of file I/O, and small compatibility impact, Process Hollowing becomes a more formidable option than the previously discussed approaches. We cover this technique in the name mismatch category (and not the content mismatch discussed in the upcoming section) because it doesn't modify any files or code and always introduces a mismatch between the filename used during creation and the file mapped at the image base. Notably, existing implementations deviate in how they map the payload: using an image section that mimics the type of memory used for legitimate purposes or a private executable region where the attacker needs to copy the PE file according to its layout. The demo implementation provided with this repository uses the former approach, but keep in mind that both options are viable and have pros and cons.
One of the core features in Windows that links memory management and file I/O is the ability to project files into an address space of a specific process. Aside from offering a remarkably convenient and optimized instrument for accessing data on the disk via mere memory operations, it also powers several other fundamental primitives, including the one that provides processes with most of their executable memory. Typically, Memory Manager keeps all file projections in sync with the backing storage. However, it also supports the copy-on-write semantics where multiple views derived from the same source can have different content because all modifications remain local. The images that process creation and DLL loading operations map into processes also utilize this mode, allowing them to combine the best of both worlds. Sharing memory by default drastically reduces its footprint, while making a private copy on any modification guarantees adequate isolation and sustains stability.
Having a private copy implies that the attacker that modifies mapped image memory doesn't leave any traces for security software that only looks into the files that stay untouched. However, such techniques generally fall into the category of code injection. What process tampering can offer is a bit more exotic because it originates from the opposite direction and describes what happens if we somehow manage to alter the underlying executable file. The answer might initially sound dull - the memory doesn't change, but it opens up the possibility of introducing content mismatch without ever modifying memory.
Process Doppelgänging. This technique uses an elegant approach that relies on the transactional filesystem (TxF). Transactions group I/O operations, providing a few convenient guarantees such as atomicity, consistency, isolation, and durability (see ACID). Because of the isolation, transactions effectively become a mechanism that allows a file to be in two states simultaneously. When we modify files in place, these two alternative views might share the filename and disagree on the content. This ambiguity also affects memory projections constructed from such files because they follow similar transactional isolation boundaries. In the end, that's what Process Doppelgänging does: it chooses an existing executable file, overwrites its content within a temporary transaction, prepares an image section from the transacted view (and, thus, containing the modified bytes), and creates a process from it. For the world outside of the transaction, the file remains untouched. Therefore, when a naïve security product observes the process creation event, it looks up the corresponding file, opens it outside the transaction, verifies the initial content, and doesn't raise the alarm. At the same time, the target's address space holds an altered version that merely hides inside the transaction under the same filename.
Interestingly, the original implementation rolls the transaction back immediately after creating the initial thread. This operation has a similar effect to deleting the underlying file since it disassociates the filename from the section object. Hence, it introduces artifacts that can give away the use of this technique to forensic detection tools. Of course, it happens after most anti-malware services inspect the process during its creation, but still. The demo provided with this repository supports the classical and a revisited approach that keeps the transaction alive for the entire process's lifetime. Comparing how these two alternatives behave against various detections is left as an exercise to the reader.
Process Herpaderping. This relatively recent technique utilizes a simple yet effective alternative solution for achieving a file content mismatch. Because
NtCreateProcess requires the caller to open the executable file manually, we can use this opportunity to request write access to the file - something that becomes impossible because of the locking rules once we create the section object. After obtaining this handle, we can backup the original file content and overwrite it with the payload. Then we prepare the image section (which reads and caches our payload) and restore the file's content to its original state. Despite the rollback, the section object remains using the payload data. Finally, we close the file and proceed with the typical low-level process creation (using the altered section object). Notably, because we never delete the file or revert transactions, the section (and, therefore, the process) never loses association with the original file. Even security products aware of Process Doppelgänging won't notice anything suspicious about the new process unless they perform a full comparison of the code in memory to the data on the disk. As with other techniques, you'll find the demo implementation in the corresponding folder.
Concealing code execution on the scale of an entire process is an advanced approach for evading security software. Thus, detecting process tampering is a peculiar subject, notable for requiring specially designed tools that pay attention to detail. You can find more information on this topic in the corresponding section of the project.
In the broader sense, code injection is a process that allows performing operations from the context of another component, typically without interfering with its functionality. Such techniques are interesting from the security perspective because they allow attackers to blend-in additional payloads within existing logic, making it harder to distinguish between them. In this write-up, we analyze the fundamental mechanisms used during code injection and try to understand the underlying decision-making that might accompany the development of future techniques.
The ability to inject code into running processes can be beneficial or essential, depending on the goals in mind. While a discussion of the reasons to perform code injection is out of the scope of the project, here are a few possible answers to this question:
Depending on the complexity of the task and other requirements, we might choose one of the two primary formats for hosting our code:
To begin with, the processor needs a correct sequence of machine instructions available in the target address space before it can execute them. Before the era of Data Execution Prevention, processors were not distinguishing code from data, so as long as the bytes in memory had a valid interpretation as machine instructions, they were free to run. On modern systems, however, the memory protection model defines which regions are readable, writable, and executable, preventing attackers from redirecting execution into the data areas that might include untrusted input.
Thus, we have two options: either leverage existing code from the program itself or system libraries or allocate a new executable region. The second approach generally provides more flexibility, though, is significantly easier to detect.
Additionally, it can be challenging (if possible at all) to find a complete piece of code that achieves a specific non-trivial goal. There is, however, a family of advanced techniques that allow building an algorithm of any complexity given just a few simple sequences of instructions called gadgets. The most well-known are Return Oriented Programming (ROP) and Jump Oriented Programming (JOP). Notably, modern versions of Windows running on compatible hardware effectively prevent ROP and significantly complicate JOP, so we will not cover gadget-based techniques in this project.
Almost all programs depend at a minimum on several system libraries that expose Win32 API functions. To optimize memory usage, Windows utilizes a mechanism called KnownDLLs that includes a few dozen of the most frequently used built-in libraries and ensures that their memory is always shared. While their exact location is still randomized and does not persist between reboots, these libraries are mapped at the same base address across all processes. This fact drastically simplifies injection because when it comes to KnownDLLs, we don't need to resolve any remote addresses. But keep in mind that before using their code, we still need to verify that the target has the required DLL loaded.
It is also possible to use execution primitives from other modules readily available in the target's address space (including the primary executable itself). Unfortunately, they find limited use in general-purpose solutions because of the specificity and complications related to ASLR.
Another pitfall that might complicate reusing code is the calling convention and the number of expected parameters. As we will see below, there are several ways to gain execution that differ in the number of parameters they support. Unfortunately, they don't come even close to covering all possibilities. Hence, some circumstances might require writing a custom shellcode into the target merely to call an existing function.
Writing executable code into another process isn't inherently problematic; you can find the overview of available options in the Managing Remote Memory section below. Preparing this code, however, is a substantially more complex task. Most of the time, it heavily depends on the concepts that are out of the scope of the paper. So, instead of talking about assembly and shellcode, we will focus on higher-level languages like C and, therefore, DLLs.
Probably the most widely targeted function during code injection is
LoadLibrary. This function is implemented in kernelbase.dll and has an alias forwarder in kernel32.dll - two baseline libraries for all Win32 (both GUI and Console) applications. As a result, it is available in all Windows subsystem processes. Also, conveniently (as we will see below), it requires a single pointer parameter - the path to the file. The reason for the majority of injection techniques to target this function is simple: DLL loading is a local operation, i.e., there is no API for loading libraries remotely. Hence, most of the time, the sole purpose of a specific technique is to force another process into calling this function with the attacker-controlled parameter. There is, of course, a notable exception that allows avoiding this call altogether; we are going to examine this approach at the end of our discussion.
Hence, we can group all DLL injection techniques based on the primary API call:
LdrLoadDllis a Native API function implemented in ntdll.dll that takes on the heavy lifting of module loading and powers
LdrLoadDllin place of
LoadLibrarycan be inconvenient due to its prototype but allows injecting DLLs into native-subsystem processes and might help avoid basic runtime detection.
LoadLibraryusing memory manipulation routines. Writing a manual mapper is considerably trickier than other types of injectors because of the many nuances it requires taking into account when deploying the image into the target's address space.
The simplest and most reliable way to start executing code in a different process is to create a remote thread. This operation requires
PROCESS_CREATE_THREAD access (part of
GENERIC_WRITE) to the target and is achieved via
NtCreateThreadEx. The last function provides more control over the flags, which might come in handy for avoiding detection. For example, it can instruct the system to skip attaching to DLLs or hide the new thread from the debugger.
The payload is expected to follow the stdcall convention, take a single pointer-size parameter, and return a 32-bit value. The result value becomes the exit status and can be inspected afterward. Additionally, it is easy to synchronize with the code running on a new thread because the object becomes signaled upon termination. Some variations for the prototypes are naturally possible when targeting specific processor architectures, such as having no parameters or returning a 64-bit value on x64 systems due to binary compatibility. Remarkably conveniently,
LoadLibrary closely follows this prototype, which allows us to invoke it directly without any stub code. As we will see below, the most basic DLL injection techniques rely on this exact approach.
An alternative solution is to run the payload on the existing threads. There are many ways to hijack execution and achieve our goals, but probably the most convenient is Asynchronous Procedure Calls (APCs). APCs are a facility provided out-of-the-box by the OS. Essentially, it includes a per-thread queue of callbacks to execute when the thread is not doing any other work. Queueing new tasks is possible between process boundaries and requires
THREAD_SET_CONTEXT access (which is, again, part of the
One notable difference APCs have with new threads is a more flexible prototype for the callback function. It might sound surprising if you already have experience with the Win32 API wrapper (
QueueUserApc) since it uses a prototype that is effectively equivalent to the one we've seen before. Nevertheless, the underlying syscall (
NtQueueApcThreadEx) works slightly differently. The callback it uses follows the stdcall convention, has no return value, and takes three pointer-size parameters. Yet again, functions with up to three parameters and a return value should work due to binary compatibility on x64. This gives APCs an unparalleled advantage because it supports a significantly broader range of payloads. As for the disadvantages, the code that posts tasks to the queue cannot wait for its completion and determine whether the invocation was successful or not.
Additionally, there remains a question of when the payload will execute. Historically, until the recent versions of the OS, Windows exposed a single type of user-mode APCs that requires an explicit acknowledgment from the program, indicating that it's safe to run the pending calls from the queue. This approach is advantageous for preventing deadlocks as it helps to ensure that APCs do not execute while the thread is holding locks on synchronization primitives.
There are two ways for a thread to execute normal APCs: either sleep or wait in an alertable state or call
NtTestAlert. Therefore, most of the existing implementations for APC-based injection take an indefinite amount of time before the payload executes. They work best against UI applications because those tend to wait for window messages in an alertable state.
Windows 10 RS5, however, added a new type of user-mode APCs called special APCs. Overall, they work similarly to the standard counterpart we just discussed, except they execute immediately (i.e., on the closest subsequent transition from the kernel to the user mode, before any application code). That makes special APCs incredibly useful for code injection. But keep in mind that the payload might get executed at an inconvenient time (such as when the thread is holding some locks), so, preferably, it should be as simple as possible. Technically,
LoadLibrary doesn't satisfy this criterion, but, generally, it works well. The proof-of-concept code provided with this repository makes use of special APCs whenever possible.
Here are a few alternative techniques that we will not cover in detail:
SetThreadContext(and their native counterparts) provide access to this information and allow arbitrarily changing it. Hijacking execution via modifying contexts requires the
THREAD_SET_CONTEXTaccess and a specially designed stub code (written in assembly) to be available in the target's address space. This stub prepares the stack, invokes the payload, and then restores the context to its original state.
Processes on Windows use private virtual address space. Consequently, the code that runs in their context can directly access and modify only its own memory. This mechanism provides isolation that is essential for the stability of the OS and its components but might become inconvenient during inter-process operations. The functions we want to invoke remotely might require some context information, like the filename string for
LoadLibrary, which should be accessible in the target's address space. Additionally, writing machine instructions requires keeping in mind various types of memory protection (i.e., the distinction between readable, writable, and executable pages).
To address the issues that arise from similar tasks (usually performed by debuggers), Windows provides various functions that allow manipulating an address space of a different process. These functions are, of course, subject to access checks but, provided sufficient permissions, can do effectively everything a process can do to itself.
|Function||Type of Memory||Required Access||Description|
|ReadProcessMemory / NtReadVirtualMemory||Any||
||Reads bytes from an address space of a process.|
|WriteProcessMemory / NtWriteVirtualMemory||Any||
||Writes bytes into an address space of a process.|
|VirtualProtectEx / NtProtectVirtualMemory||Any||
||Adjusts memory protection of an existing region.|
|VirtualQueryEx / NtQueryVirtualMemory||Any||
||Retrieves various information about the entire address space or its specific region. Note that the Win32 function exposes only a small portion of the available properties.|
|VirtualAllocEx / NtAllocateVirtualMemory||Private||
||Allocates or reserves private memory.|
|VirtualFreeEx / NtFreeVirtualMemory||Private||
||Releases previously allocated private memory.|
|MapViewOfFile2 / NtMapViewOfSection||Shared||
||Maps a shared memory region (a section object in Native API terms) into an address space of a process. The memory can be backed by a paging file, a regular file, or an executable image.|
|UnmapViewOfFile2 / NtUnmapViewOfSection||Shared||
||Unmaps a shared memory region.|
Windows supports two types of memory that use a slightly different set of functions for managing but are otherwise interchangeable: private and shared (also called mapped). Mapped memory can work in several modes, including projecting files from the disk for reading/writing and images for executions. We will cover the difference when talking about manually mapping DLLs later. Thus, when allocating general-purpose memory remotely, we effectively have two similar options that require slightly different access to the target. The example projects in the repository demonstrate both approaches, allowing us to test how they hold against detection. Occasionally, it might also be possible to reuse and modify existing memory regions (such as PEB) for a more concealed allocation. Although, this approach is less generic because it depends on the existing memory layout that we need to inspect first. The most notable examples revolve around hiding new code in RWX regions created by Just-In-Time compilation.
You won't find many injection techniques that rely on shared memory because mapping it remotely was not exposed in Win32 API until Windows 10 RS2; although, it existed in Native API from Windows 2000.
Additionally, you can find various code injection techniques that use other means of delivering data into the target process, not covered here:
SetWindowLongPtrcan effectively write custom data into it. Provided we know the offset where our data landed (that we can obtain by scanning the region locally) and the address at which the buffer is mapped in the target process, we can compute a pointer that is valid remotely. Techniques like PowerLoader use this approach combined with ROP, but it can also serve as a means for regular data delivery.
RegisterClipboardFormat. AtomBombing technique uses this structure as a medium for delivering the payload; combined with APCs and ROP, it provides access to code injection.
So far, we distinguished injectors based on how they gain execution and deliver the code or data. There is also a set of techniques that do not fall into these categories because they, effectively, don't use most of the facilities we've seen so far. The mechanisms we discuss in this section were introduced by Microsoft and implemented accordingly, i.e., by extending the functionality of system libraries. When specific conditions are met, they will automatically load and invoke user-controlled DLLs on the target's behalf. There are similarities of this behavior with backdoors, hence, the name.
The most widely used technique in this category is
SetWindowsHookEx. This API adds support for installing a wide range of window message processing-relates hooks on both global and a per-thread scale. While some of them allow receiving notifications about specific events (such as key presses) on the calling thread, a few must execute within the context of the target process and, thus, require a DLL to perform the actual work. Whenever a window in the scope of the hook receives a specific window message, the system automatically forces its owning thread to load our DLL. Despite the simplicity, there are two moderately significant limitations to this technique. Firstly, the function works only against applications that include GUI threads, preferably actively pumping messages. Secondly and more importantly, it is not suitable for typical DLLs. To work, the library either needs to contain a specifically designed callback function or perform all tasks in DllMain and then always fail the loading. Because of these limitations and the detailed documentation provided by Microsoft, we are not providing a demo project for this technique in the repository.
If injecting a DLL on process startup is also a viable option, here are a few facilities that can help:
The final approach we will discuss here is truly marvelous because it goes an extra mile to extend the functionality of DLL loading and significantly improves its resilience to detection. Welcome manual mappers - techniques that re-implement
LoadLibrary from scratch.
There are two primary reasons to bother with this approach. Firstly,
LoadLibrary is a common denominator for all other DLL injectors. Consequently, it is heavily monitored by security software. The underlying
LdrLoadDll is in a better position from the attacker's perspective, but just slightly. Secondly, these functions expect a filename as their input, limiting their use only to filesystem content. As we will see below, manual mapping doesn't have this limitation and supports loading DLLs from memory.
Module loader is implemented in user mode in ntdll.dll - the Native subsystem library that is always loaded into all processes. Here is the list of the most essential steps it takes to load a DLL:
NtOpenFilefor the Read & Execute access.
SEC_IMAGEattribute from the file. This step makes the system read the file and validate its structure according to the PE specification. In some cases, the kernel might also apply image relocations (discussed below) at this step.
NtMapViewOfSection. The new memory region will provide a copy-on-write view on the DLL but with a slightly different (so-called image) layout. Each PE section will get the specified memory protection (R/RW/RX/etc.) set automatically.
There are, of course, a lot of other implementation details that we won't cover, but these are less relevant to our goals. They include, for example, support for KnownDlls and the structure of the lists where the loader maintains the database of modules.
Most importantly, because the module loader is implemented in user mode and, therefore, follows the same rules of interaction with the kernel, it is possible to closely reproduce its behavior. Ultimately, we can even include a few improvements. As long as we deploy the image following the same layout and memory protection and then resolve its imports, relatively simple code should work correctly. Of course, the more complex the DLL gets, the more precisely we need to reproduce the loader's behavior.
There are three approaches we can take for reproducing the memory layout of a loaded DLL:
MEM_COMMITmemory and then fill it in remotely via cross-process writing. Finally, set the necessary memory protection afterward.
SEC_COMMITmemory. The primary benefit of this approach is that we can map a local view of this region first, deploy the image there (write it, apply relocations, resolve imports, etc.) and only then map a ready-to-run remote view.
SEC_IMAGEview of the image. This approach is what
LoadLibrarydoes under the hood. It automatically fixes PE section layout and protection and often even relocations. Unfortunately, it requires a file on the disk. Additionally, it notifies kernel-mode drivers about image loading, making it less suitable for stealthy injectors.
Probably the most widely known implementation of manual mapping is called reflective loading. This technique either relies on injecting a shellcode that allocates and deploys the image from an in-memory buffer and then invokes its entry point or requires a specially crafted DLL. All implementations of reflective loading we've seen are based on the first option from our list (private
MEM_COMMIT memory) and perform most tasks inside the target process. These design choices simplify some tasks such as resolving imports but might lead to drawing suspicion of the security software and lead to detection.
Since the primary topic of this discussion is the inter-process injection, the demo project for manual mapping included in this repository performs as many steps as possible remotely. It also relies on the second option (shared
SEC_COMMIT memory) because of its benefits for image deployment. Comparing the implementation to reflective loading, we can highlight that there are merely a few steps that touch the target process:
The categorization we introduced should cover the reasonable majority of DLL (and, partially, shellcode) injection techniques and their variations, probably even the future ones. Of course, the landscape constantly changes as it has been for years, thanks to the work of many dedicated security researchers. Detection methods also competitively evolve over time (as you can read in the corresponding writeup), so keeping the arsenal of tricks in your sleeve up-to-date is essential. Still, we believe that the material provided here can provide valuable insight on the topic, both for beginners and professionals.
The techniques for concealing code execution that we discussed in the offensive (, ) parts of the project have a shared primary objective: they are designed to fool security software and bypass security policies. Identifying something that actively hides its presence can prove challenging because of the inherent limitations it imposes on the detection methods. Luckily for defenders, most evasion techniques share common flaws that allow hunting for their entire families. Just keep in mind that the traces they leave on the system are often overlooked by antiviruses and might require using custom tooling.
Here is a high-level overview of the topics covered in this write-up:
Signature-based detection provides a decent starting point because it can recognize known malware and its variations by looking for shared patterns. The primary advantage of using this approach is that the analysis can be performed before execution and, thus, is suitable for providing the first echelon of defense. Of course, signature-based detection works best against well-known samples. Attempting to classify uncommon, re-compiled, or obfuscated programs, on the other hand, often yields inconsistent results that heavily depend on various parameters. Because of that, false negatives might still allow malicious programs to stay under the radar, while false positives become a source of constant frustration among writers of system tools. That being said, now we can focus on discussing the most fundamental property of the tools for purposefully concealing code execution: the API dependencies.
Hiding malicious code inside a process that belongs to (or appears to belong to) a trusted component inevitably requires the help of the underlying operating system. The OS effectively owns all resources that programs use for anything more than purely algorithmic computations, so malicious (as well as any other) code needs to use the services it provides to work with such resources.
The scope of this research includes two classes of techniques, both of which primarily operate on two types of securable objects: processes and threads. To start with, let's focus on the first class of techniques - shellcode and DLL injection. As described in more detail in the corresponding write-up, code injection almost always includes two phases: delivery and execution. Additionally, it might also include some reconnaissance logic to identify a suitable target for injection. Here is an overview of the API calls that are typically used in each stage:
|EnumProcesses / CreateToolhelp32Snapshot / NtQuerySystemInformation||System||Enumerating processes on the system to find a suitable target for injection.|
|OpenProcess / NtOpenProcess||Processes||Gaining access to the target process to perform subsequent manipulations.|
|VirtualAllocEx / VirtualAlloc2 / NtAllocateVirtualMemory||Memory||Allocating memory in the specified process.|
|WriteProcessMemory / NtWriteVirtualMemory||Memory||Writing shellcode or DLL filename into an address space of another process.|
|VirtualProtectEx / NtProtectVirtualMemory||Memory||Making a memory region in another process writable or executable.|
|OpenThread / NtOpenThread||Threads||Accessing a thread in another process to gain execution.|
|CreateRemoteThread / CreateRemoteThreadEx / RtlCreateUserThread / NtCreateThreadEx||Threads||Invoking code in another process.|
|QueueUserAPC / QueueUserAPC2 / NtQueueApcThread / NtQueueApcThreadEx / RtlQueueApcWow64Thread||Threads||Gaining execution on an existing thread in a different process via Asynchronous Procedure Calls|
|SetThreadContext / NtSetContextThread||Threads||Directly hijacking execution of a thread by manipulating its registers.|
The second class of techniques we cover in this repository focuses on concealing code on a scale of an entire process. Because it's substantially easier to tamper with processes that haven't started execution, such techniques almost always launch them in a suspended state. Additionally, most stealthy Process Tampering techniques rely on a highly specific and extremely low-level (thus, irreplaceable) functions. The reason for that is currently irrelevant (it is explained in great detail in the corresponding write-up); what is essential is that these functions can serve as a reliable indicator for detection purposes. As for the rest, the list of commonly used API calls overlaps with the one shown earlier. Although, this one has a more noticeable focus on memory operations due to the nature of low-level process creation.
|NtCreateProcess / NtCreateProcessEx||Processes||Creating a new process from a memory mapping instead of a file.|
|CreateRemoteThread / CreateRemoteThreadEx / RtlCreateUserThread / NtCreateThreadEx||Threads||Creating the initial thread in the target process.|
|SetThreadContext / NtSetContextThread||Threads||Adjusting the start address of the initial thread.|
|ResumeThread / NtResumeThread / NtResumeProcess||Threads||Resuming target's execution after tampering.|
|CreateFileMapping / CreateFileMapping2 / NtCreateSection / NtCreateSectionEx||Memory||Creating a memory mapping object to project an executable file into the address space of the target process.|
|UnmapViewOfFile2 / NtUnmapViewOfSection / NtUnmapViewOfSectionEx||Memory||Unmapping the original executable from the address space of the target process.|
|ReadProcessMemory / NtReadVirtualMemory||Memory||Reading the image base address from a remote Process Environment Block (PEB).|
|VirtualAllocEx / VirtualAlloc2 / NtAllocateVirtualMemory||Memory||Allocating remote memory for the process parameters block.|
|WriteProcessMemory / NtWriteVirtualMemory||Memory||Filling in the process parameters and adjusting pointers in PEB.|
|CreateTransaction / NtCreateTransaction||Other||Creating a filesystem transaction for temporarily replacing the file (see Process Doppelganging).|
To help automate the static detection of suspicious executables, we include a set of YARA rules that look for the use of API functions described above. Note that these rules target an entire family of techniques rather than specific samples. Therefore, they are meant to be used as an indicator of suspiciousness rather than maliciousness.
Runtime detection allows automatically analyzing the behavior of software as it runs and provides a convenient layer of defense against programs that abuse misconfigurations in security policies and weaknesses of security mechanisms. Runtime detection logic can operate in two distinct modes:
Synchronous mode. Security products that synchronously collect data about a program's attempts to perform sensitive operations have a notable advantage: they can work as a host-level intrusion prevention system. By definition, the operating system forces processes (malicious or not) to wait until the anti-malware services verify and acknowledge such operations. Because of that, the developers of detection engines don't need to worry about being too late to take action. They can apply behavioral analysis and heuristic rules and even make reputation-based decisions on a per-operation basis. As a downside, though, synchronous detection can introduce noticeable performance degradation when used excessively, forcing the vendors to minimize the scope and the amount of detection logic.
Asynchronous mode. These lightweight detection mechanisms subscribe to various notifications or perform constant pulling (for example, by reading the recent logs), observing events after they happen. Because these are non-blocking operations, the security software might get notified with a noticeable delay, which yields asynchronous detection useless for active prevention. On the upside, the operating system generates plenty of rich telemetry data that can be valuable for performing detailed analysis and can be consumed without the complications typical for the synchronous mode. Additionally, a portion of the changes observable in the asynchronous mode is also available for post-factum forensic analysis.
Before we start describing the specifics, let's take a closer look at a typical example of DLL injection and the operations that we can expose it:
As you can see, code injection relies on performing rather noisy actions and provides multiple triggers suitable for detecting and blocking it. However, reliable synchronous detection mechanisms require support from the underlying operating system, so not all these operations are equally easily observable.
Windows provides several callbacks available from the kernel that are especially useful for revealing concealed code execution.
When combined, these core mechanisms can form a formidable line of defense. That's why most modern security products already subscribe (or at the very least should subscribe) to these events and use them to block known malware samples. If you're not planning to write your kernel driver for synchronous detection, perhaps, the most convenient option is to use Sysmon. It exposes easily-consumable events from the first four callbacks and logs a substantial amount of data that can help to correlate them. It also covers some file-system operations (which we haven't discussed) that might come in handy for detecting malware that drops files on the disk.
If you compare the list of operations described in the diagram, you'll notice that kernel callbacks don't cover the entire spectrum of what we might want to detect in the synchronous mode. For example, Windows doesn't include built-in facilities for intercepting remote memory allocation and modification. In such cases, the common choice of security software is to rely on a fallback interception mechanism - user-mode hooks. Typically, an AV/EDR system uses its privileged position to globally inject a library that patches the first instructions of the functions of interest and intercepts their execution. The library then forwards the passed parameters to the detection logic, which logs the operation and determines whether it should be allowed. It's essential to understand the limitations of this approach as it does not provide security guarantees because it relies on the program's willingness to tolerate hooks. Software is generally allowed to do whatever it wants with its memory and the instructions in this memory, including patching it back to the original state.
Finally, if we merely want to monitor the system for suspicious activity in an asynchronous mode, Windows provides support for security auditing. The current version includes as many as 59 different categories of events, although not as many are helpful for our purposes. It can detect privilege use, handle manipulation, and in some cases, even help detect advanced techniques like Process Doppelganging. Another example is if you carefully configure SACLs (System Access Control Lists) on a per-directory basis to audit
FILE_EXECUTE access, Windows Auditing can generate events on DLL loading from user-writable locations. You can also achieve similar results by deploying AppLocker rules and Windows Defender Application Control code integrity policies.
Whenever the previous stages of detection fail, it becomes essential to understand how and when the exploitation happened so we can improve our defenses. That's where forensic investigations come into play. Code injection is hardly an operation that leaves no traces; it's just a matter of knowing where to look. There are two main approaches for performing such detection:
The diagram from the runtime detection section mentions several visible traces that DLL injection leaves in the target's process memory. These traces include the executable memory region coming from the DLL itself, plus some other internal data structures used by the module loader to identify this region. Whenever a process loads a file using
LoadLibrary (or the underlying
LdrLoadDll), the system adds an entry into the loader database. Later we can parse these structures and extract various valuable information, including:
Finally, knowing the address, we can query the memory manager and retrieve the current path to the file. The system keeps track of rename operations on mapped images so the output can be more accurate than the one stored in the loader database. If the file was additionally tampered with (such as being deleted or overwritten in a transaction), the query fails, but the error code still provides valuable insight on the reason.
There are several GUI tools that allow inspecting this information, most notably System Informer (formerly known as Process Hacker). Here you can see a screenshot of how two views (memory vs. loader data) accompany each other:
In case you are limited to command-line tools only or need to operate on memory dumps, we include two simple programs with similar functionality. LoadedModules provides a flat view of the loader database, while MappedImages iterates over memory and retrieves current filenames for images. Both tools can operate on live systems or capture minidumps and then inspect them in offline mode.
Of course, recognizing more sophisticated attacks that map executables manually or patch memory that belongs to legitimately loaded modules requires more insight than these small tools included with the repository can provide. The primary focus during hunting for such attacks shifts towards searching for anomalies throughout the entire view of memory. Here is a list of commonly established indicators of compromise that should draw attention and subsequent investigation:
Combining all memory forensics tricks described above provides a formidable solution for detecting side-loaded code. These techniques, however, rely on an unspoken assumption that the payload is still present at the time of the investigation. Yet, we might be interested in unwinding the clock and exploring previously loaded code. It is, of course, impossible to reconstruct memory content (and even just its layout) at a specific point in the past in the general case without prior preparations. Nevertheless, it is possible to extract some traces of DLLs long after unloading.
More specifically, the image loader from ntdll keeps track of the last 64 DLL unload events, recording filenames, address ranges, timestamps, and a few other fields. This trace is stored in the target's process memory and is accessible via
The UnloadedModules tool from this repository can parse and display such traces from live processes and minidumps, plus capture minidumps for later analysis.
The discussion so far focused on code injection, and while most of the suggestions still apply when detecting process tampering, there are also a few differences worth mentioning. First, as opposed to the DLL loading procedure, the user-mode image loader doesn't load the main executable - it only resolves its imports and records information about what's already there. Therefore, it fully trusts the image name string that the creator provides in the parameters block. Secondly, the system supports creating processes in a suspended state, which introduces an artificial delay and constitutes the primary period for performing tampering. Finally, Windows records more information about the process and the main module than about subsequently loaded DLLs. Identifying contradictions between the fields that come from different sources but should otherwise contain identical information proves crucial for successful detection.
Here is an overview of the fields and properties of interest:
ProcessImageFileNameWin32. The system dynamically retrieves this string from the file used to create the process. Thus, it tracks rename operations. The query results in an error if the file doesn't have a valid Win32 name or has been deleted.
ProcessImageFileName. The system populates this string by looking up the name from the file object during process creation and doesn't track subsequent renames. The field holds a null pointer if the system fails to retrieve the value.
RTL_USER_PROCESS_PARAMETERSstructure stored in user-mode memory. The process's creator explicitly provides this string on the Native API level and has complete control over its value.
LDR_DATA_TABLE_ENTRYfor the main module from Process Environment Block. Once the process starts, the code from ntdll copies the string from
ImagePathName(see the previous item) into the loader database. Because this string also resides in user-mode memory, it can be easily modified even after process initialization.
RTL_USER_PROCESS_PARAMETERS, other components can use
ProcessCommandLineInformation, which reads the string from user-mode memory, and, thus, reflects any changes made there.
MemoryMappedFilenameInformation, which looks up the current path to the file and tracks renames. If the provided address belongs to private memory or the filename cannot be determined, the query fails.
Aside from the strings, there are also two important pointer values:
PEB(Process Environment Block) and gets chosen automatically during process creation. Because PEB resides in user-mode memory, this field is easy to overwrite to point to a new location. Ntdll uses this value internally during process initialization and copies it into the loader database.
NtCreateUserProcess-based techniques, the system sets the value to the entry point of the main executable per the chosen image base. Some tampering techniques need to modify the start address to redirect execution. Creating a process with
NtCreateProcessExrequires the caller to provide the values when manually creating the initial thread. Note that ntdll infers the value for the entry point field to store in the loader database independently based on the image base and the PE headers.
As mentioned earlier, revealing a mismatch between various properties is crucial for identifying process tampering. Here, the details matter because even error codes provide valuable insight into the nature of the subject.
On the screenshot above, you can see how the ProcessIdentity tool allows us to distinguish several most common process tampering techniques solely based on comparing a few properties that would otherwise be identical.
While modern techniques for concealing code execution might fool security software that relies on more naïve classical approaches for scanning, it's not by any means impossible to catch given appropriate attention. We hope that this paper shares the necessary knowledge and tools to make the world more secure by combating the favorite techniques of Advanced Persistent Threat actors.
Here you can find more quality material from other authors on the topic: