Introduction to Windows Architecture

Windows Malware Development - This article is part of a series.

Part 1: This Article

PE Files

For the sake of simplicity, I will show only the 64-bit version of structures (structs) even though some have 32-bit counterparts. These structures can be found in the winnt.h header file.

A Portable Executable (PE) file is the standard format created by Microsoft for binary files like EXEs and DLLs. The structure of a PE file is shown below¹:

When a PE file is run, the loader maps it into memory along with any dependencies it requires, such as the DLLs whose exported functions it imports. Because of Address Space Layout Randomization (ASLR), a security feature, the base address at which an image is mapped is randomized rather than fixed. In other words, the absolute addresses of functions, headers, or any other part of a PE file cannot be assumed to be the same from one run to the next. The PE format handles this with Relative Virtual Addresses (RVAs). An RVA is an offset in memory relative to the base address of the loaded PE file. To calculate the address at which something sits, take the base address of the PE file and add the RVA associated with the desired location.
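The base-plus-RVA calculation can be sketched in a couple of lines of C. This is only an illustration: the function name rva_to_va and the sample base address are my own placeholders, not anything defined in winnt.h.

```c
#include <stdint.h>

/* Convert a Relative Virtual Address into an absolute virtual address.
 * image_base is wherever the loader actually mapped the image; the caller
 * is assumed to know it (hypothetical helper, not a Windows API). */
uint64_t rva_to_va(uint64_t image_base, uint32_t rva)
{
    return image_base + (uint64_t)rva;
}
```

For example, with an image mapped at 0x00007FF6A0000000, an RVA of 0x1000 would refer to the address 0x00007FF6A0001000.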

Headers

Headers are the first parts of the PE file. They contain information about the file and what is needed for it to be executed or used. I won’t go into detail for all headers and sections for the sake of brevity, but the important items to note are that the DOS header includes the magic bytes (MZ), values describing the legacy DOS program, and the offset within the file on disk of the PE header (NT header). The DOS header always marks the start of a PE file.

The DOS header struct is shown below. The e_lfanew member within the struct is the aforementioned offset to the PE header. Strictly speaking it is a file offset rather than an RVA, although the two coincide here because the headers are mapped at the very start of the image.

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
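As a rough sketch of how these two fields are used, the snippet below checks the magic number and follows e_lfanew to the PE signature in a raw byte buffer. It deliberately avoids windows.h so it can be compiled anywhere; the helper names (read_u16, read_u32, find_nt_headers) are hypothetical, while the constants match the values winnt.h uses for IMAGE_DOS_SIGNATURE and IMAGE_NT_SIGNATURE.

```c
#include <stddef.h>
#include <stdint.h>

#define DOS_MAGIC     0x5A4Du      /* "MZ" read as a little-endian WORD   */
#define NT_SIGNATURE  0x00004550u  /* "PE\0\0" read as a little-endian DWORD */
#define E_LFANEW_OFF  0x3C         /* offset of e_lfanew in IMAGE_DOS_HEADER */

/* Read little-endian values byte by byte so that host endianness and
 * alignment do not matter. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of the NT headers, or -1 if the buffer does not
 * look like a PE file. */
int64_t find_nt_headers(const uint8_t *buf, size_t len)
{
    uint32_t e_lfanew;

    if (buf == NULL || len < 0x40)          /* IMAGE_DOS_HEADER is 64 bytes */
        return -1;
    if (read_u16(buf) != DOS_MAGIC)         /* e_magic */
        return -1;

    e_lfanew = read_u32(buf + E_LFANEW_OFF);
    if (e_lfanew > len - 4)                 /* signature must fit in buffer */
        return -1;
    if (read_u32(buf + e_lfanew) != NT_SIGNATURE)
        return -1;

    return (int64_t)e_lfanew;
}
```

Reading the fields by offset instead of casting the buffer to a struct is a design choice for portability; on Windows, casting to PIMAGE_DOS_HEADER is the more common approach.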

Immediately after the DOS header is the DOS stub, which is there essentially for compatibility reasons. If an executable file were loaded on a DOS system, the OS would be able to run the stub because it’s a 16-bit program. If you have ever run Strings on any Windows binary, you’ve likely encountered the famous text: “This program cannot be run in DOS mode.”

The PE header holds more information about the contents of the file and subsequent sections. More importantly, it contains the Optional header! The Optional header contains A LOT of information needed later when creating the malware. It may be called the Optional header, but it is required for executable images. Some important members include RVAs to the entry point and the start of the code, the preferred base address, and an array of Data Directory structs.

The PE header and the Optional header structs are noted below respectively:

typedef struct _IMAGE_NT_HEADERS64 {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

typedef struct _IMAGE_OPTIONAL_HEADER64 {
    WORD        Magic;
    BYTE        MajorLinkerVersion;
    BYTE        MinorLinkerVersion;
    DWORD       SizeOfCode;
    DWORD       SizeOfInitializedData;
    DWORD       SizeOfUninitializedData;
    DWORD       AddressOfEntryPoint;
    DWORD       BaseOfCode;
    ULONGLONG   ImageBase;
    DWORD       SectionAlignment;
    DWORD       FileAlignment;
    WORD        MajorOperatingSystemVersion;
    WORD        MinorOperatingSystemVersion;
    WORD        MajorImageVersion;
    WORD        MinorImageVersion;
    WORD        MajorSubsystemVersion;
    WORD        MinorSubsystemVersion;
    DWORD       Win32VersionValue;
    DWORD       SizeOfImage;
    DWORD       SizeOfHeaders;
    DWORD       CheckSum;
    WORD        Subsystem;
    WORD        DllCharacteristics;
    ULONGLONG   SizeOfStackReserve;
    ULONGLONG   SizeOfStackCommit;
    ULONGLONG   SizeOfHeapReserve;
    ULONGLONG   SizeOfHeapCommit;
    DWORD       LoaderFlags;
    DWORD       NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
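Since I only show the 64-bit structs, it can be useful to confirm that a file actually uses them. A minimal sketch, assuming e_lfanew has already been read from the DOS header: the Optional header sits right after the 4-byte Signature and the 20-byte IMAGE_FILE_HEADER, and its Magic member is 0x20B for PE32+ (64-bit) and 0x10B for PE32. The function and macro names below are my own.

```c
#include <stdint.h>

#define OPT_MAGIC_PE32       0x010Bu  /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define OPT_MAGIC_PE32_PLUS  0x020Bu  /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

/* File offset of the Optional header: skip the DWORD Signature (4 bytes)
 * and the IMAGE_FILE_HEADER (20 bytes) that follow e_lfanew. */
uint32_t optional_header_offset(uint32_t e_lfanew)
{
    return e_lfanew + 4u + 20u;
}

/* Non-zero when the Magic value says the 64-bit structs apply. */
int is_pe32_plus(uint16_t magic)
{
    return magic == OPT_MAGIC_PE32_PLUS;
}
```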

Directories

A Data Directory structure holds information about important directories (and by extension, tables). Specifically, the structure contains the RVA at which the directory is located as well as its size. The array of Data Directory structs is ordered so that each index corresponds to a specific directory. For example, the first entry (index 0) describes the Export Directory, whereas the entry after that (index 1) describes the Import Directory. These are very important to note as they lead to the Export Address Table (EAT) and the Import Address Table (IAT) respectively.
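To make the fixed ordering concrete, here is a small sketch of the Data Directory entry and a bounds-checked lookup. The struct mirrors IMAGE_DATA_DIRECTORY from winnt.h and the index values match the IMAGE_DIRECTORY_ENTRY_* constants there, but the shortened names and the get_directory helper are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as IMAGE_DATA_DIRECTORY in winnt.h. */
typedef struct {
    uint32_t VirtualAddress;  /* RVA of the directory */
    uint32_t Size;            /* size of the directory in bytes */
} DataDirectory;

enum {
    DIR_ENTRY_EXPORT    = 0,
    DIR_ENTRY_IMPORT    = 1,
    DIR_ENTRY_BASERELOC = 5,
    DIR_ENTRY_IAT       = 12,
    DIR_ENTRY_COUNT     = 16  /* IMAGE_NUMBEROF_DIRECTORY_ENTRIES */
};

/* Returns the requested entry, or NULL when the index is out of range or
 * the directory is absent (zero RVA or zero size). */
const DataDirectory *get_directory(const DataDirectory *dirs,
                                   uint32_t number_of_rva_and_sizes,
                                   unsigned index)
{
    if (index >= number_of_rva_and_sizes || index >= DIR_ENTRY_COUNT)
        return NULL;
    if (dirs[index].VirtualAddress == 0 || dirs[index].Size == 0)
        return NULL;
    return &dirs[index];
}
```

Checking against NumberOfRvaAndSizes matters because a file is not obliged to populate all 16 entries.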

Just as it sounds, the Export Directory is the place where all exported functions can be found within the PE file. Take ntdll.dll for example: Let’s say we wanted to call the NtCreateFile function. To find where that function is located in memory and call it that way, we would have to search the Export Directory of ntdll.dll to get the address of the function. However, that may not be as simple as it sounds. Here is why.

The last three members of the Export Directory struct (shown in the next section) are a set of arrays that play a critical role in how the locations of functions are resolved. There are two ways that resolution can happen: by the name of the function, or by its biased ordinal number.

Export Directory

The Export Directory struct is shown below. Note the AddressOfFunctions, AddressOfNames, and AddressOfNameOrdinals members which are the arrays I mentioned above.

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;     // RVA from base of image
    DWORD   AddressOfNames;         // RVA from base of image
    DWORD   AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

To start, let’s take a step back and see what the arrays are for and the relationship between them. Shown below is a good visualization of how these arrays work together²:

Simply stated, AddressOfNames points to an array of RVAs, each of which leads to the name of an exported function. The index of a function name in that array is also its index in the AddressOfNameOrdinals array. The value stored there is the ordinal number (index) of the entry in the AddressOfFunctions array that holds the function’s RVA.

Returning to the diagram above, let’s say we wanted to use ’name2’ from the imaginary binary’s exported functions. The first step would be to go into the AddressOfNames array, starting with the first entry, and compare the names of the functions until we find a match. In this case, it is the third entry. The next step would be to go into the AddressOfNameOrdinals array at that same position to find the ordinal. In this example it would be 2. (It isn’t always this simple; sometimes it could be ordinal 102.) That ordinal is the index to use in the AddressOfFunctions array to get the RVA. In this case, we would get the RVA addr2 because the ordinal was 2.

Note: When I say “ordinal” in the above example, I am talking about the unbiased ordinal.
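The walk through the three arrays can be sketched as follows. To keep the example self-contained, the arrays are plain in-memory C arrays rather than RVAs inside a mapped image, so ExportView and find_export_by_name are simplified stand-ins of my own and not real winnt.h definitions.

```c
#include <stdint.h>
#include <string.h>

/* Simplified, already-resolved view of an export directory. In a real
 * image each of these would be reached through an RVA. */
typedef struct {
    uint32_t           number_of_names;
    const char *const *names;          /* stands in for AddressOfNames        */
    const uint16_t    *name_ordinals;  /* stands in for AddressOfNameOrdinals */
    const uint32_t    *functions;      /* stands in for AddressOfFunctions    */
    uint32_t           number_of_functions;
} ExportView;

/* Returns the function's RVA, or 0 when the name is not exported. */
uint32_t find_export_by_name(const ExportView *ev, const char *wanted)
{
    uint32_t i;

    for (i = 0; i < ev->number_of_names; i++) {
        if (strcmp(ev->names[i], wanted) == 0) {
            /* Same position in the ordinal array gives the unbiased ordinal. */
            uint16_t ordinal = ev->name_ordinals[i];
            if (ordinal >= ev->number_of_functions)
                return 0;
            /* The unbiased ordinal is the index into the function array. */
            return ev->functions[ordinal];
        }
    }
    return 0;
}
```

Note that the name index and the function index are not required to be equal, which is why the ordinal array sits between the two.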

Exported functions can also be referenced by ordinal number instead of by name, which saves space and string comparisons, especially if the binary has hundreds or thousands of exported functions. The ordinals used for this are called “biased ordinals.” The distinction lies in how the ordinal is calculated. Unbiased ordinals are used internally as the index of the RVA within the AddressOfFunctions array. Biased ordinals are calculated by adding the Base member from the Export Directory to the unbiased ordinal. The result is what is used externally to refer to exported functions.

The Base member is the number at which the biased ordinals start (typically the Base member is set to 1).

Going back to the example, let’s say we knew the biased ordinal of the function we wanted to call. Assume the Base is 1 and the biased ordinal is 3. What we would do is subtract the Base from the ordinal (3-1) which equals 2. That means the index that we need to go to in the AddressOfFunctions array is 2. Looking at the diagram, this would be an RVA of addr2.
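The subtraction above, with the range checks a careful parser would add, might look like this. The function name and signature are hypothetical.

```c
#include <stdint.h>

/* Convert a biased ordinal into an index for the AddressOfFunctions array.
 * Returns 1 and stores the index on success, 0 if the ordinal is out of
 * range for this export directory. */
int ordinal_to_index(uint32_t biased_ordinal, uint32_t base,
                     uint32_t number_of_functions, uint32_t *index_out)
{
    uint32_t index;

    if (biased_ordinal < base)
        return 0;                      /* below the first valid ordinal */
    index = biased_ordinal - base;     /* unbiased ordinal */
    if (index >= number_of_functions)
        return 0;                      /* past the end of the array */
    *index_out = index;
    return 1;
}
```

With a Base of 1 and a biased ordinal of 3, the stored index is 2, matching the example.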

Import Directory

Moving on to the Import Directory. When a PE file needs to use exported functions from other binaries, this is how the loader determines which functions from which binaries are needed and where to find them in memory. Unlike the Export Directory, the Import Directory doesn’t have a dedicated “import directory” struct. Instead, it is an array of Import Descriptor structs terminated by a zeroed-out struct.

The Import Descriptor struct is shown below. This is often referred to as the Import Directory Table (IDT). For every imported binary, there will be a corresponding Import Descriptor struct.

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;            // 0 for terminating null import descriptor
        DWORD   OriginalFirstThunk;         // RVA to original unbound IAT (PIMAGE_THUNK_DATA)
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;                  // 0 if not bound,
                                            // -1 if bound, and real date\time stamp
                                            //     in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (new BIND)
                                            // O.W. date/time stamp of DLL bound to (Old BIND)

    DWORD   ForwarderChain;                 // -1 if no forwarders
    DWORD   Name;
    DWORD   FirstThunk;                     // RVA to IAT (if bound this IAT has actual addresses)
} IMAGE_IMPORT_DESCRIPTOR;
typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;

Before diving into the specific attributes of the structures, it’s important to know how import resolution works. Essentially there are three main parts: the Import Directory Table (IDT), the Import Lookup Table (ILT), and the Import Address Table (IAT). When the PE file gets copied to memory in order to be executed, the loader needs to figure out which binaries must also be copied into memory, as well as the addresses of the functions the PE file needs in order to work correctly. The loader begins with the IDT to figure out which imported binaries are needed. However, there is no need to catalog every function of the imported binaries and, by extension, their addresses. To save time, memory, and CPU cycles, the loader resolves only the needed functions. This is where the ILT comes into play. The ILT is an array of RVAs that point into the Hint/Name table. The Hint/Name table contains one Hint/Name entry for every function imported from the given binary.

Take for example a PE file that imports the NtCreateFile function from ntdll.dll. There will be an IDT entry for ntdll.dll and within that, there will be an ILT entry for NtCreateFile which will have a Hint/Name pair for that function.

The Hint/Name pair is fairly straightforward. As mentioned, there is optimization built into how PE files work. Instead of iterating through every single exported function of the imported binary (the process explained above in the Export Directory section), the Hint/Name entry gives a “hint” of the most likely location of the required function. The loader uses the hint as a “best guess” index into the AddressOfNames array of the exporting binary. The function name in the Hint/Name pair is then compared to the string at that index. If there is a match, the loader proceeds with resolving the exported function’s address. If there is no match, it falls back to searching for the function by name. Once it has resolved the address at which the function lies, it writes that address into the corresponding IAT entry.
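The hint-then-fallback logic can be sketched like this. As before, the name table is a plain array of C strings instead of RVAs, and resolve_name_index is a hypothetical helper. The fallback here is a linear scan for brevity; because the PE format keeps export names sorted, a real implementation can use a binary search instead.

```c
#include <stdint.h>
#include <string.h>

/* Returns the index of `wanted` in the exporter's name table, or -1.
 * `hint` is the Hint value taken from the importer's Hint/Name entry. */
long resolve_name_index(const char *const *names, uint32_t number_of_names,
                        uint16_t hint, const char *wanted)
{
    uint32_t i;

    /* Fast path: the hint is only a guess, so verify it before trusting it. */
    if (hint < number_of_names && strcmp(names[hint], wanted) == 0)
        return (long)hint;

    /* Slow path: the hint was stale or out of range, search by name. */
    for (i = 0; i < number_of_names; i++) {
        if (strcmp(names[i], wanted) == 0)
            return (long)i;
    }
    return -1;
}
```

A stale hint costs one extra string comparison, which is why a wrong hint never breaks resolution.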

The IAT is the last (and arguably the most important) table. It starts out initialized so that each entry points to its corresponding Hint/Name table entry. Then, for each function that the loader resolves, it overwrites the IAT entry with that function’s actual address in memory. In other words, the IAT and ILT are exactly the same on disk and right before import resolution happens in memory. After this process concludes, the IAT contains the addresses of the functions. This is helpful because when the PE file needs to call one of those functions, it reads the address straight from the IAT.

Moving on to how the structs are laid out. Starting with the union in the IDT, both of the members occupy the same field. Characteristics is mostly deprecated, and the OriginalFirstThunk member is an RVA to the ILT. The Name member is an RVA to the name of the imported binary, and the FirstThunk member is the RVA to the start of the IAT.

The OriginalFirstThunk and FirstThunk members each point to an array of THUNK_DATA structs. The structure for these is given below:

typedef struct _IMAGE_THUNK_DATA64 {
    union {
        ULONGLONG ForwarderString;  // PBYTE 
        ULONGLONG Function;         // PDWORD
        ULONGLONG Ordinal;
        ULONGLONG AddressOfData;    // PIMAGE_IMPORT_BY_NAME
    } u1;
} IMAGE_THUNK_DATA64;
typedef IMAGE_THUNK_DATA64 * PIMAGE_THUNK_DATA64;

These structs contain the data required for either the ILT or the IAT. In the ILT, an entry holds the RVA of the Hint/Name entry (or, when the top bit is set, an ordinal to import by instead of a name). In the IAT, the entry ends up holding the resolved address of the function.
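One detail worth sketching is how a single 64-bit ILT entry is interpreted. In the PE32+ format the top bit marks an import by ordinal; otherwise the low 31 bits are the RVA of the Hint/Name entry. The flag value below equals IMAGE_ORDINAL_FLAG64 from winnt.h, while the ImportLookup struct and decode_thunk64 are names I made up for this illustration.

```c
#include <stdint.h>

#define ORDINAL_FLAG64 0x8000000000000000ULL  /* same value as IMAGE_ORDINAL_FLAG64 */

typedef struct {
    int      by_ordinal;     /* non-zero when the import is by ordinal     */
    uint16_t ordinal;        /* valid only when by_ordinal is non-zero     */
    uint32_t hint_name_rva;  /* valid only when by_ordinal is zero         */
} ImportLookup;

/* Decode one 64-bit Import Lookup Table entry. */
ImportLookup decode_thunk64(uint64_t thunk)
{
    ImportLookup out = {0, 0, 0};

    if (thunk & ORDINAL_FLAG64) {
        out.by_ordinal = 1;
        out.ordinal = (uint16_t)(thunk & 0xFFFFu);         /* low 16 bits */
    } else {
        out.hint_name_rva = (uint32_t)(thunk & 0x7FFFFFFFu); /* low 31 bits */
    }
    return out;
}
```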

The Hint/Name table entries are built with the IMPORT_BY_NAME struct shown below:

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD    Hint;
    CHAR   Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;

As you can see, the structure has two members, one for the hint and the other for the name of the function.

Sections

There are typically 7 sections following the headers and data directories:

Section	Contains
.text	Executable code
.rdata	Read-only data
.data	Initialized data
.pdata	Exception handling data
.idata	Import data (the import directory is actually held here)
.rsrc	Resources (icons, pictures, etc.)
.reloc	Relocations (how statically programmed addresses map to ASLR)

The compiler can add more sections if needed, but these are the common ones found inside most PE files. To get the gears turning: where a piece of information ends up depends on how it is declared, so the sections can be manipulated to some extent to intentionally obscure certain capabilities.

For example, if we had an initialized global uint8 variable called varOne, that variable would typically be stored in the .data section. However, if we made it a constant, it would instead be stored in the .rdata section. Likewise, a byte array could be embedded as a resource (for example a .jpg file), in which case it would be stored in the .rsrc section. These are a few examples of how the way data is declared determines which PE section it lands in.

Each section has its own header struct, whose members give information about the section. Those to note are the VirtualAddress, NumberOfRelocations, and PointerToRelocations. The SECTION_HEADER struct is shown below:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE    Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
            DWORD   PhysicalAddress;
            DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
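A common use of the section headers when inspecting a file on disk is translating an RVA back into a file offset. The sketch below keeps only the three members needed for that; SectionInfo and rva_to_file_offset are hypothetical names, and the sample values in the usage note are invented.

```c
#include <stddef.h>
#include <stdint.h>

/* The subset of IMAGE_SECTION_HEADER needed for the translation. */
typedef struct {
    uint32_t VirtualAddress;    /* RVA of the section once loaded      */
    uint32_t SizeOfRawData;     /* bytes the section occupies on disk  */
    uint32_t PointerToRawData;  /* file offset of the section's data   */
} SectionInfo;

/* Returns 1 and stores the file offset when `rva` falls inside the raw
 * data of some section, 0 otherwise. */
int rva_to_file_offset(const SectionInfo *sections, size_t count,
                       uint32_t rva, uint32_t *offset_out)
{
    size_t i;

    for (i = 0; i < count; i++) {
        uint32_t delta;

        if (rva < sections[i].VirtualAddress)
            continue;
        delta = rva - sections[i].VirtualAddress;
        /* Only bytes backed by raw data exist in the file. */
        if (delta < sections[i].SizeOfRawData) {
            *offset_out = sections[i].PointerToRawData + delta;
            return 1;
        }
    }
    return 0;
}
```

For a section with VirtualAddress 0x1000 and PointerToRawData 0x400, the RVA 0x1010 maps to file offset 0x410.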

An additional section that may be useful to understand is .reloc. For a few reasons, it is helpful to know how relocations work and how to calculate them. One is that reflective DLLs will need to do relocations manually in order to work.

Any hardcoded absolute addresses will need to be adjusted when the image is loaded somewhere other than its preferred base, which ASLR makes the norm. This is done with relocations. Relocations are calculated by taking the delta between the actual base address of the PE file (chosen by ASLR) and the ImageBase address (the preferred base address) inside the Optional header. After determining the delta, add it to each address stored at the locations listed in the relocation table.
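The arithmetic is small enough to show directly. This is just the delta calculation on plain integers, with made-up addresses in the usage note; the function names are my own.

```c
#include <stdint.h>

/* Difference between where the image was actually mapped and where it
 * asked to be mapped. Unsigned wraparound keeps the math correct even
 * when the actual base is lower than the preferred one. */
uint64_t relocation_delta(uint64_t actual_base, uint64_t preferred_base)
{
    return actual_base - preferred_base;
}

/* Adjust one stored 64-bit absolute address by the delta. */
uint64_t rebase_address(uint64_t stored_address, uint64_t delta)
{
    return stored_address + delta;
}
```

If the preferred base is 0x140000000 and the image lands at 0x7FF6A0000000, a stored address of 0x140001234 becomes 0x7FF6A0001234.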

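The relocation table is a series of blocks, each starting with a BASE_RELOCATION struct. The struct has two members: VirtualAddress and SizeOfBlock. The former is the RVA of the page that the block’s relocations apply to. The latter is the total size of the block in bytes, including the struct itself and the array of 16-bit TypeOffset entries that follows it. Each entry stores the relocation type in its top 4 bits and the offset within the page in its lower 12 bits. Here is the aforementioned relocation struct: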

typedef struct _IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
//  WORD    TypeOffset[1];
} IMAGE_BASE_RELOCATION;
typedef IMAGE_BASE_RELOCATION UNALIGNED * PIMAGE_BASE_RELOCATION;
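The commented-out TypeOffset member is where the per-address detail lives, so here is a sketch of how one block is decoded. The two type values match IMAGE_REL_BASED_ABSOLUTE and IMAGE_REL_BASED_DIR64 in winnt.h; the helper functions are hypothetical.

```c
#include <stdint.h>

#define REL_BASED_ABSOLUTE 0   /* padding entry, nothing to fix up */
#define REL_BASED_DIR64    10  /* fix up a full 64-bit address     */

/* Number of 16-bit TypeOffset entries that follow the 8-byte block header. */
uint32_t reloc_entry_count(uint32_t size_of_block)
{
    if (size_of_block < 8)
        return 0;
    return (size_of_block - 8) / 2;
}

/* Top 4 bits: relocation type. */
unsigned reloc_type(uint16_t entry)
{
    return (unsigned)(entry >> 12);
}

/* Low 12 bits: offset from the block's VirtualAddress. */
unsigned reloc_offset(uint16_t entry)
{
    return (unsigned)(entry & 0x0FFFu);
}

/* RVA of the value that needs the delta added to it. */
uint32_t reloc_target_rva(uint32_t block_virtual_address, uint16_t entry)
{
    return block_virtual_address + reloc_offset(entry);
}
```

For a block with VirtualAddress 0x2000, the entry 0xA123 is a DIR64 relocation targeting RVA 0x2123.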

Windows API

After the PE file is loaded into memory and goes through the processes described above, how does it interact with Windows and its subsystems? This is where the Windows API (WinAPI) comes into play. The WinAPI exposes functions for interacting with memory, processes, subsystems, and more. There are two main levels of execution: user space and kernel space. The former is where most applications run, isolated so that they cannot perform lower-level operations directly for security reasons. The latter is where Windows carries out those operations on behalf of applications. Syscalls (system calls) are the mechanism for crossing from one to the other: a user-space function places a syscall number in a register and executes the syscall instruction, which hands control to the kernel routine for that number.

I created the visual below to show the execution flow. Notice the two aforementioned levels. Most of the steps, including the WinAPI calls, take place in user space.

stateDiagram-v2
    direction LR
    state UserSpace {
        direction LR
        Application --> WinAPI
        state WinAPI {
            direction LR
            WinAPIFunction --> NTAPI
            state NTAPI {
                direction LR
                NTAPIFunction --> KernelSpace
                state KernelSpace {
                    Syscall
                }
            }
        }
    }

When an application wants to do something, it calls the corresponding WinAPI function. Effectively, the WinAPI function is a wrapper for the NTAPI version of that call, which is held in ntdll.dll. This is because the NTAPI is the gateway to kernel space and, by extension, its syscall, which requires careful handling to make sure it is being called correctly. According to Microsoft, most of the NTAPI functions are undocumented and, if used directly, can be extremely unstable because they may change with any update. The WinAPI is therefore an effective way to keep function calls stable, with little change when Windows is updated. In other words, the application calls a WinAPI function. That function calls its NTAPI version, which transitions into kernel space and invokes its syscall.

Shown below is the same graph but replaced with a real world example to demonstrate the execution flow. Note that the syscall is 0x0055 which is the corresponding syscall for the function during the writing of this post. However, syscall numbers can change (and frequently do).

stateDiagram-v2
    direction LR
    state UserSpace {
        direction LR
        Notepad.exe --> WindowsAPI
        state WindowsAPI {
            direction LR
            CreateFileA --> NTAPI
            state NTAPI {
                direction LR
                NtCreateFile --> KernelSpace
                state KernelSpace {
                    0x0055
                }
            }
        }
    }

Here Notepad.exe is trying to create a file. To do so, it uses the CreateFileA WinAPI function. That function calls its NTAPI version, which is NtCreateFile. Afterwards, execution transitions into kernel space using the function’s syscall number, in this case 0x0055.

You can manually check the syscall numbers by inspecting the syscall instruction in a debugger during the execution of functions, or look them up in publicly available historical syscall databases. For proof of concept, below is what the same syscall looks like in x64 assembly under the debugger:

Notice the mov instruction above the syscall. Specifically, the byte 55 (hexadecimal) is the syscall number for NtCreateFile. Some databases list syscall numbers in decimal form; others list them in hexadecimal, similar to how they appear in the assembly above. Either way, 0x55 in hexadecimal is 85 in decimal, so keep an eye out for how they are represented.

¹ Massive thanks to Ange Albertini for creating the PE diagrams found on his GitHub!
² Very helpful diagram via infosecinstitute from Dejan Lukan.

Windows Malware Development - This article is part of a series.

Part 1: This Article

Part 2: Developing Malware

Part 3: Debugging and Mitigation
