Tutorial 1: Overview of PE file format
This is a complete rewrite of the old PE tutorial no. 1, which I
considered the worst tutorial I have ever written, so I decided to
replace it with this new one.
PE stands for Portable Executable.
It's the native file format of Win32. Its specification is derived
in part from the Unix COFF (Common Object File Format). "Portable"
here means that the file format is universal across Win32 platforms:
the PE loader of every Win32 platform recognizes and uses this file
format, even when Windows is running on a CPU other than Intel. It
doesn't mean that your PE executables can be moved to other CPU
platforms without change. Every Win32 executable (except VxDs and
16-bit DLLs) uses the PE file format. Even NT's kernel-mode drivers
use the PE file format. Thus studying the PE file format gives you
valuable insight into the structure of Windows.
Let's jump into the general
outline of PE file format without further ado.
DOS MZ header
DOS stub
PE header
Section table
Section 1
Section 2
Section ...
Section n
The above picture is the general layout of a PE file. All PE files
(even 32-bit DLLs) must start with a simple DOS MZ header. We usually
aren't much interested in this structure. It's there in case the
program is run from DOS, so that DOS can recognize it as a valid
executable and run the DOS stub, which is stored right after the MZ
header. The DOS stub is actually a valid EXE that is executed when
the operating system doesn't know about the PE file format. It can
simply display a string like "This program requires Windows", or it
can be a full-blown DOS program, depending on the intent of the
programmer. We are also not very interested in the DOS stub: it's
usually provided by the assembler/compiler. In most cases, it simply
uses int 21h, service 9 to print a string saying "This program cannot
run in DOS mode".
After the DOS stub comes the PE header. The PE header is a general
term for the PE-related structure named IMAGE_NT_HEADERS. This
structure contains many essential fields that are used by the PE
loader. You will become quite familiar with it as you learn more
about the PE file format. When the program is executed on an
operating system that knows about the PE file format, the PE loader
can find the starting offset of the PE header from the DOS MZ header.
Thus it can skip the DOS stub and go directly to the PE header, which
is the real file header.
The real content of the PE
file is divided into blocks called sections.
A section is nothing more than a block of data with common attributes
such as code/data, read/write etc. You can think of a PE file as
a logical disk. The PE header is the boot sector and the sections
are files in the disk. The files can have different attributes such
as read-only, system, hidden, archive and so on.
I want to make it clear from this point onwards that data is grouped
into a section on the basis of common attributes, not on a logical
basis. It doesn't matter how the code/data are used: if the data/code
in the PE file have the same attribute, they can be lumped together
in a section. You should not think of
a section as "data", "code" or some other logical
concepts: sections can contain both code and data provided that
they have the same attribute. If you have a block of data that you
want to be read-only, you can put that data in the section that
is marked as read-only. When the PE loader maps the sections into
memory, it examines the attributes of the sections and gives the
memory block occupied by the sections the indicated attributes.
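As a rough illustration of attribute-based grouping, the sketch below
decodes the flag bits a section can carry. It is written in Python
rather than MASM purely for brevity. The flag values are the
IMAGE_SCN_* constants from the PE specification, which this tutorial
has not introduced yet, and describe_section is a hypothetical helper,
not part of any API.

```python
# Flag values are the IMAGE_SCN_* constants from the PE specification.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def describe_section(characteristics):
    """Hypothetical helper: list the attributes set in a section's flags."""
    names = [
        (IMAGE_SCN_CNT_CODE, "code"),
        (IMAGE_SCN_CNT_INITIALIZED_DATA, "initialized data"),
        (IMAGE_SCN_MEM_READ, "read"),
        (IMAGE_SCN_MEM_WRITE, "write"),
        (IMAGE_SCN_MEM_EXECUTE, "execute"),
    ]
    return [name for flag, name in names if characteristics & flag]


# A read-only data section: initialized data that may only be read.
print(describe_section(0x40000040))
# A typical code section: code that may be read and executed.
print(describe_section(0x60000020))
```

Nothing in the flags says what the bytes are for. Any block of data
that should be read-only can live in the first kind of section.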
If we view the PE file format as a logical disk, the PE header as the
boot sector and the sections as files, we still don't have enough
information to find out where the files reside on the disk, i.e. we
haven't discussed the directory equivalent of the PE file format.
Immediately following the PE header is the section table, which is
an array of structures. Each structure contains information about
one section in the PE file, such as its attributes, its file offset
and its virtual offset. If there are 5 sections in the PE file, there
will be exactly 5 members in this structure array. We can then view
the section table as the root directory of the logical disk. Each
member of the array is equivalent to a directory entry in the root
directory.
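The "root directory" idea can be sketched in a few lines. The example
below is Python rather than MASM, and it is only a sketch under stated
assumptions: the field sizes and offsets come from the PE
specification (a 20-byte file header after the signature, 40-byte
section entries), read_section_table and build_demo_image are
hypothetical names, and the demo image uses a zero-length optional
header only to stay short, which real executables do not do.

```python
import struct

SECTION_HEADER = "<8sIIIIIIHHI"   # layout of one 40-byte section table entry


def read_section_table(data):
    """Return (name, file_offset, virtual_offset, flags) for each section."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The 20-byte file header follows the 4-byte "PE\0\0" signature.
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 4 + 2)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 4 + 16)
    table = e_lfanew + 4 + 20 + opt_size
    sections = []
    for i in range(num_sections):
        entry = struct.unpack_from(SECTION_HEADER, data, table + 40 * i)
        name, _vsize, vaddr, _rawsize, rawptr = entry[:5]
        flags = entry[9]
        sections.append((name.rstrip(b"\0").decode("ascii"), rawptr, vaddr, flags))
    return sections


def build_demo_image():
    """Hypothetical two-section image, just large enough to parse."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)   # e_lfanew points right after the DOS header
    # Machine = 14Ch (i386), 2 sections, optional header size 0 (demo only).
    file_header = struct.pack("<HHIIIHH", 0x14C, 2, 0, 0, 0, 0, 0)
    text = struct.pack(SECTION_HEADER, b".text", 0x1000, 0x1000, 0x200, 0x200,
                       0, 0, 0, 0, 0x60000020)
    data = struct.pack(SECTION_HEADER, b".data", 0x1000, 0x2000, 0x200, 0x400,
                       0, 0, 0, 0, 0xC0000040)
    return bytes(dos) + b"PE\0\0" + file_header + text + data


for section in read_section_table(build_demo_image()):
    print(section)
```

Each tuple plays the role of one directory entry: a name, where the
section sits in the file, where it goes in memory, and its attributes.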
That's all about the physical
layout of the PE file format. I'll summarize the major steps in
loading a PE file into memory below:
- When the PE file is run,
the PE loader examines the DOS MZ header for the offset of the
PE header. If found, it skips to the PE header.
- The PE loader checks if
the PE header is valid. If so, it goes to the end of the PE header.
- Immediately following the
PE header is the section table. The PE loader reads information
about the sections and maps those sections into memory using file
mapping. It also gives each section the attributes specified
in the section table.
- After the PE file is mapped
into memory, the PE loader concerns itself with the logical parts
of the PE file, such as the import table.
The above steps are an oversimplification
and are based on my own observation. There may be some inaccuracies,
but they should give you a clear picture of the process.
You should download LUEVELSMEYER's
description of the PE file format. It's very detailed and
you should keep it as a reference.
[Iczelion's Win32 Assembly Homepage]
Tutorial 2: Detecting a Valid PE File
In this tutorial, we will
learn how to check if a given file is a valid PE file.
Download the
example.
Theory:
How can you verify that a given file is a PE file? That question is
difficult to answer: it depends on how far you want to go. You can
verify every data structure defined in the PE file format, or you can
be satisfied with verifying only the crucial ones. Most of the time,
it's pretty pointless to verify every single structure in the file.
If the crucial structures are valid, we can assume that the file is
a valid PE, and that is the assumption we will use.
The essential structure we
will verify is the PE header itself. So we need to know a little
about it, programmatically. The PE header is actually a structure
called IMAGE_NT_HEADERS. It has the following definition:
IMAGE_NT_HEADERS STRUCT
    Signature       dd ?
    FileHeader      IMAGE_FILE_HEADER <>
    OptionalHeader  IMAGE_OPTIONAL_HEADER32 <>
IMAGE_NT_HEADERS ENDS
Signature
is a dword that contains the bytes 50h, 45h, 00h, 00h. In more human
terms, it contains the text "PE" followed by two terminating
zeroes. This member is the PE signature, so we will use it to verify
whether a given file is a valid PE.
FileHeader
is a structure that contains information about the physical
layout of the PE file, such as the number of sections, the machine
the file is targeted at, and so on.
OptionalHeader
is a structure that contains information about the logical layout
of the PE file. Despite the "Optional" in its name, it's
always present.
Our goal is now clear. If the
value of the Signature member of IMAGE_NT_HEADERS
is equal to "PE" followed by two zeroes, then we treat the file
as a valid PE. For comparison purposes, Microsoft has defined
a constant named IMAGE_NT_SIGNATURE
which we can readily use.
IMAGE_DOS_SIGNATURE equ 5A4Dh
IMAGE_OS2_SIGNATURE equ 454Eh
IMAGE_OS2_SIGNATURE_LE equ 454Ch
IMAGE_VXD_SIGNATURE equ 454Ch
IMAGE_NT_SIGNATURE equ 4550h
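The constants look reversed compared with the text they stand for
because x86 stores integers in little-endian order. A quick check,
sketched here in Python rather than MASM, reads the raw signature
bytes the way the CPU would:

```python
import struct

# Interpret the raw signature bytes as little-endian integers.
(dos_sig,) = struct.unpack("<H", b"MZ")        # a word: 4Dh, 5Ah
(nt_sig,) = struct.unpack("<I", b"PE\0\0")     # a dword: 50h, 45h, 00h, 00h

print(hex(dos_sig), hex(nt_sig))   # prints: 0x5a4d 0x4550
```

So comparing a word against 5A4Dh is the same as comparing two bytes
against "MZ", and comparing a dword against 4550h is the same as
comparing four bytes against "PE" followed by two zeroes.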
The next question: how can
we know where the PE header is? The answer is simple: the DOS MZ
header contains the file offset of the PE header. The DOS MZ header
is defined as the IMAGE_DOS_HEADER structure. You
can check it out in windows.inc. The e_lfanew
member of the IMAGE_DOS_HEADER structure contains
the file offset of the PE header.
The steps are now as follows:
- Verify if the given file
has a valid DOS MZ header by comparing the first word of the file
with the value IMAGE_DOS_SIGNATURE.
- If the file has a valid
DOS header, use the value in e_lfanew member to find the PE header
- Compare the first dword
of the PE header with the value IMAGE_NT_SIGNATURE.
If both values match, then we can assume that the file is a valid
PE.
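Before the full MASM example, here are the three steps above in a
compact form. This is a minimal sketch in Python, not the tutorial's
program: looks_like_pe is a hypothetical name, the offset 3Ch of
e_lfanew comes from the IMAGE_DOS_HEADER layout, and a read that runs
past the end of the buffer (struct.error) is simply treated as "not
a PE".

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D
IMAGE_NT_SIGNATURE = 0x4550
E_LFANEW_OFFSET = 0x3C   # position of e_lfanew inside IMAGE_DOS_HEADER


def looks_like_pe(data):
    """Hypothetical helper: apply the three verification steps to a byte buffer."""
    try:
        # Step 1: the first word must be "MZ".
        (e_magic,) = struct.unpack_from("<H", data, 0)
        if e_magic != IMAGE_DOS_SIGNATURE:
            return False
        # Step 2: e_lfanew gives the file offset of the PE header.
        (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
        # Step 3: the first dword of the PE header must be "PE\0\0".
        (signature,) = struct.unpack_from("<I", data, e_lfanew)
    except struct.error:
        # The buffer was too short for one of the reads.
        return False
    return signature == IMAGE_NT_SIGNATURE
```

To try it on a real file, something like
looks_like_pe(open(path, "rb").read()) would do, where path is
whatever file you want to test.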
Example:
.386
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\comdlg32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\comdlg32.lib

SEH struct
    PrevLink       dd ?    ; the address of the previous SEH structure
    CurrentHandler dd ?    ; the address of the exception handler
    SafeOffset     dd ?    ; the offset where it's safe to continue execution
    PrevEsp        dd ?    ; the old value in esp
    PrevEbp        dd ?    ; the old value in ebp
SEH ends

.data
AppName              db "PE tutorial no.2",0
ofn                  OPENFILENAME <>
FilterString         db "Executable Files (*.exe, *.dll)",0,"*.exe;*.dll",0
                     db "All Files",0,"*.*",0,0
FileOpenError        db "Cannot open the file for reading",0
FileOpenMappingError db "Cannot open the file for memory mapping",0
FileMappingError     db "Cannot map the file into memory",0
FileValidPE          db "This file is a valid PE",0
FileInValidPE        db "This file is not a valid PE",0

.data?
buffer   db 512 dup(?)
hFile    dd ?
hMapping dd ?
pMapping dd ?
ValidPE  dd ?

.code
start proc
    LOCAL seh:SEH
    mov ofn.lStructSize,SIZEOF ofn
    mov ofn.lpstrFilter, OFFSET FilterString
    mov ofn.lpstrFile, OFFSET buffer
    mov ofn.nMaxFile,512
    mov ofn.Flags, OFN_FILEMUSTEXIST or OFN_PATHMUSTEXIST or OFN_LONGNAMES or OFN_EXPLORER or OFN_HIDEREADONLY
    invoke GetOpenFileName, ADDR ofn
    .if eax==TRUE
        invoke CreateFile, addr buffer, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL
        .if eax!=INVALID_HANDLE_VALUE
            mov hFile, eax
            invoke CreateFileMapping, hFile, NULL, PAGE_READONLY,0,0,0
            .if eax!=NULL
                mov hMapping, eax
                invoke MapViewOfFile,hMapping,FILE_MAP_READ,0,0,0
                .if eax!=NULL
                    mov pMapping,eax
                    assume fs:nothing
                    push fs:[0]
                    pop seh.PrevLink
                    mov seh.CurrentHandler,offset SEHHandler
                    mov seh.SafeOffset,offset FinalExit
                    lea eax,seh
                    mov fs:[0], eax
                    mov seh.PrevEsp,esp
                    mov seh.PrevEbp,ebp
                    mov edi, pMapping
                    assume edi:ptr IMAGE_DOS_HEADER
                    .if [edi].e_magic==IMAGE_DOS_SIGNATURE
                        add edi, [edi].e_lfanew
                        assume edi:ptr IMAGE_NT_HEADERS
                        .if [edi].Signature==IMAGE_NT_SIGNATURE
                            mov ValidPE, TRUE
                        .else
                            mov ValidPE, FALSE
                        .endif
                    .else
                        mov ValidPE,FALSE
                    .endif
FinalExit:
                    .if ValidPE==TRUE
                        invoke MessageBox, 0, addr FileValidPE, addr AppName, MB_OK+MB_ICONINFORMATION
                    .else
                        invoke MessageBox, 0, addr FileInValidPE, addr AppName, MB_OK+MB_ICONINFORMATION
                    .endif
                    push seh.PrevLink
                    pop fs:[0]
                    invoke UnmapViewOfFile, pMapping
                .else
                    invoke MessageBox, 0, addr FileMappingError, addr AppName, MB_OK+MB_ICONERROR
                .endif
                invoke CloseHandle,hMapping
            .else
                invoke MessageBox, 0, addr FileOpenMappingError, addr AppName, MB_OK+MB_ICONERROR
            .endif
            invoke CloseHandle, hFile
        .else
            invoke MessageBox, 0, addr FileOpenError, addr AppName, MB_OK+MB_ICONERROR
        .endif
    .endif
    invoke ExitProcess, 0
start endp

SEHHandler proc C uses edx pExcept:DWORD, pFrame:DWORD, pContext:DWORD, pDispatch:DWORD
    mov edx,pFrame
    assume edx:ptr SEH
    mov eax,pContext
    assume eax:ptr CONTEXT
    push [edx].SafeOffset
    pop [eax].regEip
    push [edx].PrevEsp
    pop [eax].regEsp
    push [edx].PrevEbp
    pop [eax].regEbp
    mov ValidPE, FALSE
    mov eax,ExceptionContinueExecution
    ret
SEHHandler endp
end start
Analysis:
The program opens a file and checks whether the DOS header is valid.
If it is, it checks whether the PE header is valid; if that is valid
too, it assumes the file is a valid PE. In this example, I use
structured exception handling (SEH) so that we don't have to check
for every possible error: if a fault occurs, we assume that it's
because the file is not a valid PE and has thus given our program
wrong information. Windows itself uses SEH heavily in its parameter
validation routines. If you're interested in SEH, read the
article by Jeremy Gordon.
The program displays an open
file common dialog to the user and when the user chooses an executable
file, it opens the file and maps it into memory. Before it goes
on with the verification, it sets up a SEH:
assume fs:nothing
push fs:[0]
pop seh.PrevLink
mov seh.CurrentHandler,offset SEHHandler
mov seh.SafeOffset,offset FinalExit
lea eax,seh
mov fs:[0], eax
mov seh.PrevEsp,esp
mov seh.PrevEbp,ebp
We start by telling MASM to assume nothing about the fs register.
This must be done because, by default, MASM assumes fs to ERROR and
rejects any instruction that uses it. Next we store the address of
the previous SEH structure in our own structure for use by Windows.
We then store the address of our SEH handler and the address where
execution can safely resume if a fault occurs, and we link our
structure into the chain by putting its address in fs:[0]. Finally
we save the current values of esp and ebp so that our SEH handler
can get the state of the stack back to normal before it resumes the
execution of our program.
mov edi, pMapping
assume edi:ptr IMAGE_DOS_HEADER
.if [edi].e_magic==IMAGE_DOS_SIGNATURE
After we are done with setting
up SEH, we continue with the verification. We put the address of
the first byte of the target file in edi, which is the first byte
of the DOS header. For ease of comparison, we tell the assembler
that it can assume edi as pointing to the
IMAGE_DOS_HEADER structure (which is the truth). We then
compare the first word of the DOS header with the string "MZ"
which is defined as a constant in windows.inc named IMAGE_DOS_SIGNATURE.
If the comparison is ok, we continue to the PE header. If not, we
set the value in ValidPE to
FALSE, meaning that the file is not a valid PE.
add edi, [edi].e_lfanew
assume edi:ptr IMAGE_NT_HEADERS
.if [edi].Signature==IMAGE_NT_SIGNATURE
    mov ValidPE, TRUE
.else
    mov ValidPE, FALSE
.endif
To get to the PE header, we need the value in e_lfanew of the DOS
header. This field contains the file offset of the PE header,
relative to the beginning of the file. We add this value to edi to
reach the first byte of the PE header. This is where a fault may
occur: the addition itself is harmless, but if the file is not
really a PE file, the value in e_lfanew will be garbage, and reading
the signature through the resulting address amounts to using a wild
pointer. Without SEH, we would have to check the value of e_lfanew
against the file size, which is ugly. If all goes well, we compare
the first dword of the PE header with the string "PE" followed by
two zeroes. Again there is a handy constant named IMAGE_NT_SIGNATURE
which we can use. If the comparison succeeds, we assume
the file is a valid PE.
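For comparison, the check against the file size that SEH lets us
avoid might look roughly like this. It is a sketch in Python rather
than MASM, pe_header_in_bounds is a hypothetical name, and the 64-byte
size of IMAGE_DOS_HEADER (with e_lfanew at offset 3Ch) is taken from
the structure's layout, not from this tutorial.

```python
import struct


def pe_header_in_bounds(data):
    """Check e_lfanew against the buffer length before it is dereferenced."""
    if len(data) < 0x40:
        # Too short to hold a complete IMAGE_DOS_HEADER (64 bytes).
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The 4-byte Signature must lie entirely inside the file.
    return e_lfanew + 4 <= len(data)
```

Only when this returns true is it safe to read the dword at e_lfanew
and compare it with IMAGE_NT_SIGNATURE. The MASM program skips this
test and lets the exception handler catch the bad read instead.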
If the value in e_lfanew
is incorrect, a fault may occur and our SEH handler will get control.
It simply restores the stack pointer and the base pointer, sets
ValidPE to FALSE, and resumes execution at the safe offset, which is
the FinalExit label.
FinalExit:
.if ValidPE==TRUE
    invoke MessageBox, 0, addr FileValidPE, addr AppName, MB_OK+MB_ICONINFORMATION
.else
    invoke MessageBox, 0, addr FileInValidPE, addr AppName, MB_OK+MB_ICONINFORMATION
.endif
The above code is simplicity
itself. It checks the value in ValidPE and displays a message to
the user accordingly.
push seh.PrevLink
pop fs:[0]
When the SEH is no longer
used, we dissociate it from the SEH chain.