EBPIG
6̽Ë÷ÔÓÖ¾6
MHJDQ
֪ʶ¹²ÏíJ×ÊÔ´¹²ÏíJ×ÊÁϹ²Ïí
¡¾·¢ÐÐʱ¼ä¡¿2000-10-21
¡¾ÆÚ¿¯ºÅÂë¡¿µÚÊ®ÆßÆÚ
¡¾ÍøÕ¾µØÖ·¡¿http://programhunter.com
¡¾°æȨÉùÃ÷¡¿
´ËÔÓÖ¾ÓɳÌʽÁÔÈ˱༭¡¢ÖÆ×÷¼°·¢ÐУ»ÔÓÖ¾¿ÉÒÔ×ÔÓÉתÔØ¡¢·Ö·¢ºÍ´«²¥£»ÈκθöÈË»òÍÅÌå²»µÃÔÚδ¾­±¾ÈËÊÚȨµÄÇé¿öÏÂÐÞ¸ÄÔÓÖ¾µÄÍâ¹Û¼°ÄÚÈÝ£»ÔÓÖ¾µÄ½âÊÍȨ¹é³ÌʽÁÔÈËËùÓС£

¡¾±à¼­¼ÄÓï¡¿

    
   {~._.~} 
    ( Y )  
   ()~*~() 
   (_)-(_) 
ÿÏÖÔÚÑо¿ÆƽâµÄÈËÒ»¶¨¾­³£Óöµ½ÍѿǵÄÎÊÌ⣬¶øÏÖÔÚ·ÀÆƽâµÄ×îºÃµÄÒ»¸ö·½·¨¾ÍÊǶԳÌÐò¼Ó¿Ç£¬Ò»ÊÇ¿ÉÒÔ¼õÉÙ³ÌÐòµÄÌå»ý£¬×îÖØÒªµÄÊÇ¿ÉÒÔ·ÀÖ¹ÆƽâµÄÈËÐ޸ijÌÐò£¬ËùÒÔÏÖÔÚÔ½À´Ô½¶àµÄ³ÌÐòÔ±¿ªÊ¼½«×Ô¼ºµÄ³ÌÐò¼Ó¿ÇÁË¡£ÄÇô¶ÔÓÚÎÒÃÇÕâЩ½âÃܵÄÈËÀ´Ëµ¾ÍÒ»¶¨ÒªÑо¿ÈçºÎÍÑ¿ÇÁË¡£ÄÇô½ñÌìÕâ¸ö½Ì³Ì¾ÍÊÇÈçºÎÍѿǵĻù´¡ÎÄÕ¡£ËüÊǽéÉÜPEµÄ»ù±¾ÖªÊ¶µÄ½Ì³Ì¡£Ëü¹²Æßƪ£¬·Ö¶þ´ÎÏò´ó¼Ò½éÉÜ£¬ÎÒÒ²»áÔÚÖ®ºó½«ÖÐÎĵÄÄÚÈÝÍƳöÀ´¡£Ï£Íû´ó¼ÒÏÈÑо¿Õâ¸öÓ¢Îĵİɡ£ºÃÁË£¬¸÷λÈÊÐÖŬÁ¦°É¡£
 
¡¾Ä¿ ÿÿ ¼¡¿
ÿÿÿÿ&ÆƽâÐĵÃ
J¡­¡­Tutorial 1: Overview of PE file format Iczelion
K¡­¡­Tutorial 2: Detecting a Valid PE File Iczelion
L¡­¡­Tutorial 3: File Header Iczelion
ÿÿÿÿ%³õѧÌìµØ
ÿÿÿÿOÎÊÌâ´ðÒÉ
ÿÿÿÿ4ÍøÕ¾½éÉÜ
ÿÿÿÿ,ÔÓÖ¾ÐÅÏä
 
&¡¾ÆƽâÐĵá¿

Tutorial 1: Overview of PE file format

This is the complete rewrite of the old PE tutorial no1 which I considered the worst tutorial I have ever written. So I decided to replace it with this new one.

PE stands for Portable Executable. It's the native file format of Win32. Its specification is derived somewhat from the Unix Coff (common object file format). The meaning of "portable executable" is that the file format is universal across win32 platform: the PE loader of every win32 platform recognizes and uses this file format even when Windows is running on CPU platforms other than Intel. It doesn't mean your PE executables would be able to port to other CPU platforms without change. Every win32 executable (except VxDs and 16-bit Dlls) uses PE file format. Even NT's kernel mode drivers use PE file format. Thus studying the PE file format gives you valuable insights into the structure of Windows.

Let's jump into the general outline of PE file format without further ado.

DOS MZ header
DOS stub
PE header
Section table
Section 1
Section 2
Section ...
Section n

The above picture is the general layout of a PE file. All PE files (even 32-bit DLLs) must start with a simple DOS MZ header. We usually aren't interested in this structure much. It's provided in the case when the program is run from DOS, so DOS can recognize it as a valid executable and can thus run the DOS stub which is stored next to the MZ header. The DOS stub is actually a valid EXE that is executed in case the operating system doesn't know about PE file format. It can simply display a string like "This program requires Windows" or it can be a full-blown DOS program depending on the intent of the programmer. We are also not very interested in DOS stub: it's usually provided by the assembler/compiler. In most case, it simply uses int 21h, service 9 to print a string saying "This program cannot run in DOS mode".

After the DOS stub comes the PE header. The PE header is a general term for the PE-related structure named IMAGE_NT_HEADERS. This structure contains many essential fields that are used by the PE loader. We will be quite familiar with it as you know more about PE file format. In the case the program is executed in the operating system that knows about PE file format, the PE loader can find the starting offset of the PE header from the DOS MZ header. Thus it can skip the DOS stub and go directly to the PE header which is the real file header.

The real content of the PE file is divided into blocks called sections. A section is nothing more than a block of data with common attributes such as code/data, read/write etc. You can think of a PE file as a logical disk. The PE header is the boot sector and the sections are files in the disk. The files can have different attributes such as read-only, system, hidden, archive and so on. I want to make it clear from this point onwards that the grouping of data into a section is done on the common attribute basis: not on logical basis. It doesn't matter how the code/data are used , if the data/code in the PE file have the same attribute, they can be lumped together in a section. You should not think of a section as "data", "code" or some other logical concepts: sections can contain both code and data provided that they have the same attribute. If you have a block of data that you want to be read-only, you can put that data in the section that is marked as read-only. When the PE loader maps the sections into memory, it examines the attributes of the sections and gives the memory block occupied by the sections the indicated attributes.

If we view the PE file format as a logical disk, the PE header as the boot sector and the sections as files, we still don't have enough information to find out where the files reside on the disk, ie. we haven't discussed the directory equivalent of the PE file format. Immediately following the PE header is the section table which is an array of structures. Each structure contains the information about each section in the PE file such as its attribute, the file offset, virtual offset. If there are 5 sections in the PE file, there will be exactly 5 members in this structure array. We can then view the section table as the root directory of the logical disk. Each member of the array is equvalent to the each directory entry in the root directory.

That's all about the physical layout of the PE file format. I'll summarize the major steps in loading a PE file into memory below:

  1. When the PE file is run, the PE loader examines the DOS MZ header for the offset of the PE header. If found, it skips to the PE header.
  2. The PE loader checks if the PE header is valid. If so, it goes to the end of the PE header.
  3. Immediately following the PE header is the section table. The PE header reads information about the sections and maps those sections into memory using file mapping. It also gives each section the attributes as specified in the section table.
  4. After the PE file is mapped into memory, the PE loader concerns itself with the logical parts of the PE file, such as the import table.

The above steps are oversimplification and are based on my own observation. There may be some inaccuracies but it should give you the clear picture of the process.

You should download LUEVELSMEYER's description about PE file format. It's very detailed and you should keep it as a reference.


[Iczelion's Win32 Assembly Homepage]

Tutorial 2: Detecting a Valid PE File

In this tutorial, we will learn how to check if a given file is a valid PE file.
Download the example.

Theory:

How can you verify if a given file is a PE file? That question is difficult to answer. That depends on the length that you want to go to do that. You can verify every data structure defined in the PE file format or you are satisfied with verifying only the crucial ones. Most of the time, it's pretty pointless to verify every single structure in the files. If the crucial structures are valid, we can assume that the file is a valid PE. And we will use that assumption.

The essential structure we will verify is the PE header itself. So we need to know a little about it, programmatically. The PE header is actually a structure called IMAGE_NT_HEADERS. It has the following definition:

IMAGE_NT_HEADERS STRUCT
   Signature dd ?
   FileHeader IMAGE_FILE_HEADER <>
   OptionalHeader IMAGE_OPTIONAL_HEADER32 <>
IMAGE_NT_HEADERS ENDS

Signature is a dword that contains the value 50h, 45h, 00h, 00h. In more human term, it contains the text "PE" followed by two terminating zeroes. This member is the PE signature so we will use it in verifying if a given file is a valid PE one.
FileHeader is a structure that contains information about the physical layout of the PE file such as the number of sections, the machine the file is targeted and so on.
OptionalHeader is a structure that contains information about the logical layout of the PE file. Despite the "Optional" in its name, it's always present.

Our goal is now clear. If value of the signature member of the IMAGE_NT_HEADERS is equal to "PE" followed by two zeroes, then the file is a valid PE. In fact, for comparison purpose, Microsoft has defined a constant named IMAGE_NT_SIGNATURE which we can readily use.

IMAGE_DOS_SIGNATURE equ 5A4Dh
IMAGE_OS2_SIGNATURE equ 454Eh
IMAGE_OS2_SIGNATURE_LE equ 454Ch
IMAGE_VXD_SIGNATURE equ 454Ch
IMAGE_NT_SIGNATURE equ 4550h

The next question: how can we know where the PE header is? The answer is simple: the DOS MZ header contains the file offset of the PE header. The DOS MZ header is defined as structure. You can check it out in windows.inc. The e_lfanew member of the I structure contains the file offset of the PE header.

The steps are now as follows:

  1. Verify if the given file has a valid DOS MZ header by comparing the first word of the file with the value IMAGE_DOS_SIGNATURE.
  2. If the file has a valid DOS header, use the value in e_lfanew member to find the PE header
  3. Comparing the first word of the PE header with the value IMAGE_NT_HEADER. If both values match, then we can assume that the file is a valid PE.

Example:

.386
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\comdlg32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\comdlg32.lib

SEH struct
PrevLink dd ?    ; the address of the previous seh structure
CurrentHandler dd ?    ; the address of the exception handler
SafeOffset dd ?    ; The offset where it's safe to continue execution
PrevEsp dd ?      ; the old value in esp
PrevEbp dd ?     ; The old value in ebp
SEH ends

.data
AppName db "PE tutorial no.2",0
ofn OPENFILENAME <>
FilterString db "Executable Files (*.exe, *.dll)",0,"*.exe;*.dll",0
                 db "All Files",0,"*.*",0,0
FileOpenError db "Cannot open the file for reading",0
FileOpenMappingError db "Cannot open the file for memory mapping",0
FileMappingError db "Cannot map the file into memory",0
FileValidPE db "This file is a valid PE",0
FileInValidPE db "This file is not a valid PE",0

.data?
buffer db 512 dup(?)
hFile dd ?
hMapping dd ?
pMapping dd ?
ValidPE dd ?

.code
start proc
LOCAL seh:SEH
mov ofn.lStructSize,SIZEOF ofn
mov ofn.lpstrFilter, OFFSET FilterString
mov ofn.lpstrFile, OFFSET buffer
mov ofn.nMaxFile,512
mov ofn.Flags, OFN_FILEMUSTEXIST or OFN_PATHMUSTEXIST or OFN_LONGNAMES or OFN_EXPLORER or OFN_HIDEREADONLY
invoke GetOpenFileName, ADDR ofn
.if eax==TRUE
    invoke CreateFile, addr buffer, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL
    .if eax!=INVALID_HANDLE_VALUE
       mov hFile, eax
       invoke CreateFileMapping, hFile, NULL, PAGE_READONLY,0,0,0
       .if eax!=NULL
          mov hMapping, eax
          invoke MapViewOfFile,hMapping,FILE_MAP_READ,0,0,0
          .if eax!=NULL
             mov pMapping,eax
             assume fs:nothing
             push fs:[0]
             pop seh.PrevLink
             mov seh.CurrentHandler,offset SEHHandler
             mov seh.SafeOffset,offset FinalExit
             lea eax,seh
             mov fs:[0], eax
             mov seh.PrevEsp,esp
             mov seh.PrevEbp,ebp
             mov edi, pMapping
             assume edi:ptr IMAGE_DOS_HEADER
             .if [edi].e_magic==IMAGE_DOS_SIGNATURE
                add edi, [edi].e_lfanew
                assume edi:ptr IMAGE_NT_HEADERS
                .if [edi].Signature==IMAGE_NT_SIGNATURE
                   mov ValidPE, TRUE
                .else
                   mov ValidPE, FALSE
                .endif
             .else
                 mov ValidPE,FALSE
             .endif
FinalExit:
             .if ValidPE==TRUE
                 invoke MessageBox, 0, addr FileValidPE, addr AppName, MB_OK+MB_ICONINFORMATION
             .else
                invoke MessageBox, 0, addr FileInValidPE, addr AppName, MB_OK+MB_ICONINFORMATION
             .endif
             push seh.PrevLink
             pop fs:[0]
             invoke UnmapViewOfFile, pMapping
          .else
             invoke MessageBox, 0, addr FileMappingError, addr AppName, MB_OK+MB_ICONERROR
          .endif
          invoke CloseHandle,hMapping
       .else
          invoke MessageBox, 0, addr FileOpenMappingError, addr AppName, MB_OK+MB_ICONERROR
       .endif
       invoke CloseHandle, hFile
    .else
       invoke MessageBox, 0, addr FileOpenError, addr AppName, MB_OK+MB_ICONERROR
    .endif
.endif
invoke ExitProcess, 0
start endp

SEHHandler proc C uses edx pExcept:DWORD, pFrame:DWORD, pContext:DWORD, pDispatch:DWORD
    mov edx,pFrame
    assume edx:ptr SEH
    mov eax,pContext
    assume eax:ptr CONTEXT
    push [edx].SafeOffset
    pop [eax].regEip
    push [edx].PrevEsp
    pop [eax].regEsp
    push [edx].PrevEbp
    pop [eax].regEbp
    mov ValidPE, FALSE
    mov eax,ExceptionContinueExecution
    ret
SEHHandler endp
end start

Analysis:

The program opens a file and checks if the DOS header is valid, if it is, it checks the PE header if it's valid. If it is, then it assumes the file is a valid PE. In this example, I use structured exception handling (SEH) so that we don't have to check for every possible error: if a fault occurs, we assume that it's because the file is not a valid PE thus giving our program wrong information. Windows itself uses SEH heavily in its parameter validation routines. If you're interested in SEH, read the article by Jeremy Gordon.

The program displays an open file common dialog to the user and when the user chooses an executable file, it opens the file and maps it into memory. Before it goes on with the verification, it sets up a SEH:

   assume fs:nothing
   push fs:[0]
   pop seh.PrevLink
   mov seh.CurrentHandler,offset SEHHandler
   mov seh.SafeOffset,offset FinalExit
   lea eax,seh
   mov fs:[0], eax
   mov seh.PrevEsp,esp
   mov seh.PrevEbp,ebp

We start by assuming the use of fs register as nothing. This must be done because MASM assumes the use of fs register to ERROR. Next we store the address of the previous SEH handler in our structure for use by Windows. We store the address of our SEH handler, the address where the execution can safely resume if a fault occurs, the current values of esp and ebp so that our SEH handler can get the state of the stack back to normal before it resumes the execution of our program.

   mov edi, pMapping
   assume edi:ptr IMAGE_DOS_HEADER
   .if [edi].e_magic==IMAGE_DOS_SIGNATURE

After we are done with setting up SEH, we continue with the verification. We put the address of the first byte of the target file in edi, which is the first byte of the DOS header. For ease of comparison, we tell the assembler that it can assume edi as pointing to the IMAGE_DOS_HEADER structure (which is the truth). We then compare the first word of the DOS header with the string "MZ" which is defined as a constant in windows.inc named IMAGE_DOS_SIGNATURE. If the comparison is ok, we continue to the PE header. If not, we set the value in ValidPE to FALSE, meaning that the file is not a valid PE.

      add edi, [edi].e_lfanew
      assume edi:ptr IMAGE_NT_HEADERS
      .if [edi].Signature==IMAGE_NT_SIGNATURE
         mov ValidPE, TRUE
      .else
         mov ValidPE, FALSE
      .endif

To get to the PE header, we need the value in e_lfanew of the DOS header. This field contains the file offset of the PE header, relative to the file beginning. Thus we add this value to edi and we get to the first byte of the PE header. It's this place that a fault may occur. If the file is really not a PE file, the value in e_lfanew will be incorrect and thus using it amounts to using a wild pointer. If we don't use SEH, we must check the value of the e_lfanew against the file size which is ugly. If all goes well, we compare the first dword of the PE header with the string "PE". Again there is a handy constant named IMAGE_NT_SIGNATURE which we can use. If the result of comparison is true, we assume the file is a valid PE.
If the value in e_lfanew is incorrect, a fault may occur and our SEH handler will get control. It simply restores the stack pointer, bsae pointer and resumes the execution at the safe offset which is at the FinalExit label.

FinalExit:
   .if ValidPE==TRUE
      invoke MessageBox, 0, addr FileValidPE, addr AppName, MB_OK+MB_ICONINFORMATION
   .else
      invoke MessageBox, 0, addr FileInValidPE, addr AppName, MB_OK+MB_ICONINFORMATION
   .endif

The above code is simplicity itself. It checks the value in ValidPE and displays a message to the user accordingly.

   push seh.PrevLink
   pop fs:[0]

When the SEH is no longer used, we dissociate it from the SEH chain.


[Iczelion's Win32 Assembly Homepage]

Tutorial 3: File Header

In this tutorial, we will study the file header portion of the PE header.

Let's summarize what we have learned so far:

  • DOS MZ header is called. Only two of its members are important to us: e_magic which contains the string "MZ" and e_lfanew which contains the file offset of the PE header.
  • We use the value in e_magic to check if the file has a valid DOS header by comparing it to the value . If both values match, we can assume that the file has a valid DOS header.
  • In order to go to the PE header, we must move the file pointer to the offset specified by the value in e_lfanew.
  • The first dword of the PE header should contain the string "PE" followed by two zeroes. We compare the value in this dword to the value . If they match, then we can assume that the PE header is valid.

We will learn more about the PE header in this tutorial. The official name of the PE header is I. To refresh your memory, I show it below.

IMAGE_NT_HEADERS STRUCT
    Signature dd ?
    FileHeader IMAGE_FILE_HEADER <>
    OptionalHeader IMAGE_OPTIONAL_HEADER32 <>
IMAGE_NT_HEADERS ENDS

Signature is the PE signature, "PE" followed by two zeroes. You already know and use this member.
FileHeader is a structure that contains the information about the physical layout/properies of the PE file in general.
OptionalHeader is also a structure that contains the information about the logical layout inside the PE file.

The most interesting information is in OptionalHeader. However, some fields in FileHeader are also important. We will learn about FileHeader in this tutorial so we can move to study OptionalHeader in the next tutorials.

IMAGE_FILE_HEADER STRUCT
    Machine WORD ?
    NumberOfSections WORD ?
    TimeDateStamp dd ?
    PointerToSymbolTable dd ?
    NumberOfSymbols dd ?
    SizeOfOptionalHeader WORD ?
    Characteristics WORD ?
IMAGE_FILE_HEADER ENDS

Field name Meanings
Machine The CPU platform the file is intended for. For Intel platform, the value is IMAGE_FILE_MACHINE_I386 (14Ch). I tried to use 14Dh and 14Eh as stated in the pe.txt by LUEVELSMEYER but Windows refused to run it. This field is rarely of interest to us except as a quick way of preventing a program to be executed.
NumberOfSections The number of sections in the file. We will need to modify the value in this member if we add or delete a section from the file.
TimeDateStamp The date and time the file is created. Not useful to us.
PointerToSymbolTable used for debugging.
NumberOfSymbols used for debugging.
SizeOfOptionalHeader The size of the OptionalHeader member that immediately follows this structure. Must be set to a valid value.
Characteristics Contains flags for the file, such as whether this file is an exe or a dll.

In summary, only three members are somewhat useful to us: Machine, NumberOfSections and Characteristics. You would normally not change the values of Machine and Characteristics but you must use the value in NumberOfSections when you're walking the section table.
I'm jumping the gun here but in order to illustrate the use of NumberOfSections, I need to digress briefly to the section table.

The section table is an array of structures. Each structure contains the information of a section. Thus if there are 3 sections, there will be 3 members in this array. You need the value in NumberOfSections so you know how many members there are in the array. You would think that checking for the structure with all zeroes in its members would help. Windows does use this approach. You can verify this fact by setting the value in NumberOfSections to a value higher than the real value and Windows still runs the file without problem. From my observation, I think Windows reads the value in NumberOfSections and examines each structure in the section table. If it finds a structure that contains all zeroes, it terminates the search. Else it would process until the number of structures specified in NumberOfSections is met. Why can't we ignore the value in NumberOfSections? Several reasons. The PE specification doesn't specify that the section table array must end with an all-zero structure. Thus there may be a situation where the last array member is contiguous to the first section, without empty space at all. Another reason has to do with bound imports. The new-style binding puts the information immediately following the section table's last structure array member. Thus you still need NumberOfSections.


[Iczelion's Win32 Assembly Homepage]

%¡¾³õѧÌìµØ¡¿
                 
O¡¾ÎÊÌâ´ðÒÉ¡¿
 
4¡¾ÍøÕ¾½éÉÜ¡¿
 
 
,¡¾ÔÓÖ¾ÐÅÏä¡¿
Ͷ¸åÐÅÏ䣺discoveredit@china.com
´ðÒÉÐÅÏ䣺discoveranswer@china.com
°ßÖñÐÅÏ䣺programhunter@china.com