Binary Executable Files

Lecture Notes

Analysis of binary executable files such as ELF, focusing on their structure and on symbol linking.


Practical Tasks

Exercise 1

Compile a small C program with one or two functions and, using objdump, analyze both the object file produced by the compiler and the final binary.

As an example, you may consider the following program:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

void bar(void) {
        printf("Hello");
}

void foo(void){
        printf("World");
        int fd = open("a.txt", O_RDONLY);
        close(fd);
}

int main(int argc, char **argv){
        foo();
        return 0;
}

After compiling, you can inspect both the object file (gcc -c prog.c, which produces prog.o) and the final binary (gcc -o prog prog.c) with objdump. To list the sections use objdump -h; for the final binary the result should look similar to:

$ objdump -h prog

prog:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       0000001c  00000000000002a8  00000000000002a8  000002a8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag 00000020  00000000000002c4  00000000000002c4  000002c4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .note.gnu.build-id 00000024  00000000000002e4  00000000000002e4  000002e4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     00000024  0000000000000308  0000000000000308  00000308  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynsym       000000d8  0000000000000330  0000000000000330  00000330  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       0000008f  0000000000000408  0000000000000408  00000408  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version  00000012  0000000000000498  0000000000000498  00000498  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .gnu.version_r 00000020  00000000000004b0  00000000000004b0  000004b0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rela.dyn     000000c0  00000000000004d0  00000000000004d0  000004d0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .rela.plt     00000048  0000000000000590  0000000000000590  00000590  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .init         00000017  0000000000001000  0000000000001000  00001000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .plt          00000040  0000000000001020  0000000000001020  00001020  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .plt.got      00000008  0000000000001060  0000000000001060  00001060  2**3

...
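objdump -h builds this list from the section header table, whose file offset and number of entries are stored in the ELF header (e_shoff and e_shnum). The table can also be walked directly; below is a minimal C sketch, assuming a 64-bit ELF file and the structure definitions from glibc's <elf.h>. The helpers read_file and list_sections are hypothetical names used only for this illustration. Keep in mind that e_shnum also counts the null section at index 0, which objdump -h does not print, so the two counts can differ.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
unsigned char *read_file(const char *path, size_t *len) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size <= 0) { fclose(f); return NULL; }
    rewind(f);
    unsigned char *buf = malloc((size_t)size);
    if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf != NULL) *len = (size_t)size;
    return buf;
}

/* Print every section header (index, name, offset, size), like a bare-bones
 * "readelf -S". Returns e_shnum, or -1 if buf is not a usable ELF64 image. */
int list_sections(const unsigned char *buf, size_t len) {
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(buf, ELFMAG, SELFMAG) != 0 || buf[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, buf, sizeof eh); /* copy avoids unaligned access */
    /* Extended numbering (more than 0xff00 sections) is ignored in this sketch. */
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shnum == 0 ||
        eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shoff + (size_t)eh.e_shnum * sizeof(Elf64_Shdr) > len)
        return -1;

    Elf64_Shdr names; /* the section holding all section names (.shstrtab) */
    memcpy(&names, buf + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof names, sizeof names);
    if (names.sh_offset + names.sh_size > len)
        return -1;

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, buf + eh.e_shoff + i * sizeof sh, sizeof sh);
        const char *name = sh.sh_name < names.sh_size
                               ? (const char *)buf + names.sh_offset + sh.sh_name : "?";
        printf("%2zu %-20s offset=0x%06lx size=0x%06lx\n", i, name,
               (unsigned long)sh.sh_offset, (unsigned long)sh.sh_size);
    }
    return eh.e_shnum;
}
```

Calling list_sections on the buffer returned by read_file("prog", &len) from your own main should print the same names as readelf -S prog.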

You can also obtain the symbols present in the binary using objdump -tT prog, which will show the symbol table and the dynamic symbol table.

prog:     file format elf64-x86-64

SYMBOL TABLE:
00000000000002a8 l    d  .interp        0000000000000000              .interp
00000000000002c4 l    d  .note.ABI-tag  0000000000000000              .note.ABI-tag
00000000000002e4 l    d  .note.gnu.build-id     0000000000000000              .note.gnu.build-id
0000000000000308 l    d  .gnu.hash      0000000000000000              .gnu.hash
0000000000000330 l    d  .dynsym        0000000000000000              .dynsym
0000000000000408 l    d  .dynstr        0000000000000000              .dynstr
0000000000000498 l    d  .gnu.version   0000000000000000              .gnu.version
00000000000004b0 l    d  .gnu.version_r 0000000000000000              .gnu.version_r
00000000000004d0 l    d  .rela.dyn      0000000000000000              .rela.dyn
0000000000000590 l    d  .rela.plt      0000000000000000              .rela.plt
...

The full output is omitted here. Run the tool and determine:

  • How many sections are present?
  • How many symbols are present?

Strip the binary (strip prog), repeat the same analysis with objdump, and compare both results. In particular, answer:

  • What happened to symbols?
  • What happened to the function names, and to the functions themselves?
  • What happened to the file size?
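To check these answers, recall that symbols live in two sections: .symtab (type SHT_SYMTAB), which strip removes, and .dynsym (type SHT_DYNSYM), which must stay because the dynamic linker uses it at run time. Both hold fixed-size Elf64_Sym entries, so the number of entries is sh_size / sh_entsize. The sketch below, assuming a 64-bit ELF and glibc's <elf.h>, maps the file with mmap and counts the entries of the first section of a given type; count_symbols is a hypothetical helper. The count includes the reserved null symbol at index 0.

```c
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of entries in the first section of the given type (SHT_SYMTAB or
 * SHT_DYNSYM), or -1 if the file cannot be read or has no such section. */
long count_symbols(const char *path, unsigned int type) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    const unsigned char *buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after closing the descriptor */
    if (buf == MAP_FAILED) return -1;

    long count = -1;
    Elf64_Ehdr eh;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(buf, ELFMAG, SELFMAG) == 0 && buf[EI_CLASS] == ELFCLASS64 &&
        eh.e_shoff + (size_t)eh.e_shnum * sizeof(Elf64_Shdr) <= len) {
        for (size_t i = 0; i < eh.e_shnum; i++) {
            Elf64_Shdr sh;
            memcpy(&sh, buf + eh.e_shoff + i * sizeof sh, sizeof sh);
            if (sh.sh_type == type && sh.sh_entsize == sizeof(Elf64_Sym)) {
                count = (long)(sh.sh_size / sh.sh_entsize);
                break;
            }
        }
    }
    munmap((void *)buf, len);
    return count;
}
```

Running it on prog before and after strip should show the SHT_SYMTAB count going from a positive value to -1 (section missing), while the SHT_DYNSYM count is unchanged.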

Exercise 2

Compile a small C program with one or two functions that uses external libraries. For example, you can write a program that creates a thread (libpthread) or compresses a file (libz). Any other functions are adequate, as long as they come from external libraries.

One example is available in the zlib repository at https://raw.githubusercontent.com/madler/zlib/master/test/example.c and can be compiled with gcc -o example example.c -lz.
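For the thread variant, a minimal sketch could look as follows. run_worker is a hypothetical helper; call it from your own main and build with gcc -o thread thread.c -pthread. On glibc 2.34 and later the pthread functions were merged into libc.so.6, so objdump -T may list pthread_create as a GLIBC_2.34 symbol of libc rather than of libpthread; the zlib example does not have this ambiguity.

```c
#include <pthread.h>
#include <stdio.h>

/* Thread body: increments the integer passed by the caller. */
static void *worker(void *arg) {
    int *counter = arg;
    (*counter)++;
    return NULL;
}

/* Start one thread, wait for it, and return the counter value (-1 on error).
 * pthread_create and pthread_join are imported from a shared library and are
 * resolved by the dynamic linker at run time. */
int run_worker(void) {
    pthread_t tid;
    int counter = 0;
    if (pthread_create(&tid, NULL, worker, &counter) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    printf("worker finished, counter = %d\n", counter);
    return counter;
}
```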

Using a hex editor, identify the ELF magic value and the fields of the ELF header. readelf can guide you, since it decodes the same values that you will find in the hex editor.

$ readelf -h example

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1270
  Start of program headers:          64 (bytes into file)
  Start of section headers:          20368 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30
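When comparing with the hex editor, the first 16 bytes are the e_ident array: 7f 45 4c 46 is the magic (0x7f followed by the ASCII letters "ELF"), byte 4 (EI_CLASS) 02 means 64-bit, byte 5 (EI_DATA) 01 means little endian, byte 6 (EI_VERSION) is 01, and byte 7 (EI_OSABI) 00 is System V. The remaining fields follow the Elf64_Ehdr layout; for instance e_entry is the 8-byte value at file offset 0x18, so an entry point of 0x1270 appears in the editor as 70 12 00 00 00 00 00 00. The sketch below decodes the same fields in C, assuming a 64-bit little-endian file read on a little-endian host and glibc's <elf.h>; print_elf_header is a hypothetical name.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Decode the fields that "readelf -h" prints from the first 64 bytes of a file.
 * Assumes a 64-bit little-endian ELF read on a little-endian host (e.g. x86-64).
 * Returns 0 on success, -1 if the bytes are not such an ELF header. */
int print_elf_header(const unsigned char *buf, size_t len) {
    if (len < sizeof(Elf64_Ehdr) ||
        memcmp(buf, ELFMAG, SELFMAG) != 0 ||  /* 7f 45 4c 46 */
        buf[EI_CLASS] != ELFCLASS64 ||        /* byte 4: 2 = 64-bit */
        buf[EI_DATA] != ELFDATA2LSB)          /* byte 5: 1 = little endian */
        return -1;

    Elf64_Ehdr eh;
    memcpy(&eh, buf, sizeof eh); /* copy avoids unaligned access */

    printf("Magic:  ");
    for (int i = 0; i < EI_NIDENT; i++)
        printf(" %02x", (unsigned)eh.e_ident[i]);
    printf("\nVersion (ident): %u\nOS/ABI: %u\n",
           (unsigned)eh.e_ident[EI_VERSION], (unsigned)eh.e_ident[EI_OSABI]);
    printf("Type: %u (%s)\n", (unsigned)eh.e_type,
           eh.e_type == ET_EXEC ? "EXEC" : eh.e_type == ET_DYN ? "DYN" :
           eh.e_type == ET_REL ? "REL" : "other");
    printf("Machine: %u%s\n", (unsigned)eh.e_machine,
           eh.e_machine == EM_X86_64 ? " (x86-64)" : "");
    printf("Entry point: 0x%lx\n", (unsigned long)eh.e_entry);
    printf("Program headers: offset %lu, %u entries of %u bytes\n",
           (unsigned long)eh.e_phoff, (unsigned)eh.e_phnum, (unsigned)eh.e_phentsize);
    printf("Section headers: offset %lu, %u entries of %u bytes, names in #%u\n",
           (unsigned long)eh.e_shoff, (unsigned)eh.e_shnum,
           (unsigned)eh.e_shentsize, (unsigned)eh.e_shstrndx);
    return 0;
}
```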

Using readelf, list the main sections (readelf -S) and inspect their contents. The following snippet shows the contents of section 25, which in this build is the .data section. It holds initialized, writable global data such as global tables and global variables (zero-initialized globals go to .bss instead).

$ readelf -x 25 example

Hex dump of section '.data':
  0x00005130 00000000 00000000 38510000 00000000 ........8Q......
  0x00005140 68656c6c 6f2c2068 656c6c6f 21000000 hello, hello!...
  0x00005150 5a310000 00000000                   Z1......
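Multi-byte values in this dump are stored little endian (as stated by the Data field of the ELF header), so each group of bytes must be read from right to left, while strings such as "hello, hello!" appear in their natural order because they are plain byte sequences. For example, the eight bytes at 0x5138 (38 51 00 00 00 00 00 00) encode the 64-bit value 0x5138, and the eight bytes at 0x5150 (5a 31 00 00 00 00 00 00) encode 0x315a. A small C helper (the name le64 is illustrative) makes the rule explicit:

```c
#include <stdint.h>

/* Decode 8 bytes stored least-significant byte first (little endian),
 * in the order they appear left to right in "readelf -x" output. */
uint64_t le64(const unsigned char b[8]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i]; /* the most significant byte is the last one in the file */
    return v;
}
```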

Inspect the .plt (Procedure Linkage Table) and the .got (Global Offset Table). You can disassemble the .plt section with objdump -M intel -d example (add -j .plt to restrict the output to that section). The output shows that each imported function has a small stub that resolves it. The first entry contains the generic code that hands control to the dynamic linker's resolver, while the following entries contain the stub specific to each symbol.

Disassembly of section .plt:

0000000000001020 <gzclose@plt-0x10>:
    1020:       ff 35 e2 3f 00 00       push   QWORD PTR [rip+0x3fe2]        # 5008 <_GLOBAL_OFFSET_TABLE_+0x8>
    1026:       ff 25 e4 3f 00 00       jmp    QWORD PTR [rip+0x3fe4]        # 5010 <_GLOBAL_OFFSET_TABLE_+0x10>
    102c:       0f 1f 40 00             nop    DWORD PTR [rax+0x0]

0000000000001030 <gzclose@plt>:
    1030:       ff 25 e2 3f 00 00       jmp    QWORD PTR [rip+0x3fe2]        # 5018 <gzclose@Base>
    1036:       68 00 00 00 00          push   0x0
    103b:       e9 e0 ff ff ff          jmp    1020 <_init+0x20>

0000000000001040 <free@plt>:
    1040:       ff 25 da 3f 00 00       jmp    QWORD PTR [rip+0x3fda]        # 5020 <free@GLIBC_2.2.5>
    1046:       68 01 00 00 00          push   0x1
    104b:       e9 d0 ff ff ff          jmp    1020 <_init+0x20>

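In this case, the jmp in gzclose@plt reads its target from the .got entry at rip+0x3fe2. Since rip already points to the next instruction (0x1036), that entry is at 0x5018, as the objdump comment shows. If the function has already been resolved, the entry holds its real address and execution jumps there directly. Otherwise (lazy binding), the entry still points back to the next instruction of the stub: an index identifying the relocation is pushed onto the stack, and the code jumps to the first .plt entry, which pushes the value stored at _GLOBAL_OFFSET_TABLE_+0x8 and jumps through _GLOBAL_OFFSET_TABLE_+0x10 into the generic resolver of the dynamic linker. The resolver writes the real address into the .got entry, so later calls skip this step. If the binary was linked with immediate binding (for example -z now), all these entries are filled at load time instead.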
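The displacement in these instructions is relative to the address of the next instruction, which is why the objdump comments show absolute addresses. A small C sketch of the arithmetic follows; rel32_target is a hypothetical helper, and the inputs come from the disassembly above (the 4 displacement bytes follow ff 25 in the indirect jumps and e9 in the direct jumps).

```c
#include <stdint.h>

/* Absolute target of an x86-64 rel32 / RIP-relative operand. The CPU adds the
 * signed 32-bit displacement to the address of the NEXT instruction, that is
 * insn_addr + insn_len. disp holds the 4 displacement bytes in file order. */
uint64_t rel32_target(uint64_t insn_addr, unsigned insn_len, const unsigned char disp[4]) {
    uint32_t u = (uint32_t)disp[0] | (uint32_t)disp[1] << 8 |
                 (uint32_t)disp[2] << 16 | (uint32_t)disp[3] << 24;
    int64_t d = (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u; /* sign-extend */
    return insn_addr + insn_len + (uint64_t)d; /* wraps modulo 2^64, like the CPU */
}
```

For example, the 6-byte jmp at 0x1030 with displacement e2 3f 00 00 yields 0x5018, and the 5-byte e9 jump at 0x103b with e0 ff ff ff (that is, -0x20) lands on 0x1020, the first .plt entry.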

Draw a diagram of the binary file that represents its structure from three perspectives: the overall ELF layout (header, program header table, section header table), the segment view, and the section view. The goal is to understand which parts of the file are actually loaded into memory as segments, where they will be placed in memory, and how the bytes of the file map to segments and sections.
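The segment view comes from the program header table, located through e_phoff and e_phnum in the ELF header. Each PT_LOAD entry asks the loader to map p_filesz bytes starting at file offset p_offset to the virtual address p_vaddr (relative to the load base for a PIE), reserving p_memsz bytes; when p_memsz exceeds p_filesz, as in the segment that contains .bss, the extra bytes are zero-filled and occupy no space in the file. A minimal C sketch that prints this table, assuming a 64-bit ELF and glibc's <elf.h> (print_segments is a hypothetical name):

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Print the program header table (segment view), similar to "readelf -l".
 * Returns the number of PT_LOAD segments, or -1 if buf is not a usable ELF64 image. */
int print_segments(const unsigned char *buf, size_t len) {
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(buf, ELFMAG, SELFMAG) != 0 || buf[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (eh.e_phnum == 0)
        return 0; /* relocatable objects (.o) have no segments */
    if (eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff + (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len)
        return -1;

    int loads = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, buf + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD)
            loads++;
        /* file bytes [p_offset, p_offset + p_filesz) are mapped at p_vaddr */
        printf("%-8s off=0x%06lx vaddr=0x%06lx filesz=0x%06lx memsz=0x%06lx %c%c%c\n",
               ph.p_type == PT_LOAD ? "LOAD" : ph.p_type == PT_DYNAMIC ? "DYNAMIC" :
               ph.p_type == PT_INTERP ? "INTERP" : "other",
               (unsigned long)ph.p_offset, (unsigned long)ph.p_vaddr,
               (unsigned long)ph.p_filesz, (unsigned long)ph.p_memsz,
               (ph.p_flags & PF_R) ? 'R' : '-', (ph.p_flags & PF_W) ? 'W' : '-',
               (ph.p_flags & PF_X) ? 'X' : '-');
    }
    return loads;
}
```

readelf -l prog prints the same table together with the section-to-segment mapping, which is a convenient starting point for the diagram.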

Exercise 3

The LIEF library allows extensive manipulation of binary files, including ELF objects. Using LIEF, write a small Python script that prints information about an ELF file that may be relevant for future reverse engineering tasks.

In particular:

  • The type of file and architecture
  • The list of libraries loaded
  • The compiler used
  • The list of symbols from external libraries
  • The address of the program entry point
  • Whether the program uses RELRO, PIE, and stack canaries
  • Whether the program is stripped
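The exercise is meant to be solved with LIEF in Python. To make clear what such a script is looking for, the C sketch below reads the underlying ELF structures directly, assuming a 64-bit ELF and glibc's <elf.h>. It uses common heuristics rather than a definitive classification: PIE means type ET_DYN plus a PT_INTERP segment (so static-pie binaries are missed), RELRO means a PT_GNU_RELRO segment (full RELRO when immediate binding is also requested), canaries mean an imported __stack_chk_fail, and stripped means no .symtab section. The struct and function names are hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hardening properties of an ELF64 file (1 = present, 0 = absent). */
struct hardening {
    int pie;        /* heuristic: ET_DYN plus a PT_INTERP segment */
    int relro;      /* a PT_GNU_RELRO segment exists (at least partial RELRO) */
    int full_relro; /* RELRO plus immediate binding (BIND_NOW) */
    int canary;     /* __stack_chk_fail is imported through .dynsym */
    int stripped;   /* no .symtab (SHT_SYMTAB) section */
};

/* Fill *out from the ELF image in buf. Returns 0, or -1 if buf is not a usable ELF64 image. */
int check_hardening(const unsigned char *buf, size_t len, struct hardening *out) {
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(buf, ELFMAG, SELFMAG) != 0 || buf[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (eh.e_phoff + (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len ||
        eh.e_shoff + (size_t)eh.e_shnum * sizeof(Elf64_Shdr) > len)
        return -1;
    memset(out, 0, sizeof *out);

    int has_interp = 0, bind_now = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) { /* segment view */
        Elf64_Phdr ph;
        memcpy(&ph, buf + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_INTERP) has_interp = 1;
        if (ph.p_type == PT_GNU_RELRO) out->relro = 1;
        if (ph.p_type != PT_DYNAMIC || ph.p_offset + ph.p_filesz > len) continue;
        for (size_t off = 0; off + sizeof(Elf64_Dyn) <= ph.p_filesz; off += sizeof(Elf64_Dyn)) {
            Elf64_Dyn d;
            memcpy(&d, buf + ph.p_offset + off, sizeof d);
            if (d.d_tag == DT_NULL) break;
            if (d.d_tag == DT_BIND_NOW ||
                (d.d_tag == DT_FLAGS && (d.d_un.d_val & DF_BIND_NOW)) ||
                (d.d_tag == DT_FLAGS_1 && (d.d_un.d_val & DF_1_NOW)))
                bind_now = 1;
        }
    }
    out->pie = eh.e_type == ET_DYN && has_interp;
    out->full_relro = out->relro && bind_now;

    out->stripped = 1;
    for (size_t i = 0; i < eh.e_shnum; i++) { /* section view */
        Elf64_Shdr sh, str;
        memcpy(&sh, buf + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_type == SHT_SYMTAB) out->stripped = 0;
        if (sh.sh_type != SHT_DYNSYM || sh.sh_entsize != sizeof(Elf64_Sym) ||
            sh.sh_link >= eh.e_shnum || sh.sh_offset + sh.sh_size > len)
            continue;
        /* sh_link of .dynsym is the index of its string table (.dynstr) */
        memcpy(&str, buf + eh.e_shoff + (size_t)sh.sh_link * sizeof str, sizeof str);
        if (str.sh_offset + str.sh_size > len) continue;
        for (size_t off = 0; off + sizeof(Elf64_Sym) <= sh.sh_size; off += sizeof(Elf64_Sym)) {
            Elf64_Sym sym;
            memcpy(&sym, buf + sh.sh_offset + off, sizeof sym);
            if (sym.st_name < str.sh_size &&
                strncmp((const char *)buf + str.sh_offset + sym.st_name, "__stack_chk_fail",
                        str.sh_size - sym.st_name) == 0)
                out->canary = 1;
        }
    }
    return 0;
}

void print_hardening(const struct hardening *h) {
    printf("PIE: %s\nRELRO: %s\nCanary: %s\nStripped: %s\n",
           h->pie ? "yes" : "no",
           h->full_relro ? "full" : h->relro ? "partial" : "none",
           h->canary ? "yes" : "no", h->stripped ? "yes" : "no");
}
```

The compiler is usually recorded as a string in the .comment section, which readelf -p .comment prog prints, and the list of libraries corresponds to the DT_NEEDED entries shown by readelf -d prog; both are useful to cross-check the LIEF output.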

Exercise 4

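An important technique in dynamic analysis is the interception, redirection, and even modification of calls to symbols. This can easily be achieved with the LD_PRELOAD environment variable, which tells the dynamic linker to load the listed shared libraries before all others, so that their symbols take precedence.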

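The following snippet overrides a function with a custom implementation that can still call the original function (or simply block its execution). This only works for functions resolved by the dynamic linker, such as those exported by shared libraries. It is used as LD_PRELOAD=./libover.so ./prog, where libover.so contains this code and prog is the program under analysis (a path containing a slash makes the dynamic linker load the file as given instead of searching the library directories).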

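#define _GNU_SOURCE          // needed for RTLD_NEXT
#include <dlfcn.h>
#include <stdio.h>

static void (*original_foo)(void) = NULL;

void foo(void) { // Function to override
    if (original_foo == NULL) { // First call: look up the next definition of "foo"
        original_foo = (void (*)(void))dlsym(RTLD_NEXT, "foo");
    }

    printf("foo entry\n");
    if (original_foo != NULL)
        original_foo();  // call the original function
    printf("foo exit\n");
}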

To compile it, use: gcc -shared -fPIC -o libover.so libover.c -ldl (since glibc 2.34 the dl* functions are part of libc itself, so -ldl is optional there).
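Note that foo in prog from Exercise 1 cannot be intercepted this way: main calls it directly inside the executable, so the call is bound at link time and never goes through the dynamic linker. Calls into shared libraries, such as the close call inside foo, are resolved at run time and can be redirected. A minimal sketch of such a wrapper, compiled the same way as libover.c (the file name libclose.c and the log format are hypothetical):

```c
#define _GNU_SOURCE          /* exposes RTLD_NEXT in <dlfcn.h> */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>

/* Next definition of close() in the lookup order (normally the one in libc). */
static int (*real_close)(int) = NULL;

/* Same signature as close(2), so the program's calls bind here first. */
int close(int fd) {
    if (real_close == NULL)
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
    if (real_close == NULL) { /* lookup failed: report an error instead of crashing */
        errno = ENOSYS;
        return -1;
    }
    fprintf(stderr, "[libclose] close(%d)\n", fd);
    return real_close(fd);
}
```

Running LD_PRELOAD=./libclose.so ./prog should then print a [libclose] line when foo closes its descriptor.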

Taking this as an example, write a library that intercepts communications over secure sockets and prints the contents before they are encrypted (and after they are decrypted). Test the library with an application such as wget. For wget, you can dump the list of dynamic symbols with objdump to look for candidate symbols to override.

$ objdump -T /usr/bin/wget |grep gnutls
...
0000000000000000      DF *UND*  0000000000000000 (GNUTLS_3_4) gnutls_record_recv
...
0000000000000000      DF *UND*  0000000000000000 (GNUTLS_3_4) gnutls_certificate_verify_peers2
...
0000000000000000      DF *UND*  0000000000000000 (GNUTLS_3_4) gnutls_record_send

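These three symbols are interesting because they may allow bypassing certificate validation (gnutls_certificate_verify_peers2), inspecting the data sent (gnutls_record_send), or inspecting the data received (gnutls_record_recv).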
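Building on the same pattern, the sketch below wraps gnutls_record_send and gnutls_record_recv and prints the plaintext passing through them. It rests on assumptions: gnutls_session_t is declared here as an opaque void * so that the GnuTLS headers are not needed, the prototypes follow the GnuTLS manual (session, buffer, size, returning ssize_t), and wget must actually be linked against GnuTLS (some builds use OpenSSL instead). The certificate verification function is left to the exercise.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>

/* Assumption: gnutls_session_t is an opaque pointer, declared here as void *. */
typedef void *session_ptr;
typedef ssize_t (*send_fn)(session_ptr, const void *, size_t);
typedef ssize_t (*recv_fn)(session_ptr, void *, size_t);

/* Print a buffer as text on stderr, replacing non-printable bytes with '.'. */
static void dump(const char *tag, const void *data, size_t len) {
    const unsigned char *p = data;
    fprintf(stderr, "[%s %zu bytes] ", tag, len);
    for (size_t i = 0; i < len; i++)
        fputc(((p[i] >= 0x20 && p[i] < 0x7f) || p[i] == '\n') ? p[i] : '.', stderr);
    fputc('\n', stderr);
}

/* The data is still plaintext here; the real function encrypts it afterwards. */
ssize_t gnutls_record_send(session_ptr session, const void *data, size_t len) {
    static send_fn real_send = NULL;
    if (real_send == NULL)
        real_send = (send_fn)dlsym(RTLD_NEXT, "gnutls_record_send");
    if (real_send == NULL)
        return -1; /* GnuTLS not loaded: nothing to forward to */
    dump("send", data, len);
    return real_send(session, data, len);
}

/* The real function decrypts into data; print only the bytes it returned. */
ssize_t gnutls_record_recv(session_ptr session, void *data, size_t len) {
    static recv_fn real_recv = NULL;
    if (real_recv == NULL)
        real_recv = (recv_fn)dlsym(RTLD_NEXT, "gnutls_record_recv");
    if (real_recv == NULL)
        return -1;
    ssize_t n = real_recv(session, data, len);
    if (n > 0)
        dump("recv", data, (size_t)n);
    return n;
}
```

It can be built with gcc -shared -fPIC -o libtls.so libtls.c -ldl (libtls is a hypothetical name) and tried with, for example, LD_PRELOAD=./libtls.so wget https://example.com.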
