Exploiting Format Strings.

Format Strings

Format strings are strings passed to the printf family of functions that control how the remaining arguments are formatted for output.  In Hacking: The Art of Exploitation (HTAE) we learn how to use these format parameters for our own purposes.

Some format string options in C are:

%s          : string parameter.
%d or %i    : integer parameter.
%c          : character parameter.
%n          : stores the number of characters written so far in the int its argument points to.
%x          : displays an unsigned integer in hexadecimal format.
.
.

There are many more, so this is not a comprehensive list. Basic C programming tutorials usually cover the first three types that I’ve listed.  We have used the last one, %x, extensively when printing out memory addresses.  A comprehensive list of the parameters can be found in the printf documentation.

Let’s See What Happens.
Here is the code for a format string example program from HTAE.

//A program to demonstrate format strings
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int A = 5, B = 7, count_one, count_two;

    //Example of a %n format string
    printf("The number of bytes written up to this point X%n is being stored in count_one,\n"
           " and the number of bytes up to here X%n is being stored in count_two.\n",
           &count_one, &count_two);

    printf("count_one: %d\n", count_one);
    printf("count_two: %d\n", count_two);

    //stack example
    printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);

    exit(0);
}

There really isn’t anything crazy going on here.  We are just displaying information using the format string parameters.

$ ./fmt_uncommon
The number of bytes written up to this point X is being stored in count_one,
 and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 114
A is 5 and is at bffff2a8.  B is 7.

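Something to point out is that the number of arguments passed to printf after the format string matches the number of format parameters.  What happens when we leave an argument out?  Here is the output with B removed from the last printf call.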


The number of bytes written up to this point X is being stored in count_one,
and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 114
A is 5 and is at bffff2c8. B is 8048502.

The value printed for B, 8048502, is just data from the stack where the third argument should have been. To understand where that value comes from we need to examine how the stack behaves when we call printf(). The arguments are pushed onto the stack from last to first: B is pushed first, then &A, then A, and finally the address of the format string. printf() works through the format string and, for each format parameter, reads the next slot up the stack. With B missing, the third parameter reads whatever happens to be in the slot where B should have been. This whole discussion is in HTAE in the Format String section.

Let’s take a look at the assembly of our program with all of the arguments in place.

$ gdb -q ./fmt_uncommon
Reading symbols from ./fmt_uncommon...done.
(gdb) disass /m main
Dump of assembler code for function main:
5       int main(void) {
   0x0804842b <+0>:     lea    ecx,[esp+0x4]
   0x0804842f <+4>:     and    esp,0xfffffff0
   0x08048432 <+7>:     push   DWORD PTR [ecx-0x4]
   0x08048435 <+10>:    push   ebp
   0x08048436 <+11>:    mov    ebp,esp
   0x08048438 <+13>:    push   ecx
   0x08048439 <+14>:    sub    esp,0x14

6           int A = 5, B = 7, count_one, count_two;
   0x0804843c <+17>:    mov    DWORD PTR [ebp-0x10],0x5
   0x08048443 <+24>:    mov    DWORD PTR [ebp-0xc],0x7

7
8           //Example of a %n format string
9           printf("The number of bytes written up to this point X%n is being stored in
           count_one,\n and the number of bytes up to here X%n is being stored in count_two.\n"
           , &count_one, &count_two);
   0x0804844a <+31>:    sub    esp,0x4
   0x0804844d <+34>:    lea    eax,[ebp-0x18]
   0x08048450 <+37>:    push   eax
   0x08048451 <+38>:    lea    eax,[ebp-0x14]
   0x08048454 <+41>:    push   eax
   0x08048455 <+42>:    push   0x8048540
   0x0804845a <+47>:    call   0x80482f0 <printf@plt>
   0x0804845f <+52>:    add    esp,0x10

10
11          printf("count_one: %d\n", count_one);
   0x08048462 <+55>:    mov    eax,DWORD PTR [ebp-0x14]
   0x08048465 <+58>:    sub    esp,0x8
   0x08048468 <+61>:    push   eax
   0x08048469 <+62>:    push   0x80485d6
   0x0804846e <+67>:    call   0x80482f0 <printf@plt>
   0x08048473 <+72>:    add    esp,0x10

12          printf("count_two: %d\n", count_two);
   0x08048476 <+75>:    mov    eax,DWORD PTR [ebp-0x18]
   0x08048479 <+78>:    sub    esp,0x8
   0x0804847c <+81>:    push   eax
   0x0804847d <+82>:    push   0x80485e5
   0x08048482 <+87>:    call   0x80482f0 <printf@plt>
   0x08048487 <+92>:    add    esp,0x10

---Type <return> to continue, or q <return> to quit---
13
14          //stack example
15          printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);
   0x0804848a <+95>:    mov    eax,DWORD PTR [ebp-0x10]
   0x0804848d <+98>:    push   DWORD PTR [ebp-0xc]
   0x08048490 <+101>:   lea    edx,[ebp-0x10]
   0x08048493 <+104>:   push   edx
   0x08048494 <+105>:   push   eax
   0x08048495 <+106>:   push   0x80485f4
   0x0804849a <+111>:   call   0x80482f0 <printf@plt>
   0x0804849f <+116>:   add    esp,0x10

16
17          exit(0);
   0x080484a2 <+119>:   sub    esp,0xc
   0x080484a5 <+122>:   push   0x0
   0x080484a7 <+124>:   call   0x8048310 <exit@plt>

End of assembler dump.

Let’s look at what’s happening here in some detail.  We should recognize lea, the load effective address instruction, at the beginning of the program.  It loads the address esp + 4 into ecx.  Then the and instruction clears the least significant hex digit of esp, which aligns the stack on a 16-byte boundary.  Next we push a DWORD, which is 32 bits, from the address ecx - 0x4 onto the stack; this is the value that was at the top of the stack on entry, the return address of main.  Then we push the contents of ebp onto the stack.  Remember that ebp is the base pointer, so we are saving the caller’s base pointer value on the stack.  Next, mov ebp,esp copies the value of esp into ebp, which makes the current top of the stack the base of the new frame.  Finally we push the value of ecx onto the stack and subtract 0x14 from esp.  This whole sequence is the function prologue: it sets up the stack frame for main.

Remember that the stack grows downward, toward lower addresses.  So that subtraction from esp extends our stack, making room for the local variables.

Next we see the storage of the variables A and B.  A is stored at [ebp - 0x10] and B at [ebp - 0xc], so B sits 4 bytes above A in memory.  We could keep going, but all of this should be starting to make sense and be identifiable to us.  What we are really interested in is the printf calls.  Let’s take a look at the first one.

Format String Assembly.

In the first call to printf we store character counts at two memory addresses.  Looking directly under line 9 we can see the process.  We extend the stack by subtracting 0x4 from esp.  Then we load the address [ebp - 0x18] into eax and push it onto the stack.  We do the same thing for the other argument at [ebp - 0x14] and push that address onto the stack.  Remember that the arguments are pushed last to first, so the first value pushed is the address of count_two at [ebp - 0x18], followed by the address of count_one at [ebp - 0x14].  Then we push one more value, 0x8048540, which is the address of the format string, and then we see the call to printf.  After the call returns, add esp,0x10 releases the 16 bytes used for the call.  The other printf calls have the same structure.  So let’s compare with the program that only passes two arguments when three are needed.

Here is the modified program assembly dump.

(gdb) disass /m main
Dump of assembler code for function main:
5       int main(void) {
   0x0804842b <+0>:     lea    ecx,[esp+0x4]
   0x0804842f <+4>:     and    esp,0xfffffff0
   0x08048432 <+7>:     push   DWORD PTR [ecx-0x4]
   0x08048435 <+10>:    push   ebp
   0x08048436 <+11>:    mov    ebp,esp
   0x08048438 <+13>:    push   ecx
   0x08048439 <+14>:    sub    esp,0x14

6           int A = 5, B = 7, count_one, count_two;
   0x0804843c <+17>:    mov    DWORD PTR [ebp-0x10],0x5
   0x08048443 <+24>:    mov    DWORD PTR [ebp-0xc],0x7

7
8           //Example of a %n format string
9           printf("The number of bytes written up to this point X%n is being stored in 
count_one,\n and the number of bytes up to here X%n is being stored in count_two.\n",
 &count_one, &count_two);
   0x0804844a <+31>:    sub    esp,0x4
   0x0804844d <+34>:    lea    eax,[ebp-0x18]
   0x08048450 <+37>:    push   eax
   0x08048451 <+38>:    lea    eax,[ebp-0x14]
   0x08048454 <+41>:    push   eax
   0x08048455 <+42>:    push   0x8048540
   0x0804845a <+47>:    call   0x80482f0 <printf@plt>
   0x0804845f <+52>:    add    esp,0x10

10
11          printf("count_one: %d\n", count_one);
   0x08048462 <+55>:    mov    eax,DWORD PTR [ebp-0x14]
   0x08048465 <+58>:    sub    esp,0x8
   0x08048468 <+61>:    push   eax
   0x08048469 <+62>:    push   0x80485d6
   0x0804846e <+67>:    call   0x80482f0 <printf@plt>
   0x08048473 <+72>:    add    esp,0x10

12          printf("count_two: %d\n", count_two);
   0x08048476 <+75>:    mov    eax,DWORD PTR [ebp-0x18]
   0x08048479 <+78>:    sub    esp,0x8
   0x0804847c <+81>:    push   eax
   0x0804847d <+82>:    push   0x80485e5
   0x08048482 <+87>:    call   0x80482f0 <printf@plt>
   0x08048487 <+92>:    add    esp,0x10

---Type <return> to continue, or q <return> to quit---
13
14          //stack example
15          printf("A is %d and is at %08x.  B is %x.\n", A, &A);
   0x0804848a <+95>:    mov    eax,DWORD PTR [ebp-0x10]
   0x0804848d <+98>:    sub    esp,0x4
   0x08048490 <+101>:   lea    edx,[ebp-0x10]
   0x08048493 <+104>:   push   edx
   0x08048494 <+105>:   push   eax
   0x08048495 <+106>:   push   0x80485f4
   0x0804849a <+111>:   call   0x80482f0 <printf@plt>
   0x0804849f <+116>:   add    esp,0x10

16
17          exit(0);
   0x080484a2 <+119>:   sub    esp,0xc
   0x080484a5 <+122>:   push   0x0
   0x080484a7 <+124>:   call   0x8048310 <exit@plt>

End of assembler dump.

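We’ve removed the last argument to printf in line 15.  Comparing the two assembly dumps, the original pushes the value of B with push DWORD PTR [ebp-0xc] at <+98>.  The modified program has sub esp,0x4 in its place: the slot is still reserved, but nothing is written to it.  printf still finds a third format parameter, %x, so it reads that slot and prints whatever was already there.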

Conclusion

We have seen how format strings work in C and examined their operation in assembly.  Now that we have a view of what’s happening we can look at exploiting them.  As we move forward, once we understand how something works we should start asking ourselves what can go wrong with it.

We see a similar situation to a buffer overflow here.  We have access to memory on the stack that we aren’t necessarily supposed to have.  What could we do with that?  What if we could get a format parameter to point at what we want it to?  Can we change 8048502 to something that will point eip where we control execution?

Next time we will take a look at the format string exploit example from HTAE.
