How is shellcode generated from C?

Accepted Answer

The problem with creating shellcode from C programs is not that you cannot control the generated assembly, nor anything else related to code generation.
The problem is symbol resolution, or relocation, call it whatever you like.

Your approach, as far as I understand it, is right; you are just using the wrong code or, to put it another way, you are expecting too much from it.
I'm not going to explain in detail how an image is loaded, but briefly: when you call a function like write, the compiler emits a call ADDRESS instruction whose target is not known yet. The displacement stored in it is only a placeholder relative offset, filled in later (by the linker and, for dynamically linked functions, by the loader at run time) using the relocation structures found in the image (see PE, ELF).
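To make the relocation step concrete, here is a minimal sketch, assuming an i386 ELF-style R_386_PC32 relocation, whose value is S + A - P with the addend A stored in the field itself. The helper names apply_pc32, read_le32 and write_le32 and the symbol address 0x401000 are hypothetical; a real linker takes the field offset and the symbol from the relocation table rather than from function arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian helpers: IA-32 stores the 32-bit displacement LSB first. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical helper: apply one PC-relative relocation in place, following
   the R_386_PC32 formula field = S + A - P (all arithmetic mod 2^32):
     S = address of the symbol (e.g. write)
     A = addend already stored in the field (-4 for a call rel32)
     P = address of the field itself                                      */
static void apply_pc32(uint8_t *image, size_t image_len, uint32_t image_base,
                       size_t field_off, uint32_t sym_addr)
{
    assert(field_off + 4 <= image_len);
    uint8_t *field = image + field_off;
    uint32_t addend = read_le32(field);                  /* A */
    uint32_t place = image_base + (uint32_t)field_off;   /* P */
    write_le32(field, sym_addr + addend - place);        /* S + A - P */
}
```

For example, patching the bytes E8 FC FF FF FF located at 28cbfch with write at the hypothetical address 0x401000 turns the displacement into 0x001743FF, and the call then lands on 0x401000. Bytes copied straight out of an object file never go through this step, so the -4 placeholder is still there when they run.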
Your shellcode is never loaded by the OS (it is a program inside a program), so its symbols are never resolved. Look at this:

[Screenshot: unresolved symbol in GDB]

This is the call to write in your hello.c, captured while stepping through it with GDB. Note that the call instruction is at 28cbfch and the callee would be at 28cbfdh, i.e. just one byte after the start of the instruction. That cannot be right, since the instruction itself is 5 bytes long. A relative call targets the address of the next instruction plus the displacement, so 28cc01h + (-4) = 28cbfdh: the call to write is encoded as call -4, a relative address that has not been resolved yet.
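The arithmetic can be reproduced with a small sketch, assuming the instruction is the 5-byte E8 call rel32 form (which matches call -4 being 5 bytes long). call_rel32_target is a hypothetical helper name, and the byte sequence used below is reconstructed from call -4, not copied from the screenshot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: where does an IA-32 "call rel32" (opcode E8) go?
   The CPU adds the signed 32-bit displacement to the address of the NEXT
   instruction, i.e. insn_addr + 5, because the instruction is 5 bytes long. */
static uint32_t call_rel32_target(uint32_t insn_addr, const uint8_t insn[5])
{
    assert(insn[0] == 0xE8);
    uint32_t disp = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                    (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    /* Unsigned addition wraps mod 2^32, so a "negative" displacement such as
       0xFFFFFFFC (-4) moves the target backwards, as on a 32-bit CPU. */
    return insn_addr + 5u + disp;
}
```

Feeding it E8 FC FF FF FF at 0x28cbfc returns 0x28cbfd, the impossible target seen in GDB.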

You will learn in your course that shellcode usually invokes system calls directly, which on Linux on the IA-32 platform means int 80h. If you replace the call to write with a direct system call, your shellcode should work (there may be other issues, so don't take my word for it).
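A minimal sketch of what replacing the call to write with a system call can look like, assuming Linux and GCC-style inline assembly. raw_write is a hypothetical name. On IA-32 it uses int 80h with eax = 4 (__NR_write); on x86-64 it uses the syscall instruction with rax = 1 instead, since that is the native 64-bit convention. No call to the external write symbol is emitted, so there is nothing left for a linker or loader to resolve for it. The address of the message is a separate problem: in real shellcode it must also be computed position-independently, which is one of the possible other issues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write(2) as a raw Linux system call, without calling
   the libc write() wrapper. Returns bytes written, or -errno on failure. */
static long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    long ret;
#if defined(__i386__)
    /* IA-32: int 80h with eax = 4 (__NR_write), ebx = fd, ecx = buf, edx = len. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"(4L), "b"(fd), "c"(buf), "d"(len)
                      : "memory");
#elif defined(__x86_64__)
    /* x86-64: syscall with rax = 1 (__NR_write), rdi = fd, rsi = buf,
       rdx = len; the instruction itself clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
#else
#error "this sketch only covers Linux on IA-32 and x86-64"
#endif
    return ret;
}
```

raw_write(1, "ok\n", 3) returns 3. On failure it returns a negative errno (for example -9, i.e. -EBADF, for a bad descriptor) instead of setting errno, because no libc wrapper is involved.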

Fun fact: I expected the stack to be non-executable by default (look up the NX bit for more info), but in Cygwin it was executable. For ELF files, you can use --execstack (a GNU assembler option; gcc -z execstack does the same at link time) to make sure the stack is executable.
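If you would rather not depend on the stack being executable, a common alternative on POSIX systems is to copy the bytes into a page obtained with mmap and switch it to read+execute with mprotect before calling it. This is a sketch under assumptions: a system that provides MAP_ANONYMOUS (such as Linux), code that fits in one page, and run_code as a hypothetical helper name. Converting a data pointer to a function pointer is not defined by ISO C, but POSIX requires it to work.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: run `len` bytes of machine code that behave like
   `int f(void)`. Returns 0 and stores f()'s value in *result, or -1 if the
   memory could not be mapped or made executable. */
static int run_code(const unsigned char *code, size_t len, int *result)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len > 0 && len <= (size_t)page); /* one page only */

    /* Writable but not executable while the bytes are copied in. */
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, len);

    /* Executable but no longer writable before jumping to it (W^X). */
    if (mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, (size_t)page);
        return -1;
    }

    int (*fn)(void) = (int (*)(void))mem; /* allowed by POSIX, not by ISO C */
    *result = fn();
    munmap(mem, (size_t)page);
    return 0;
}
```

On x86, the six bytes B8 2A 00 00 00 C3 (mov eax, 42; ret) make run_code store 42 in *result. Keeping the page writable and executable at different times avoids tripping W^X policies that refuse mappings which are both at once.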