x86


Data types

Byte

8-bit value

Word

16-bit value

Dword

Double word; 32-bit value

Qword

Quad word; 64-bit value
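
A quick way to double-check these widths is Python's standard struct module. This is only a sketch; the dictionary names FORMATS and SIZES are made up for this example.

```python
import struct

# struct format codes for unsigned integers of each x86 width
# ("<" selects little-endian with standard sizes and no padding)
FORMATS = {"byte": "<B", "word": "<H", "dword": "<I", "qword": "<Q"}

# Width in bits of each data type
SIZES = {name: struct.calcsize(fmt) * 8 for name, fmt in FORMATS.items()}

for name, bits in SIZES.items():
    print(f"{name}: {bits} bits, largest unsigned value {2**bits - 1:#x}")
```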

Intel operators

0x

Prefix denoting a hexadecimal value, such as an address or constant (for example 0x40 is 64)

[]

Indirect addressing; operates on the value stored at the address within the brackets rather than on the address itself

,

Separates operands; in Intel syntax the destination comes first and the source second

AT&T operators

Instructions

mov

Move; copy or load data to a destination

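movabs

Move absolute; GNU assembler mnemonic for a mov that carries a full 64-bit immediate value or absolute address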

push

Push; decrements the stack pointer and copies the operand's value onto the top of the stack

pop

Pop; copies the value at the top of the stack into the operand and increments the stack pointer
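
A toy model of push and pop, assuming 64-bit mode where each stack slot is 8 bytes and the stack grows towards lower addresses. The class name Stack64 and the starting address are hypothetical.

```python
class Stack64:
    """Toy model of the x86-64 stack: grows downward, 8 bytes per slot."""

    def __init__(self, top=0x7FFF_FFFF_F000):
        self.rsp = top        # stack pointer (hypothetical start address)
        self.memory = {}      # sparse memory: address -> 64-bit value

    def push(self, value):
        self.rsp -= 8                   # make room first
        self.memory[self.rsp] = value   # then store at the new top

    def pop(self):
        value = self.memory.pop(self.rsp)  # read the top slot
        self.rsp += 8                      # then release it
        return value


stack = Stack64()
stack.push(0x11)
stack.push(0x22)
print(hex(stack.pop()))  # the value pushed last comes off first
```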

lea

Load effective address; computes the address of the source operand and stores the address itself, not the data it points to, in the destination

add

Add

sub

Subtract

inc

Increment

dec

Decrement

imul

Signed integer multiplication

idiv

Signed integer division

and

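Bitwise AND

or

Bitwise OR

xor

Bitwise XOR

not

Bitwise NOT; inverts every bit

neg

Negate; two's complement negation, that is 0 minus the operand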

shl

Shift left

shr

Logical shift right; vacated high bits are filled with zeros

jmp

Jump; jumps to label

jcondition

Jump on condition; jumps to a label only if the condition named in the mnemonic holds in rflags (for example je, jne, jl, jg)

cmp

Compare; subtracts the second operand from the first, discards the result, and sets rflags accordingly
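
A rough Python model of what cmp does to three of the flags, assuming 64-bit operands. The function name cmp_flags is made up, and the overflow, parity and auxiliary flags are left out to keep the sketch short.

```python
MASK64 = (1 << 64) - 1


def cmp_flags(a, b):
    """Model `cmp a, b`: compute a - b, throw the result away, keep flags."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "ZF": result == 0,         # zero flag: the operands were equal
        "CF": a < b,               # carry flag: an unsigned borrow was needed
        "SF": bool(result >> 63),  # sign flag: top bit of the result
    }


print(cmp_flags(5, 5))
print(cmp_flags(3, 5))
```

A conditional jump placed after the cmp then reads these flags, for example je checks ZF.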

call

Call; calls a procedure

ret

Return; returns from procedure

xchg

Exchange; swap the contents of two operands (two registers, or a register and a memory location)

adc

Add with carry

loop

Loop; decrements rcx and jumps to the given label if rcx is not zero

syscall

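Calls a predefined function provided by the OS. To set up the function and parameters on 64-bit Linux, rax is set to the syscall number and rdi, rsi, rdx, r10, r8 and r9 hold the arguments in that order; the return value comes back in rax.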

db

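Define byte; assembler directive that emits 8-bit data

dw

Define word; assembler directive that emits 16-bit data

dd

Define double word; assembler directive that emits 32-bit data

dq

Define quad word; assembler directive that emits 64-bit data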

Addressing modes

Implied

The operand is implied by the opcode/instruction itself, so none needs to be written

Immediate

The operand is a constant value written in the instruction itself

LDA #$40 ; loads 64 into A (6502 syntax)

Register

The operand is the contents of a specified register

Direct

The operand is the value stored at a specified address

LDA $1966 ; loads the value at address 6502 into A (6502 syntax)

Indirect

The operand is read from the address held in a 2-byte pointer in memory (in little-endian format, that is, the low byte is at the smaller address)

LDA #$40
STA $1966 ; stores 64 at address 6502
LDA #$00
STA $1967 ; stores 0 at address 6503
LDA ($1966) ; loads the value at address $0040 into A
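
The same indirect lookup can be sketched in Python with a bytearray standing in for memory. The data byte 0x2A stored at $0040 is a made-up value for illustration.

```python
memory = bytearray(0x10000)  # 64 KiB address space, all zeros

memory[0x0040] = 0x2A        # hypothetical data byte we want to reach
memory[0x1966] = 0x40        # low byte of the pointer (lower address)
memory[0x1967] = 0x00        # high byte of the pointer

# Read the 2-byte pointer in little-endian order, then follow it
pointer = int.from_bytes(memory[0x1966:0x1968], "little")
value = memory[pointer]

print(hex(pointer), hex(value))
```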

Stack

Registers

rax eax ax al

Accumulator; register that is used for input and output of the Arithmetic Logic Unit (ALU; the CPU component that performs arithmetic and logic operations)

rbx ebx bx bl

Base; register that is used for indexing operations

rcx ecx cx cl

Counter; register that is used as the count for shift and rotate instructions and for counting loops

rdx edx dx dl

Data; register that is used for input and output of ALU and multiplication and division of large values

r8 r8d r8w r8b

r9 r9d r9w r9b

r10 r10d r10w r10b

rip eip ip

Instruction pointer; register that holds the address of the next instruction to be executed

rsp esp sp spl

Stack pointer; register that points to the top of the stack

rbp ebp bp bpl

Base pointer; register that points to the base of the current stack frame

rsi esi si sil

Source index; register that points to the source data in string and memory copy operations

rdi edi di dil

Destination index; register that points to the destination in string and memory copy operations

rflags eflags flags

Register that contains flags of the state of the CPU

Status bits

Carry Flag

Set when ADC carries out of bit 7 or when SBC or CMP needs no borrow; set or cleared manually with SEC or CLC respectively. It also receives the bit shifted out by an ASL, LSR, ROL or ROR.

Zero Flag

Set when the result of an instruction is zero

Interrupt Disable Flag

When set, interrupts other than the NMI are prohibited.

Decimal

On some machines, switches ADC and SBC to binary-coded decimal arithmetic, which makes decimal values easier to work with.

B Flag

Unknown

Overflow Flag

Is set by ADC and SBC when the signed result does not fit in a byte, so the result has the wrong sign for its operands (like adding #$7F and #$01, which gives #$80, or adding #$80 and #$FF, which gives #$7F).
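
The difference between carry and overflow is easy to mix up, so here is a small sketch of an 8-bit add with carry. The function name adc_flags is hypothetical. It compares 0x7F + 0x01 (signed overflow, no carry) with 0xFF + 0x01 (carry, no signed overflow, since -1 + 1 = 0).

```python
def adc_flags(a, b, carry_in=0):
    """Model an 8-bit add with carry; return (result, carry, overflow)."""
    total = a + b + carry_in
    result = total & 0xFF
    carry = total > 0xFF  # unsigned result did not fit in 8 bits
    # Signed overflow: the result's sign bit differs from both operands'
    overflow = bool((a ^ result) & (b ^ result) & 0x80)
    return result, carry, overflow


print(adc_flags(0x7F, 0x01))  # 127 + 1 wraps to -128: overflow only
print(adc_flags(0xFF, 0x01))  # -1 + 1 = 0: carry only
```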

Negative Flag

Copies bit 7 (the most significant bit) of the result, as bit 7 states whether a signed byte is negative.

Function convention

General considerations

Prologue

Act of pushing registers onto stack at the beginning of a function

Epilogue

Act of popping registers off stack at the end of a function

Stack alignment

Before calling a function, we push rbp onto the stack to remember the old stack frame base, and then we ensure that rsp is a multiple of 16 at the call, or in maths, \(rsp \in \{16k \mid k \in \mathbb{N}\}\) (so move rsp down until it reaches a multiple of 16)
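
Moving rsp down to a multiple of 16 amounts to clearing its low four bits, which in assembly is usually written as an and with -16. A minimal sketch of the arithmetic, where the function name and the sample address are made up:

```python
def align_down_16(rsp):
    """Clear the low 4 bits so the address is a multiple of 16."""
    return rsp & ~0xF


rsp = 0x7FFF_FFFF_E4C8  # hypothetical stack pointer value
aligned = align_down_16(rsp)
print(hex(rsp), "->", hex(aligned))
```

Rounding down rather than up matters because the stack grows towards lower addresses, so rounding down never exposes data already on the stack.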

Compiler labels

label

Syscalls

Linux has an API called syscalls that allows kernel services to be requested from x86 code; each syscall is selected by its number

0 read

1 write

2 open

3 close

60 exit
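
To get a feel for what these calls do without writing assembly, Python's os module offers thin wrappers around the same operations. This sketch assumes a POSIX-like system; the syscall numbers in the comments are the x86-64 Linux numbers from the list above.

```python
import os

# A pipe gives us two file descriptors to practise on
read_fd, write_fd = os.pipe()

written = os.write(write_fd, b"hello")  # write (1): returns bytes written
data = os.read(read_fd, written)        # read (0): returns the bytes read

os.close(read_fd)                       # close (3)
os.close(write_fd)

print(written, data)
```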

Libc

The C standard library; a library of C functions that can be called from x86 assembly

Buffer overflows

Von Neumann Architecture

ALU

CU

Memory

CU

Control unit; moves values between memory and the ALU, decodes instructions and directs their execution

ALU

Arithmetic Logic Unit; performs calculations on values fed by the CU, using registers for temporary storage

CPU

Central Processing Unit; the hardware in a computer responsible for all arithmetic, controlling RAM and general processing

CPU architecture

ISA

Instruction Set Architecture

x86 Syntax

Instruction Cycle

Buffer anatomy

PE

Portable Executable; Windows binary file format used by .exe, .dll, .sys files and so forth

ELF

Executable and Linking Format; Unix binary file format. An assembly file is assembled into opcodes and then linked, which resolves symbol names to addresses

Canary

Specific known values written in certain addresses of memory so that if the addresses were overwritten, a buffer overflow would be detected
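
A toy simulation of the idea, assuming a frame laid out as buffer, canary, saved return address from low to high addresses. The canary value, the return address and all function names are made up; real canaries are randomised at program start.

```python
CANARY = bytes.fromhex("00a1b2c3d4e5f60a")  # hypothetical fixed value
BUFFER_SIZE = 16


def new_frame():
    """Layout, low to high address: buffer | canary | saved return address."""
    return bytearray(BUFFER_SIZE) + CANARY + (0x401136).to_bytes(8, "little")


def unchecked_copy(frame, data):
    """Copy data into the frame with no bounds check, like an unsafe strcpy."""
    frame[0:len(data)] = data


def canary_intact(frame):
    """Check the canary before 'returning' from the function."""
    return bytes(frame[BUFFER_SIZE:BUFFER_SIZE + len(CANARY)]) == CANARY


frame = new_frame()
unchecked_copy(frame, b"A" * 24)  # 8 bytes more than the buffer holds
print("canary intact:", canary_intact(frame))
```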

DEP

Data Execution Prevention; system that marks memory regions such as the stack and heap as non-executable, so an attempt to run code from them is detected and the program is terminated

ASLR

Address Space Layout Randomisation; randomisation of the starting location of the stack, heap, base etc. to make aiming the EIP at a desired address more difficult

LIFO

Last In First Out; the most recently pushed object is the object that is returned on a pull, a stack follows this convention

FIFO

First In First Out; the earliest pushed object is the object that is returned on a pull, a pipeline (queue) follows this convention
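
The two orders side by side, using a Python list as a stack and collections.deque as a queue. The variable names are just for illustration.

```python
from collections import deque

items = [1, 2, 3]

# LIFO: push onto the end, pull from the same end
stack = []
for item in items:
    stack.append(item)
lifo_order = [stack.pop() for _ in items]

# FIFO: push onto the end, pull from the front
queue = deque()
for item in items:
    queue.append(item)
fifo_order = [queue.popleft() for _ in items]

print("LIFO:", lifo_order)
print("FIFO:", fifo_order)
```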

Stack frame

Partition of the stack belonging to a single function call, holding its local variables, saved registers and return address

NOP slide

Padding the overflow with NOPs so that the EIP has a larger target when you aim it at your exploit in memory. If the EIP lands a bit before the exploit, execution slides through the NOPs into it instead of running misinterpreted instructions that could crash the program

Controlling EIP

To control the EIP, we overflow the buffer with a patterned string of characters, observe which overflowed values end up in the EIP, and find the index at which those characters start in the pattern. That index is the number of characters needed to reach the EIP
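
A minimal sketch of the pattern trick in Python. The helpers make_pattern and find_offset are made up for this example (existing tools do the same job); the pattern style Aa0Aa1Aa2... is one common choice, and the sketch assumes a 32-bit little-endian target.

```python
import string


def make_pattern(length):
    """Build a non-repeating pattern: Aa0Aa1Aa2...Aa9Ab0Ab1..."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
    raise ValueError("pattern too long for this simple generator")


def find_offset(pattern, eip_value):
    """Return the index in the pattern of the 4 bytes that landed in EIP."""
    # EIP was loaded little-endian, so convert it back to memory byte order
    needle = eip_value.to_bytes(4, "little").decode("ascii")
    return pattern.find(needle)


pattern = make_pattern(200)
# Pretend the debugger showed these 4 pattern bytes in EIP after the crash
crashed_eip = int.from_bytes(pattern[112:116].encode("ascii"), "little")
print(find_offset(pattern, crashed_eip))
```

The offset found is the amount of padding to send before the 4 bytes that should end up in the EIP.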

Bad characters

Characters that the target program mangles or treats as terminators (for example the null byte 0x00 or a newline), so they must be kept out of the payload

Intro to Assembly

ISA

Instruction Set Architecture

Cache

Small, fast memory responsible for holding frequently used data closer to the CPU for more speed

L1

Type of cache that is the fastest but the smallest in terms of storage

L2

Type of cache that sits between L1 and L3 in both speed and storage

L3

Type of cache that is the slowest of the three but the largest in terms of storage

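Endianness

The order in which the bytes of a multi-byte value are stored in memory; little-endian puts the least significant byte at the lowest address, big-endian puts the most significant byte there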
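
The byte order is easy to see by packing the same 32-bit value both ways with Python's struct module. The value 0x12345678 is just a sample.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # least significant byte first (x86 order)
big = struct.pack(">I", value)     # most significant byte first

print("little-endian:", little.hex(" "))
print("big-endian:   ", big.hex(" "))
```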

Bit significance

The least significant bit is the bit that determines \(2^{0}\) (the right-most bit), while the most significant bit is the bit that determines \(2^{n-1}\), where \(n\) is the width of the value in bits (the left-most bit)
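
Both bits can be pulled out with a shift and a mask. In this sketch the function names lsb and msb are made up, and for a value that is width bits wide the most significant bit carries the weight 2^(width-1).

```python
def lsb(value):
    """Right-most bit, weight 2**0."""
    return value & 1


def msb(value, width):
    """Left-most bit of a width-bit value, weight 2**(width - 1)."""
    return (value >> (width - 1)) & 1


byte = 0b1000_0001
print(lsb(byte), msb(byte, 8))
```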

Real mode

Operating mode for computers that have access to 1MB of RAM through 20-bit addresses
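
The notes above do not say how a 20-bit address is formed from 16-bit registers; in real mode it comes from a segment:offset pair, with the segment shifted left by 4 bits and added to the offset. A small sketch of that arithmetic follows. The function name is made up, and the wrap at 1MB models the original 8086 behaviour, which is an assumption about the target machine.

```python
def real_mode_address(segment, offset):
    """Physical address = segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0xF000, 0xFFF0)))
print(hex(real_mode_address(0x07C0, 0x0000)))
```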

Compiling

Act of translating a high-level language into assembly or a machine code binary file

Interpreting

Act of executing a program directly from its source, statement by statement, without first translating the whole file into machine code