I want to write code that reads assembly instructions (x86 only) and recreates them elsewhere in memory, in order to build hook code. For example, to hook function X I need to patch (at least) its first bytes with a jump. Every instruction that I overwrite, partially or completely (how many depends on the code), has to be recreated in a memory block of my own, followed by an instruction that jumps back into the original function X at the offset of the first instruction I didn't touch. You probably know what I mean, since this isn't new to many of you. I don't want to write a complete, perfect program, but I do want a fully extensible code base built around a tree, as I explain below. To begin, let's imagine some instructions:
- A – “0x12 0x13 . . ” this instruction has 4 bytes and the first two are static.
- B – “0x12 . ” this instruction has 2 bytes and the first one is static.
For this case I would have a tree that would look like
Tree
  |
  |
 0x12
 /  \
B   0x13
      |
      A
So when the code parses an instruction, it would try to reach the handler with the longest matching prefix; if that fails, it could either stop and report failure or fall back to a node higher up in the tree.
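To make the lookup concrete, here is a minimal sketch of such a tree as a byte trie with longest-prefix matching. The names (`TrieNode`, `add_pattern`, `find_longest`, `Handler`) and the handler signature are assumptions made up for illustration, and A and B are the imaginary instructions from above, not real x86 opcodes.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical handler type: returns the instruction length, or 0 on failure.
using Handler = int (*)(const std::uint8_t* code);

struct TrieNode {
    Handler handler = nullptr;                               // set if a pattern ends here
    std::map<std::uint8_t, std::unique_ptr<TrieNode>> next;  // children keyed by byte value
};

// Register a handler for a fixed byte prefix (e.g. {0x12, 0x13} for A).
void add_pattern(TrieNode& root, const std::uint8_t* prefix, std::size_t n, Handler h) {
    TrieNode* node = &root;
    for (std::size_t i = 0; i < n; ++i) {
        auto& child = node->next[prefix[i]];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->handler = h;
}

// Walk as deep as the bytes allow and remember the deepest handler seen.
Handler find_longest(const TrieNode& root, const std::uint8_t* code, std::size_t avail) {
    const TrieNode* node = &root;
    Handler best = nullptr;
    for (std::size_t i = 0; i < avail; ++i) {
        auto it = node->next.find(code[i]);
        if (it == node->next.end()) break;
        node = it->second.get();
        if (node->handler) best = node->handler;
    }
    return best;
}

// Toy handlers for the two imaginary instructions.
int handle_A(const std::uint8_t*) { return 4; }
int handle_B(const std::uint8_t*) { return 2; }
```

This version only returns the deepest match. Supporting the "try one above in the tree" fallback would mean collecting every handler seen on the way down and trying them from deepest to shallowest.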
The reason for wanting something like this is that I can extend it later with instruction handlers provided by DLLs. That is a must for what I'm doing, because I want to ship code sooner that handles about 90% of instructions and only deal with the more advanced ones if I need them in the future.
So, now my question is: what exactly is the full set of information that a DLL handling an instruction would need?
Like:
- the address where the instruction starts. (a must of course)
- ? the base address of the module that contains the instruction (I suppose this one is needed in case the instruction references some part of its module's memory)
- ? the previous instruction. I don't know whether there are instructions that need to know what the instruction before them did, or something like that
I also want to ask if the tree structure is ok or if there is some problem I will have.
So, basically, I'm asking for help deciding what information I need in order to write the most generic code possible that:
given an address, parses the assembly instructions there and, depending on each instruction, calls function pointers in DLLs that copy those instructions.
So, having something like
void* copy_instructions(void* address, int& len)
{
    int bytes_copied = 0;
    void* instructions = /* block of bytes; the implementation doesn't matter here */;
    do
    {
        // uses the tree structure and retrieves a function from a dll
        void* (*copy_instruction)(void*, int*, ...) = get_a_handler_to_instruction_at(address);
        if (copy_instruction == NULL)
            fail;
        int instruction_len = 0;
        // I want to know what this function needs in order to be complete for every case
        void* instruction = copy_instruction(address, &instruction_len, ...);
        if (!instruction)
            fail;
        instructions += instruction;      // the implementation doesn't matter here
        address += instruction_len;
        bytes_copied += instruction_len;
    }
    while (bytes_copied < 5);
    // address now points at the first instruction that was not touched
    add_instructions_jump_to(instructions, address);
    len = bytes_copied;
    return instructions;
}
My questions would be:
What would a complete “copy_instruction” function header look like?
Is the tree mentioned above OK for implementing “get_a_handler_to_instruction_at”, or do I need something else?
In order to hook a function you’ll need to:
- Parse the instructions at the start of the function that the jump will overwrite. For each one, determine whether it uses a relative address (e.g. Jcc, JMP near, CALL near, JMP/CALL qword ptr [RIP+something], MOV EAX, dword ptr [RIP+something]) and, if this is so, the target address.
- Copy those instructions to the new location, adjusting the relative addresses. Some instructions have short relative addresses (8-bit (e.g. Jcc) or even 16-bit), which are too short for simple direct patching. In this case you will need to reassemble such instructions with longer relative addresses (this will require either inserting/changing an instruction prefix or changing the ModR/M/SIB bytes). Keep in mind that relative addresses are relative to the instruction's end (or, IOW, the beginning of the next instruction), which means that if the adjusted instruction is longer than the original, the relative address has to account for the difference in instruction length as well. Ideally, you should also be prepared to handle the case where the original instructions that you overwrite jump to one another. You don't want their copies to jump back into the overwritten code.
- Append to the copied instructions a JMP instruction that jumps to the first untouched (by overwriting) instruction in the original function.
After this, in most situations hooking should just work. Problems will arise if there is any other code generated by the compiler that expects the original instructions to be in their original place and unchanged.
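As an illustration of the address adjustment, here is a minimal sketch for 32-bit code that widens a short Jcc (opcodes 0x70..0x7F with a rel8) into its near form (0x0F 0x80..0x8F with a rel32) at a new address, and that emits the trailing 5-byte JMP rel32 (opcode 0xE9). The function names and the use of plain 32-bit integers for addresses are assumptions for the example. It covers only this one instruction family and does not handle copied instructions that jump to one another.

```cpp
#include <cstddef>
#include <cstdint>

// Store a 32-bit value in little-endian byte order, as x86 expects.
void put_u32_le(std::uint8_t* p, std::uint32_t v) {
    p[0] = static_cast<std::uint8_t>(v);
    p[1] = static_cast<std::uint8_t>(v >> 8);
    p[2] = static_cast<std::uint8_t>(v >> 16);
    p[3] = static_cast<std::uint8_t>(v >> 24);
}

// Re-encode the short Jcc located at old_addr as a near Jcc that will live
// at new_addr. Writes 6 bytes to out and returns 6, or returns 0 if src is
// not a short Jcc.
std::size_t relocate_short_jcc(const std::uint8_t* src, std::uint32_t old_addr,
                               std::uint8_t* out, std::uint32_t new_addr) {
    if ((src[0] & 0xF0) != 0x70) return 0;
    // Sign-extend rel8; it is relative to the end of the 2-byte original.
    std::int32_t rel8 = static_cast<std::int32_t>(src[1] ^ 0x80) - 0x80;
    std::uint32_t target = old_addr + 2 + static_cast<std::uint32_t>(rel8);
    // rel32 is relative to the end of the 6-byte replacement.
    std::uint32_t rel32 = target - (new_addr + 6);
    out[0] = 0x0F;
    out[1] = static_cast<std::uint8_t>(0x80 | (src[0] & 0x0F));  // same condition code
    put_u32_le(out + 2, rel32);
    return 6;
}

// Emit a JMP rel32 that will live at address `at` and jump to `target`.
std::size_t emit_jmp_rel32(std::uint8_t* out, std::uint32_t at, std::uint32_t target) {
    out[0] = 0xE9;
    put_u32_le(out + 1, target - (at + 5));
    return 5;
}
```

The unsigned arithmetic wraps modulo 2^32, which is what a 32-bit displacement needs. In 64-bit code the target may be out of rel32 range, and that case would need a different sequence.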
As for the data structure, you replace N bytes of the original code. N is 5 for a 32-bit jump. Those N bytes will correspond to at most N original instructions. You'll need to save those 1 to N instructions in their entirety (every instruction is at most 15 bytes long, IIRC), then parse, possibly adjust and store in the new place. You don't really need a tree here, an array would suffice. An element per instruction. Simple. But it's quite some code that needs to be carefully written and debugged/tested.
Please see the related questions. There may be valuable details.
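A minimal sketch of that array, with one element per saved instruction, could look like this. The type and field names (`SavedInsn`, `SavedPrologue`, `push`, `complete`) are made up for the example, and the instruction lengths are assumed to come from whatever parser you write.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kMaxInsnLen = 15;  // longest possible x86 instruction
constexpr std::size_t kPatchLen = 5;     // size of the JMP rel32 being written

// One original instruction, saved in its entirety.
struct SavedInsn {
    std::uint8_t bytes[kMaxInsnLen] = {};
    std::uint8_t length = 0;
};

// The instructions covered by the patch: at most kPatchLen of them,
// because every instruction is at least one byte long.
struct SavedPrologue {
    std::array<SavedInsn, kPatchLen> insns;
    std::size_t count = 0;
    std::size_t total_bytes = 0;

    // True once the saved instructions cover the whole patch area.
    bool complete() const { return total_bytes >= kPatchLen; }

    // Save one instruction of `len` bytes; rejects invalid lengths and
    // any instruction offered after the patch area is already covered.
    bool push(const std::uint8_t* code, std::size_t len) {
        if (complete() || len == 0 || len > kMaxInsnLen) return false;
        SavedInsn& slot = insns[count++];
        std::memcpy(slot.bytes, code, len);
        slot.length = static_cast<std::uint8_t>(len);
        total_bytes += len;
        return true;
    }
};
```

Note that `total_bytes` can end up larger than 5, because the last saved instruction may extend past the patch area. The jump back has to target the original address plus `total_bytes`, not plus 5.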
EDIT: Answering the main question:
I think the main function to “copy” all instructions (copy_instructions()) may indeed be defined as you've defined it. You may want to return an error code from it, though, in case it fails (to allocate memory, to disassemble an unknown instruction, or something else). That may be helpful. I can't see what else you'd need from/for the caller.
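For what it's worth, here is one possible shape for a per-instruction handler that reports an error code. This is only a sketch under my own assumptions: `CopyStatus`, `CopyRequest`, `CopyResult` and `CopyInstructionFn` are hypothetical names, and the handler is given both the source and the destination because relative addresses depend on both.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical error codes a handler (or copy_instructions itself) could return.
enum class CopyStatus { Ok, UnknownInstruction, OutOfMemory, BufferTooSmall };

// Inputs for copying one instruction.
struct CopyRequest {
    const std::uint8_t* src;   // start of the original instruction
    std::uint8_t* dst;         // where the copy should be written
    std::size_t dst_capacity;  // bytes available at dst
};

// Outputs: the two lengths can differ when an instruction had to be widened.
struct CopyResult {
    std::size_t src_len = 0;  // bytes consumed from the original code
    std::size_t dst_len = 0;  // bytes written to dst
};

using CopyInstructionFn = CopyStatus (*)(const CopyRequest&, CopyResult&);

// Toy handler: a single-byte NOP (0x90) has no address operands,
// so it can be copied verbatim.
CopyStatus copy_nop(const CopyRequest& req, CopyResult& res) {
    if (req.src[0] != 0x90) return CopyStatus::UnknownInstruction;
    if (req.dst_capacity < 1) return CopyStatus::BufferTooSmall;
    req.dst[0] = 0x90;
    res.src_len = 1;
    res.dst_len = 1;
    return CopyStatus::Ok;
}
```

A DLL would export functions of type `CopyInstructionFn`, and copy_instructions() would stop and return the first status that is not `Ok`.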