I want to make code that reads assembly instructions (x86 only) and recreates them

Asked: May 29, 2026

I want to write code that reads x86 assembly instructions and recreates them in another place in memory in order to build a hook. For example, to hook function X, I need to patch (at least) its first bytes with a jump. Every instruction that I overwrite, fully or partially (which instructions those are depends on the assembly code), I need to recreate in a memory block of my own, and then add an instruction that jumps back into function X at the offset of the first instruction I didn’t touch. You probably know what I mean, since this isn’t new to many people. I don’t want to write a complete, perfect program, but I do want a fully extensible code base built around a tree, as I explain below. To begin, let’s imagine some instructions:

A – “0x12 0x13 . . ” this instruction has 4 bytes and the first two are static.
B – “0x12 . ” this instruction has 2 bytes and the first one is static.

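For this case I would have a tree in which the root has a child node 0x12 that holds B’s handler, and that node in turn has a child node 0x13 that holds A’s handler.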

So when the code parses an instruction, it would try to reach the node with the longest matching prefix; if that fails, it could either stop and report failure or try the node one level up in the tree.
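A minimal sketch of the lookup described above, assuming a handler can be represented by a plain value (here a `std::string` name standing in for a function pointer loaded from a DLL; `OpcodeTrie`, `TrieNode` and `Handler` are hypothetical names). Each node corresponds to one static byte, and `find` remembers the deepest node that has a handler, which yields the longest-prefix match and falls back to a shorter prefix automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: in the real design this would be a function pointer from a DLL.
using Handler = std::string;

struct TrieNode {
    std::map<std::uint8_t, std::unique_ptr<TrieNode>> children;  // next static byte -> subtree
    bool has_handler = false;
    Handler handler;
};

class OpcodeTrie {
public:
    // Registers `handler` for instructions whose static bytes are `prefix`.
    void add(const std::vector<std::uint8_t>& prefix, Handler handler) {
        assert(!prefix.empty() && "a handler on the root would never be returned by find()");
        TrieNode* node = &root_;
        for (std::uint8_t byte : prefix) {
            std::unique_ptr<TrieNode>& child = node->children[byte];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        node->has_handler = true;
        node->handler = std::move(handler);
    }

    // Follows the code bytes as deep as the tree allows and returns the handler of the
    // deepest node that has one (the longest matching prefix), or nullptr if none matches.
    const Handler* find(const std::uint8_t* code, std::size_t available) const {
        const TrieNode* node = &root_;
        const Handler* best = nullptr;
        for (std::size_t i = 0; i < available; ++i) {
            auto it = node->children.find(code[i]);
            if (it == node->children.end()) break;
            node = it->second.get();
            if (node->has_handler) best = &node->handler;
        }
        return best;
    }

private:
    TrieNode root_;
};
```

With A registered under 0x12 0x13 and B under 0x12, the bytes 0x12 0x13 0xAA 0xBB resolve to A, while 0x12 0x99 falls back to B. One caveat for real x86 code: some opcodes are distinguished only by the reg field of the ModR/M byte (0xFF /2 is an indirect CALL, 0xFF /4 an indirect JMP), and legacy or REX prefixes can precede the opcode, so a purely byte-keyed tree would need extra handling for those cases.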

The reason I want something like this is that I can later extend it with instruction handlers provided by DLLs. That is a must for what I’m doing, because I want to ship code that handles about 90% of instructions sooner, and only take care of the more advanced ones in the future if I need to.

So, my question is: what exactly is the full information that a DLL handling an instruction would need?
For example:

the address where the instruction starts (a must, of course);
? the base address of the module that contains the instruction (I suppose this one is needed in case the instruction references some part of its module’s memory);
? the previous instruction. I don’t know whether there are instructions that need to know what the instruction before them did, or something like that.
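One possible shape for what a handler receives, as a hedged sketch (the names `CopyContext`, `CopyInstructionFn` and `copy_one_byte` are hypothetical). It bundles the original instruction, the address at which the copy will run, an output buffer and the CPU mode. The mode is included because x86 decoding depends on it: bytes 0x40-0x4F are INC/DEC in 32-bit code but REX prefixes in 64-bit code. In this sketch the module base and the previous instruction are left out, on the assumption that relative operands only need the old and new instruction addresses and that a single instruction can be decoded on its own once the mode is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical context handed to every instruction handler.
struct CopyContext {
    const std::uint8_t* src;     // original instruction (its address is also its runtime address)
    std::uintptr_t dst_address;  // address at which the copy will execute
    std::uint8_t* out;           // where the handler writes the recreated bytes
    std::size_t out_capacity;    // bytes available at `out`
    bool is_64bit;               // decoding differs between 32- and 64-bit mode
    std::size_t src_length = 0;  // out: length of the original instruction
    std::size_t out_length = 0;  // out: bytes written (may be larger than src_length)
};

// A handler returns false if it cannot copy the instruction.
using CopyInstructionFn = bool (*)(CopyContext& ctx);

// Example handler for position-independent one-byte instructions such as
// 0x55 (push ebp / push rbp): copying the byte is all that is required.
bool copy_one_byte(CopyContext& ctx) {
    assert(ctx.src != nullptr && ctx.out != nullptr);
    if (ctx.out_capacity < 1) return false;
    ctx.out[0] = ctx.src[0];
    ctx.src_length = 1;
    ctx.out_length = 1;
    return true;
}
```

Handlers that deal with relative branches would use `src` and `dst_address` to recompute displacements, and would report a larger `out_length` when they have to widen an instruction.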

I also want to ask whether the tree structure is OK or whether I will run into some problem with it.

So, basically, I’m asking for help deciding what information I need to write the most generic code possible that:

given an address, parses the assembly instructions there and, depending on each instruction, calls function pointers in DLLs that copy those instructions.

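So, with something like this:

```cpp
#include <cstdint>

// Handler returned by the tree (possibly from a DLL). It recreates the instruction at
// `address` in `out`, stores the original instruction's length in *len and returns the
// number of bytes it wrote (0 on failure). What else it needs is the open question.
typedef int (*copy_instruction_fn)(const std::uint8_t* address, int* len,
                                   std::uint8_t* out /*, ... */);

copy_instruction_fn get_a_handler_to_instruction_at(const std::uint8_t* address); // uses the tree
std::uint8_t* allocate_block();   // hypothetical; don't care about the implementation
int add_instructions_jump_to(std::uint8_t* out, const std::uint8_t* target); // hypothetical

void* copy_instructions(void* address, int& len)
{
    const std::uint8_t* src = static_cast<const std::uint8_t*>(address);
    std::uint8_t* instructions = allocate_block();
    std::uint8_t* out = instructions;
    int bytes_copied = 0;

    do
    {
        copy_instruction_fn copy_instruction = get_a_handler_to_instruction_at(src);
        if (copy_instruction == nullptr)
            return nullptr;                 // no handler for this instruction: fail

        int instr_len = 0;                  // separate name, so the `len` parameter isn't shadowed
        int written = copy_instruction(src, &instr_len, out);
        if (written == 0)
            return nullptr;                 // handler failed (freeing the block is omitted)

        out += written;                     // the copy may be longer than the original
        src += instr_len;
        bytes_copied += instr_len;
    }
    while (bytes_copied < 5);               // 5 bytes = size of a JMP rel32

    // src has already been advanced, so it equals address + bytes_copied
    // (adding bytes_copied again would overshoot).
    add_instructions_jump_to(out, src);

    len = bytes_copied;
    return instructions;
}
```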

My questions are:

What would a complete “copy_instruction” function header look like?
Is the tree mentioned above OK for implementing “get_a_handler_to_instruction_at”, or do I need something else?

1 Answer

Editorial Team · Answer 1 · 2026-05-29T17:19:16+00:00

In order to hook a function you’ll need to:

1. Know its interface (calling convention, number and types of parameters, and so on). When optimizing, the compiler may inline the function or alter its interface. If this is the case, I don’t know how best to handle it. You might need to tweak the code so that the function is called through a volatile pointer to a function, to convince the compiler that the pointer may change at any time and point to any other function with the same parameters, so that changing the function’s prologue and epilogue would be unwise. Disabling optimization may be another option. All this is needed to avoid a situation where the original and new functions aren’t compatible in how they receive parameters and return. However, if this is one of the exported functions, the compiler obviously won’t change anything, as that would break code.
2. Know its address.
3. Minimally disassemble the first instructions of the function, which you are going to overwrite with the jump to your new code. When disassembling, you must find out: the instruction length (for this you’ll need to correctly parse all instruction prefixes, all opcode bytes, the ModR/M and SIB bytes, and all displacement and immediate operand bytes; some logic plus look-up tables will help); whether the instruction transfers control to, or accesses data at, a location relative to the instruction pointer (e.g. Jcc, JMP near, CALL near, JMP/CALL qword ptr [RIP+something], MOV EAX, dword ptr [RIP+something]); and, if so, the target address.
4. Know the address of the copies of the original instructions. Ideally, you’d allocate memory for the copies after parsing the instructions, but you can (and probably should) preallocate more to simplify your life.
5. Copy the original instructions to the new place and, if necessary, adjust the relative addresses in them by the difference between the old and new locations of these instructions. Note that the original instructions may use very short relative addresses (e.g. 8-bit, the most common case for Jcc, or even 16-bit), which may be too short to reach the target from the new location, so simple direct patching won’t work. In this case you will need to reassemble such instructions with longer relative addresses (this requires changing the opcode, e.g. a short Jcc 0x70-0x7F becomes a near Jcc 0x0F 0x80-0x8F, inserting or changing an instruction prefix, or changing the ModR/M/SIB bytes). Keep in mind that relative addresses are relative to the instruction’s end (in other words, the beginning of the next instruction), which means that if the adjusted instruction is longer than the original, the relative address must account for the length difference as well. Ideally, you should also be prepared to handle the case where the original instructions you overwrite jump to one another; you don’t want their copies to jump back into the overwritten code.
6. Add a JMP instruction that jumps to the first instruction in the original function that was not overwritten.
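A sketch of the arithmetic behind the last two steps above (widening a short relative branch and adding the jump back), assuming 64-bit addresses held in `std::uint64_t` and covering only two encodings: a short Jcc (0x70-0x7F, rel8) rewritten as a near Jcc (0x0F 0x80-0x8F, rel32), and the 5-byte JMP rel32 (0xE9). The function names are hypothetical. Both compute the displacement from the end of the newly emitted instruction, which is why the near Jcc uses `new_ip + 6` and the JMP uses `at + 5`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Displacement from the end of an instruction (next_ip) to target,
// or nullopt if it does not fit in a signed 32-bit field.
std::optional<std::int32_t> rel32(std::uint64_t next_ip, std::uint64_t target) {
    const auto diff = static_cast<std::int64_t>(target - next_ip);
    if (diff < std::numeric_limits<std::int32_t>::min() ||
        diff > std::numeric_limits<std::int32_t>::max())
        return std::nullopt;
    return static_cast<std::int32_t>(diff);
}

// Appends a 32-bit value in little-endian byte order.
void put_le32(std::vector<std::uint8_t>& out, std::int32_t value) {
    const auto u = static_cast<std::uint32_t>(value);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(u >> (8 * i)));
}

// Re-encodes a short Jcc (2 bytes: 0x70+cc, rel8) located at old_ip as a near
// Jcc (6 bytes: 0x0F, 0x80+cc, rel32) that will live at new_ip.
bool relocate_short_jcc(const std::uint8_t* insn, std::uint64_t old_ip,
                        std::uint64_t new_ip, std::vector<std::uint8_t>& out) {
    assert(insn[0] >= 0x70 && insn[0] <= 0x7F);
    const std::uint8_t cc = insn[0] & 0x0F;
    const auto rel8 = static_cast<std::int8_t>(insn[1]);
    const std::uint64_t target = old_ip + 2 + static_cast<std::int64_t>(rel8); // from the original's end
    const auto rel = rel32(new_ip + 6, target);                                // from the new end
    if (!rel) return false;
    out.push_back(0x0F);
    out.push_back(static_cast<std::uint8_t>(0x80 | cc));
    put_le32(out, *rel);
    return true;
}

// Emits a 5-byte JMP rel32 (0xE9), placed at `at`, that jumps to `target`.
bool emit_jmp_rel32(std::uint64_t at, std::uint64_t target, std::vector<std::uint8_t>& out) {
    const auto rel = rel32(at + 5, target);
    if (!rel) return false;
    out.push_back(0xE9);
    put_le32(out, *rel);
    return true;
}
```

For example, `je +0x10` (bytes 74 10) at 0x1000, copied to 0x2000, becomes 0F 84 0C F0 FF FF, which still targets 0x1012. A `false` return means the copy is too far from the target for a rel32, so the block would have to be allocated closer to the original code.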

After this, hooking should just work in most situations. Problems will arise if any other compiler-generated code expects the original instructions to be unchanged and at their original place.

As for the data structure: you replace N bytes of the original code, where N is 5 for a 32-bit relative jump. Those N bytes correspond to at most N original instructions. You’ll need to save those 1 to N instructions in their entirety (an x86 instruction is at most 15 bytes long), then parse them, possibly adjust them, and store them in the new place. You don’t really need a tree here; an array with one element per instruction would suffice. Simple. But it’s quite a bit of code that needs to be carefully written, debugged and tested.
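A hedged sketch of that array, paired with a deliberately tiny length decoder. It assumes 64-bit mode, no legacy prefixes (0x66, 0x67, 0xF2, 0xF3, segment overrides) and only a handful of opcodes that are common in function prologues; anything else is reported as unknown. `SavedInstruction`, `modrm_length`, `decode_length` and `save_instructions` are hypothetical names. The ModR/M handling covers the parts listed in the answer: an optional SIB byte when rm is 100b, and a displacement whose size depends on mod (mod 00 with rm 101b means [RIP+disp32] in 64-bit mode).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One element per overwritten instruction; x86 instructions are at most 15 bytes.
struct SavedInstruction {
    std::array<std::uint8_t, 15> bytes{};
    std::size_t length = 0;
    bool ip_relative = false;  // relative branch or [RIP+disp32] operand
};

// Size of ModR/M + optional SIB + displacement (32/64-bit addressing, no 0x67 prefix).
std::size_t modrm_length(const std::uint8_t* p, bool& rip_relative) {
    const std::uint8_t mod = p[0] >> 6, rm = p[0] & 7;
    std::size_t len = 1;
    if (mod != 3 && rm == 4) {                       // SIB byte follows
        ++len;
        if (mod == 0 && (p[1] & 7) == 5) len += 4;   // SIB without base: disp32
    }
    if (mod == 0 && rm == 5) { len += 4; rip_relative = true; }  // [RIP+disp32]
    if (mod == 1) len += 1;                          // disp8
    if (mod == 2) len += 4;                          // disp32
    return len;
}

// Length of one instruction from a tiny subset (64-bit mode); 0 means unknown.
// Reads memory without bounds checks, as a hook would on live code.
std::size_t decode_length(const std::uint8_t* p, bool& relative) {
    relative = false;
    std::size_t i = 0;
    if ((p[i] & 0xF0) == 0x40) ++i;                  // REX prefix
    const std::uint8_t op = p[i++];
    if (op >= 0x50 && op <= 0x5F) return i;          // push/pop reg
    if (op == 0x89 || op == 0x8B) return i + modrm_length(p + i, relative);      // mov
    if (op == 0x83) return i + modrm_length(p + i, relative) + 1;                // group 1, imm8
    if (op == 0x81) return i + modrm_length(p + i, relative) + 4;                // group 1, imm32
    if (op == 0xEB || (op >= 0x70 && op <= 0x7F)) { relative = true; return i + 1; }  // rel8
    if (op == 0xE8 || op == 0xE9) { relative = true; return i + 4; }              // call/jmp rel32
    return 0;
}

// Saves whole instructions until at least `needed` bytes (5 for a JMP rel32) are covered.
bool save_instructions(const std::uint8_t* code, std::size_t needed,
                       std::vector<SavedInstruction>& out) {
    assert(code != nullptr);
    std::size_t covered = 0;
    while (covered < needed) {
        SavedInstruction s;
        s.length = decode_length(code + covered, s.ip_relative);
        if (s.length == 0) return false;             // unknown instruction: fail
        for (std::size_t k = 0; k < s.length; ++k) s.bytes[k] = code[covered + k];
        out.push_back(s);
        covered += s.length;
    }
    return true;
}
```

For a prologue starting with `push rbp` (55) followed by `mov rax, [rip+disp32]` (48 8B 05 ...), two elements are saved (1 and 7 bytes) and the second is flagged as RIP-relative, so it would need its displacement adjusted before it is placed in the new block.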

Please see the related questions. There may be valuable details.

EDIT: Answering the main question:

I think the main function that “copies” all instructions (copy_instructions()) may indeed be defined as you’ve defined it. You may want to return an error code from it, though, in case it fails (to allocate memory, to disassemble an unknown instruction, or for some other reason). It may be helpful. I can’t see what else you’d need from or for the caller.
