The Archive Base

Editorial Team
Asked: May 17, 2026 (2026-05-17T19:09:58+00:00)

How does an x86 disassembler know where to break up the instructions?

I am looking at the 8088 instruction set. For example, the move instruction has 7 variations that range from 2 to 4 bytes. The encodings themselves seem to follow no particular order, which is one more reason behind the question "Why is x86 ugly?".

For example:

                        76543210  76543210  76543210  76543210
reg/mem to/from reg     100010dw  ||regr/m
imm to reg/mem          1100011w  ||000r/m  data      data (w=1)
imm to reg              1011wreg  data      data (w=1)
mem to accum            1010000w  addr-low  addrhigh
accum to mem            1010001w  addr-low  addrhigh
reg/mem to seg          10001110  ||0ssr/m
seg to reg/mem          10001100  ||0ssr/m

Legend:
||=mod {NO-DISP=0,DISP-LOW,DISP-LOW+DISP-HIGH,REG}
ss=seg enum{es=0,cs,ss,ds}
reg=enum{ax=0,cx,dx,bx,sp,bp,si,di} (if w=1), enum{al=0,cl,dl,bl,ah,ch,dh,bh} (if w=0)
r/m=reg or mem (mod=3 then REG, else mem)

Many instructions overlap in the first byte:

                        76543210  76543210  76543210  76543210
push                    11111111  ||110r/m
inc                     1111111w  ||000r/m

The bit patterns appear to be assigned arbitrarily. How does a disassembler break apart the instructions?

This question is a subset of "How to write a disassembler".

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 7:09 pm (2026-05-17T19:09:58+00:00)

    Looking at my 8086/8088 User's Manual programmer's reference (ISBN 1-55512-010-5), likely decades out of print: Appendix A shows the instruction decoding in opcode order, 0b00000000 through 0b11111111. It does not appear to be chaotic at all. Add, sub, and, xor, cmp, etc. are all grouped in such a way that a mux can use the opcode bits directly to route the inputs and outputs, while other bits select the operation the ALU performs on those inputs.

    For writing a disassembler you want to use this kind of table or an opcode chart for the top level sorting of instructions.
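As a rough illustration of that top-level sorting, here is a minimal Python sketch of a mask/match table covering only the seven MOV first-byte patterns from the question. The names `MOV_FORMS` and `classify_first_byte` are made up for this example, and the bit patterns are the ones I understand the 8086 documentation to give (1010000w is the memory-to-accumulator form, and 10001110 is the reg/mem-to-segment form).

```python
# (mask, match, description) rows for the MOV encodings listed in the question.
# A first byte b selects a row when (b & mask) == match.
# MOV_FORMS and classify_first_byte are illustrative names, not from any library.
MOV_FORMS = [
    (0xFC, 0x88, "mov reg/mem to/from reg"),  # 100010dw
    (0xFE, 0xC6, "mov imm to reg/mem"),       # 1100011w
    (0xF0, 0xB0, "mov imm to reg"),           # 1011wreg
    (0xFE, 0xA0, "mov mem to accum"),         # 1010000w
    (0xFE, 0xA2, "mov accum to mem"),         # 1010001w
    (0xFF, 0x8E, "mov reg/mem to seg"),       # 10001110
    (0xFF, 0x8C, "mov seg to reg/mem"),       # 10001100
]


def classify_first_byte(b):
    """Return the MOV form selected by first byte b, or None if it is not a MOV."""
    for mask, match, name in MOV_FORMS:
        if (b & mask) == match:
            return name
    return None
```

A full disassembler would more likely index a 256-entry table directly by the first byte; the mask/match form is used here only because it maps one-to-one onto the bit patterns in the question.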

    In your particular example, notice that whenever the first opcode byte is 0xFF, the three bits in the middle of the second byte (bits 5 to 3, the same position the reg field occupies in other instructions) tell you the rest of the story as to which instruction is which. All 8 of those combinations (one is undefined) are represented and easily decoded from those 3 bits.
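A minimal Python sketch of that second-level decode, assuming the 8086 assignments for the 0xFF group as I understand them (inc, dec, near/far indirect call, near/far indirect jmp, push, and one unused slot). `GROUP_FF`, `split_modrm`, and `decode_ff` are names invented for this example.

```python
# Operations selected by bits 5..3 of the byte that follows opcode 0xFF.
# None marks the one combination that is undefined on the 8086.
GROUP_FF = ["inc", "dec", "call", "call far", "jmp", "jmp far", "push", None]


def split_modrm(byte):
    """Split a mod-reg-r/m byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def decode_ff(second_byte):
    """Return the mnemonic for opcode 0xFF given its second byte, or None."""
    _, reg, _ = split_modrm(second_byte)
    return GROUP_FF[reg]
```

So the push and inc rows from the question stop looking like a collision: both start with 0xFF, and the middle field (110 versus 000) picks between them.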

    Yes, the x86 instruction set is crazy. Interesting and fun features, but considerably better instruction sets have been invented since. The only reason x86 has not gone the way of the 6502 for example is momentum, not quality.

    You should look at this one too:

    How are hex sequence translated to assembly without ambiguity?

    The way to disassemble this and any other variable word length instruction set is to do it in execution order. You will fail if you try to do it linearly in address order. Start with the vector table to get the entry addresses, then follow those instructions in address order, making a note of and following all branches until you hit an unconditional branch, a return, or some other instruction that terminates that string of instructions. Repeat this for every branch destination. That won't cover every instruction, because the code may compute addresses while executing (there is not much you can do about disassembling that).
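The execution-order walk described above can be sketched in Python roughly as follows. This is only a toy: `decode` knows a handful of 8086 opcodes (nop, ret, mov reg/imm, short jmp, short jcc, near call), addresses are plain offsets into one flat buffer, and the names `decode` and `traverse` are my own.

```python
import struct


def decode(code, pc):
    """Return (length, name, branch_target, falls_through) for a tiny 8086 subset."""
    op = code[pc]
    if op == 0x90:
        return 1, "nop", None, True
    if op == 0xC3:
        return 1, "ret", None, False
    if 0xB0 <= op <= 0xB7:
        return 2, "mov r8, imm8", None, True
    if 0xB8 <= op <= 0xBF:
        return 3, "mov r16, imm16", None, True
    if op == 0xEB:  # unconditional short jump, signed 8-bit displacement
        rel = struct.unpack_from("<b", code, pc + 1)[0]
        return 2, "jmp short", pc + 2 + rel, False
    if 0x70 <= op <= 0x7F:  # conditional short jumps: both paths are live
        rel = struct.unpack_from("<b", code, pc + 1)[0]
        return 2, "jcc short", pc + 2 + rel, True
    if op == 0xE8:  # near call, signed 16-bit displacement
        rel = struct.unpack_from("<h", code, pc + 1)[0]
        return 3, "call near", pc + 3 + rel, True
    return 1, "(unknown)", None, False  # stop this path on anything unrecognized


def traverse(code, entry_points):
    """Follow execution paths from entry_points; return {address: (length, name)}."""
    seen = {}
    work = list(entry_points)
    while work:
        pc = work.pop()
        while 0 <= pc < len(code) and pc not in seen:
            try:
                length, name, target, falls_through = decode(code, pc)
            except struct.error:  # operand bytes run off the end of the buffer
                break
            seen[pc] = (length, name)
            if target is not None:
                work.append(target)  # branch destination: visit later
            if not falls_through:
                break
            pc += length
    return seen
```

For the bytes `EB 01 B8 90 C3` with entry point 0, this visits offsets 0, 3, and 4 and never touches the `B8` at offset 2, which a linear sweep would have decoded as the start of a 3-byte mov that swallows the nop and ret.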

    If any of this code was hand written, intentionally or accidentally, to trip up a disassembler, you can expect to have collisions where the second or third byte of one instruction on one execution path appears to be the first opcode byte of an instruction on a different execution path. For example: a clear-a-flag instruction, followed by a conditional branch if the flag is clear, followed by a byte of data, followed by a real instruction that is the branch destination. Yep, I have come across this. Your disassembler should trap it: you need to put checks in to stop disassembling one or both of those execution paths when they collide. For complete disassembly, expect to have to support some sort of user input to exclude addresses as opcodes, as well as to let the user manually add valid opcode addresses for you to follow the execution path from.
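One way such a collision check might look, as a hedged Python sketch: given every instruction start the disassembler has recorded along with its length, report pairs whose byte ranges overlap. `find_collisions` and the `spans` layout are assumptions made for this example.

```python
def find_collisions(spans):
    """spans maps start address -> instruction length.

    Return a list of (start_a, start_b) pairs, start_a < start_b, where the
    instruction at start_b begins inside the bytes claimed by start_a.
    """
    collisions = []
    starts = sorted(spans)
    for i, a in enumerate(starts):
        end_a = a + spans[a]  # first address past instruction a
        for b in starts[i + 1:]:
            if b >= end_a:
                break  # sorted order: no later start can overlap a
            collisions.append((a, b))
    return collisions


# The pattern from the text: clc (F8), jnc +1 (73 01), a data byte (B8),
# then the real code at the branch destination (90 = nop, C3 = ret).
# Following the fall-through of jnc decodes B8 as a 3-byte mov at offset 3,
# which overlaps the real instructions at offsets 4 and 5.
example_spans = {0: 1, 1: 2, 3: 3, 4: 1, 5: 1}
```

Here `find_collisions(example_spans)` reports offset 3 colliding with offsets 4 and 5. Which side to drop is a judgment call that the check itself cannot make, hence the need for user input.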

    For fixed length instruction sets you can easily disassemble in address order or execution order, your choice. Address order from 0 to the end of memory is the easiest, of course. Don't error out on undefined instructions; just mark them as such and keep going, since some of those are data.
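A small Python sketch of that address-order sweep. The instruction set here is entirely hypothetical (16-bit little-endian words whose top 4 bits select the mnemonic); it is not PIC, ARM, or Thumb, and `TOY_OPCODES` and `linear_sweep` are invented names. The point is only the control flow: step by a fixed width and mark unknown words instead of stopping.

```python
import struct

# Hypothetical fixed-length ISA, invented for illustration only.
TOY_OPCODES = {0x1: "load", 0x2: "store", 0x3: "add", 0xF: "branch"}


def linear_sweep(code, base=0):
    """Decode every 16-bit word in address order; never stop on unknown words."""
    listing = []
    for offset in range(0, len(code) - 1, 2):  # a trailing odd byte is ignored
        (word,) = struct.unpack_from("<H", code, offset)
        name = TOY_OPCODES.get(word >> 12, "undefined")  # mark it, keep going
        listing.append((base + offset, word, name))
    return listing
```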

    x86 is definitely the LAST variable length instruction set I would attempt to disassemble, and I have written many disassemblers. I have no desire to ever attempt that project. Start with some fixed length ones like the PIC and ARM/Thumb. Try the MSP430 for variable word length, then maybe the 6502 (Asteroids, Asteroids Deluxe, Lunar Lander, etc). Allow maybe a week or two worth of evenings to cover the above and get the feel for it, then attack the x86 if the desire remains. If you limit yourself strictly to the 8088/8086 it is not so bad; you just need to make sure your tools are generating those instructions and not getting into the 386-and-up instructions.

    If push vs inc is bothering you, definitely try something else like the msp430 for example first.

© 2021 The Archive Base. All Rights Reserved