I am having trouble finding a good place to start learning assembly. I have found lots of conflicting information across the internet as to what assembly actually is, which assemblers to use, what an assembler is, and whether there is one “core” assembly language released by Intel for their specific CPU families (I have an Intel x86 CPU, so that is what I wish to learn assembly for).
Could someone please clear up the points above? From what I have heard, Intel releases CPU families (x86, for instance) with an instruction set/reference, and the various assemblers (MASM, FASM, NASM, etc.) provide a higher-level, human-readable language which is translated into machine code instructions.
Also, from what I have heard, when someone says “assembly language”, this actually refers to one of many different styles of assembly language provided by the many different assemblers out there.
http://en.wikipedia.org/wiki/X86_assembly_language#Examples
MASM style assembly vs NASM style assembly
What I am looking for is “the first” assembler, without the variations that MASM, NASM, etc offer (such as the large libraries of macros). All these assemblers must have come from somewhere, and that is what I am looking for.
Basically, I am looking for the first x86 assembler/assembly language, before MASM, NASM etc. Could someone provide me with a link to this first assembler?
BTW, in case my entire logic about assembly is wrong, could someone clarify?
Thanks in advance,
Prgrmr
To be pedantic, the real language that you would use to talk to a CPU directly is machine code. This would mean figuring out the actual byte values that must be used for certain instructions. This is obviously far too tedious and error-prone, so people use an assembler instead. An assembler translates a text representation of the machine code into the machine code itself, and takes care of the various fiddly details such as calculating relative addresses.
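As a rough sketch of what that translation involves, here is a toy two-pass "assembler" in Python for a handful of 32-bit x86 instructions. The function name `assemble` and the tiny accepted syntax are made up for illustration; the opcode bytes (`90` for `nop`, `C3` for `ret`, `B8` plus a 32-bit immediate for `mov eax, imm32`, `EB` plus an 8-bit displacement for a short `jmp`) are the standard x86 encodings. Real assemblers handle vastly more than this.

```python
import struct

# Encoded size in bytes of each supported instruction (toy subset only).
SIZES = {"nop": 1, "ret": 1, "mov": 5, "jmp": 2}


def assemble(source):
    """Assemble a tiny subset of 32-bit x86 (Intel-style text) into bytes."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]

    # Pass 1: work out the address of every label.
    labels, addr = {}, 0
    for ln in lines:
        if ln.endswith(":"):
            labels[ln[:-1]] = addr
        else:
            op = ln.split()[0]
            if op not in SIZES:
                raise ValueError(f"unsupported instruction: {ln}")
            addr += SIZES[op]

    # Pass 2: emit the bytes, now that every label address is known.
    out = bytearray()
    for ln in lines:
        if ln.endswith(":"):
            continue
        op, _, rest = ln.partition(" ")
        if op == "nop":
            out += b"\x90"
        elif op == "ret":
            out += b"\xc3"
        elif op == "mov":
            dst, imm = [p.strip() for p in rest.split(",")]
            if dst != "eax":
                raise ValueError("this sketch only supports 'mov eax, imm32'")
            # B8 followed by the immediate as 4 little-endian bytes.
            out += b"\xb8" + struct.pack("<I", int(imm, 0) & 0xFFFFFFFF)
        elif op == "jmp":
            # Short jump: displacement is relative to the NEXT instruction.
            rel = labels[rest.strip()] - (len(out) + 2)
            out += b"\xeb" + struct.pack("<b", rel)
    return bytes(out)


PROGRAM = """
start:
    mov eax, 42
    jmp done
    nop
done:
    ret
"""

if __name__ == "__main__":
    print(assemble(PROGRAM).hex(" "))  # b8 2a 00 00 00 eb 01 90 c3
```

The `jmp done` line is the "fiddly detail": the assembler has to know that `done` sits one byte past the end of the jump instruction, so it emits a displacement of `01`. Writing machine code by hand would mean recomputing that every time an instruction is added or removed.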
For a particular machine code there can be a number of different assemblers, each with their own idea of how the assembly should be written. This is particularly true of x86 processors – broadly, there are two styles: Intel and AT&T. And then within those, different assemblers can have different sets of macros and directives and so on.
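To give a feel for how mechanical the difference between the two styles is, here is a small Python sketch that rewrites the simplest kind of Intel-style line into AT&T style. The helper names (`intel_to_att`, `att_operand`) are invented for this example, and it only understands a few two-operand instructions on 32-bit registers and immediates; memory operands, other operand sizes and directives are deliberately left out.

```python
REGS_32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
MNEMONICS = {"mov", "add", "sub", "and", "or", "xor", "cmp"}


def att_operand(op):
    """Render one Intel-style operand (32-bit register or immediate) AT&T style."""
    op = op.strip()
    if op in REGS_32:
        return "%" + op          # registers get a % prefix
    int(op, 0)                   # raises ValueError if not a number
    return "$" + op              # immediates get a $ prefix


def intel_to_att(line):
    """Convert e.g. 'mov eax, 1' into 'movl $1, %eax' (toy subset only)."""
    mnemonic, _, rest = line.strip().partition(" ")
    if mnemonic not in MNEMONICS:
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    dst, src = rest.split(",")
    # AT&T adds an operand-size suffix ('l' = 32-bit) and puts the source first.
    return f"{mnemonic}l {att_operand(src)}, {att_operand(dst)}"


if __name__ == "__main__":
    for text in ("mov eax, 1", "add eax, ebx", "xor ecx, ecx"):
        print(f"{text:15} ->  {intel_to_att(text)}")
```

Both spellings of a line assemble to the same bytes; only the text differs.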
To illustrate, here is a sample of assembly generated from some C code with gcc -S -masm=intel:

And here is the same snippet generated with gcc -S -masm=att:

Those two snippets produce the same machine code; the difference is only in the assembly syntax. Note in particular how the order of arguments is different (Intel is destination-first, AT&T is source-first), the slight differences in instruction names, the use of % to specify registers in AT&T, and so on.

And then there are the different CPUs. A CPU has a certain architecture. That means it will execute the instruction set for that architecture. For that architecture there will be a core instruction set, and possibly extra groups of instructions for enhanced features or special applications. x86 is a fine example: you have the floating point instructions, MMX, 3DNow! and the SSE family (SSE through SSE4). Different CPUs of that architecture may or may not be able to understand the extra instructions; generally there is some way to ask the CPU what it supports (on x86, the CPUID instruction).
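On x86 the CPU is asked via the CPUID instruction. On Linux you do not need to write assembly just to look at the result, because the kernel publishes the decoded feature list on the "flags" line of /proc/cpuinfo. The sketch below parses that line; the function names and the `SAMPLE` text are made up for illustration, and the fallback to `SAMPLE` is only there so the script still runs on systems without /proc/cpuinfo.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports(flags, *names):
    """True if every named extension appears in the flag set."""
    return all(name in flags for name in names)


# Hypothetical, heavily shortened excerpt in the /proc/cpuinfo format.
SAMPLE = "processor\t: 0\nflags\t\t: fpu vme mmx sse sse2\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the sample text
    flags = parse_cpu_flags(text)
    for name in ("fpu", "mmx", "sse", "sse2"):
        print(f"{name:5} {'yes' if name in flags else 'no'}")
```

A program that wants to use an optional instruction group would make this kind of check once at startup and pick a code path accordingly.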
When you say “x86 assembly” what people understand you to mean is, “assembly that will run on any CPU of the x86 architecture”.
More sophisticated CPUs, particularly those with memory management (x86 included), do more than simply execute instructions. Starting with the 80286, the x86 architecture has two main modes: real mode and protected mode. The core instruction set can be used as-is in either mode, but the way memory works in each mode is so completely different that it is impractical to try to write real-world code that would work in both.
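To make the memory difference concrete: in real mode, an address is a 16-bit segment and a 16-bit offset, and the CPU forms the physical address by shifting the segment left four bits and adding the offset. In protected mode the same segment register instead holds a selector that is looked up in a descriptor table, so this arithmetic no longer applies. A minimal sketch of the real-mode rule follows; the function name and the `wrap_20bit` parameter (which models the 20 address lines of the original 8086) are assumptions made for this example.

```python
def real_mode_address(segment, offset, wrap_20bit=False):
    """Physical address for a real-mode segment:offset pair."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    # The 8086 had 20 address lines, so addresses past 1 MiB wrapped to 0.
    return linear & 0xFFFFF if wrap_20bit else linear


if __name__ == "__main__":
    # Different segment:offset pairs can name the same physical byte.
    print(hex(real_mode_address(0xB800, 0x0000)))  # 0xb8000
    print(hex(real_mode_address(0xB000, 0x8000)))  # 0xb8000
    print(hex(real_mode_address(0xFFFF, 0x0010, wrap_20bit=True)))  # 0x0
```

Code written around this overlapping-segment model has no direct equivalent once the segment value is a table index, which is a large part of why real-mode and protected-mode programs are written separately.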
Later CPUs introduced more modes. The 386 introduced Virtual 8086 mode aka v86 mode to allow a protected mode operating system to run a real-mode program without having to actually switch the CPU to real mode. AMD64 processors run 64-bit code in long mode.
A CPU can support multiple architectures. Itanium is considered a separate architecture from x86, and the early Itanium CPUs released by Intel also supported x86 in hardware, with the ability to switch between the two; later Itanium models relied on software emulation for x86 code instead.
The x86 family is probably an overly complicated example of an assembly language: it has a long and complex history going back to the 8086, released in 1978. The machine code for the core instructions used in (32-bit) applications is largely the same as on that chip. The architecture has been through several revisions since, each adding more instructions.
If you want to learn x86 assembly properly, consider:
The Art of Assembly Language Programming, which had an edition for each of DOS, Windows and Linux. The Windows and Linux versions use a language invented by the author called High Level Assembly or HLA, which is sort of like x86 assembly but not really. This may or may not be your cup of tea: it’s not strictly real assembly, but the concepts are all there, and learning to write proper assembly afterward would not be much effort. To its credit, it also contains a LOT of assembly-related material, e.g. info on processor architecture, BIOS, video etc. The DOS version teaches straight MASM (Intel) assembly.
Programming from the Ground Up, which teaches AT&T style assembly on Linux.
For actual assemblers (free ones), try MASM32 (Intel style) on Windows, or as on Linux. As it happens, Linux as will assemble either Intel or AT&T style assembly.

If you feel daunted by the x86 architecture and are willing to learn assembly for some other architecture, consider starting with something smaller.