In the AMD Family 10h architecture (e.g. Opteron), the prefetched instructions, after being aligned, are split into two flows: DirectPath (or Fastpath) and VectorPath (the microcode engine). These flows then feed the integer or floating-point execution paths.
How are the fetched instructions marked for one flow or the other? Is there a flag bit of some sort?
The AMD documentation is very vague about the mechanism that differentiates them. The only mention is:
When the target 32-byte instruction window is obtained from the L1 instruction cache, the instruction bytes are examined to determine whether the type of basic decode to take place is DirectPath or VectorPath.
There is no “flag” that one can query from software. The decode logic might use a table or dedicated logic to signal later pipeline stages in the silicon, but that is invisible to software.
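You could go to the SWOG (Software Optimization Guide) and check the instruction latency tables in the appendices, which list the decode type (DirectPath Single, DirectPath Double or VectorPath) for each instruction: http://developer.amd.com/documentation/guides/Pages/default.aspx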
Or you could go to the compiler source (GCC) and glean it from the athlon/fam10h scheduling model: http://gcc.gnu.org/viewcvs/trunk/gcc/config/i386/athlon.md?revision=174172&content-type=text%2Fplain&view=co
or, for Bulldozer (Family 15h), the bdver1 model: http://gcc.gnu.org/viewcvs/trunk/gcc/config/i386/bdver1.md?revision=165853&content-type=text%2Fplain&view=co
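Since the hardware marking cannot be read from software, the practical approach is to transcribe the decode types from one of the sources above into your own lookup table and check code against it. The sketch below assumes such a table; the two entries and the instruction-form strings are illustrative placeholders (verify each against the SWOG before relying on it), and `DECODE_TABLE` and `classify` are hypothetical names, not part of any AMD or GCC API.

```python
from collections import Counter
from enum import Enum


class Decode(Enum):
    # The three decode types used in the SWOG latency tables.
    DIRECT_SINGLE = "DirectPath Single"
    DIRECT_DOUBLE = "DirectPath Double"
    VECTOR = "VectorPath"


# HYPOTHETICAL placeholder entries: fill this in from the SWOG appendix
# (or the GCC scheduling model) for the instruction forms you care about.
DECODE_TABLE = {
    "add r32, r32": Decode.DIRECT_SINGLE,
    "div r32": Decode.VECTOR,
}


def classify(listing):
    """Tally the decode type of each instruction form in `listing`.

    Forms missing from DECODE_TABLE are counted under "unknown".
    """
    counts = Counter()
    for form in listing:
        kind = DECODE_TABLE.get(form)
        counts[kind.value if kind else "unknown"] += 1
    return counts


if __name__ == "__main__":
    hot_loop = ["add r32, r32", "add r32, r32", "div r32", "cpuid"]
    print(classify(hot_loop))
```

This only reflects what the table says; it says nothing about how the decoder marks instructions internally.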
—
Quentin