I would like to understand how to use dis (the disassembler of Python bytecode). Specifically, how should one interpret the output of dis.dis (or dis.disassemble)?
Here is a very specific example (in Python 2.7.3):
dis.dis("heapq.nsmallest(d,3)")
          0 BUILD_SET           24933
          3 JUMP_IF_TRUE_OR_POP 11889
          6 JUMP_FORWARD        28019 (to 28028)
          9 STORE_GLOBAL        27756 (27756)
         12 LOAD_NAME           29811 (29811)
         15 STORE_SLICE+0
         16 LOAD_CONST          13100 (13100)
         19 STORE_SLICE+1
I see that JUMP_IF_TRUE_OR_POP etc. are bytecode instructions (although, interestingly, BUILD_SET does not appear in this list, though I expect it works like BUILD_TUPLE). I think the numbers on the right-hand side are memory allocations, and the numbers on the left are goto numbers… I notice they almost increment by 3 each time (but not quite).
If I wrap heapq.nsmallest inside a function and disassemble a call to it:
import heapq
def f_heapq_nsmallest(d, n):
    return heapq.nsmallest(n, d)  # heapq.nsmallest takes (n, iterable)
dis.dis("f_heapq(d,3)")
          0 BUILD_TUPLE     26719
          3 LOAD_NAME       28769 (28769)
          6 JUMP_ABSOLUTE   25640
          9 <44>            # what is <44> ?
         10 DELETE_SLICE+1
         11 STORE_SLICE+1
You are trying to disassemble a string containing source code, but that's not supported by dis.dis in Python 2. With a string argument, it treats the string as if it contained byte code (see the function disassemble_string in dis.py). So you are seeing nonsensical output based on misinterpreting source code as byte code.
Things are different in Python 3, where dis.dis compiles a string argument before disassembling it. In Python 2 you need to compile the code yourself, with the built-in compile (for example in "eval" mode), and pass the resulting code object to dis.dis.
What do the numbers mean? In the disassembly of the compiled code, the number 1 on the far left is the line number in the source code from which this byte code was compiled. The numbers in the column on the left are the offset of each instruction within the byte code, and the numbers on the right are the opargs (instruction arguments). In Python 2 each instruction is one opcode byte, followed by a two-byte oparg only if the opcode is at least opcode.HAVE_ARGUMENT (90); that is why the offsets mostly go up by 3 but sometimes by 1. Looking at the actual byte code in hex, at offset 0 we find 65, the opcode for LOAD_NAME, with the oparg 0000; then (at offset 3) 6a is the opcode LOAD_ATTR, with 0100 the oparg, and so on. Note that the opargs are in little-endian order, so 0100 is the number 1. The undocumented opcode module contains the tables opname, giving you the name for each opcode, and opmap, giving you the opcode for each name.
The meaning of the oparg depends on the opcode, and for the full story you need to read the implementation of the CPython virtual machine in ceval.c. For LOAD_NAME and LOAD_ATTR the oparg is an index into the co_names attribute of the code object. For LOAD_CONST it is an index into the co_consts attribute of the code object. For CALL_FUNCTION, it is the number of arguments to pass to the function, encoded in 16 bits with the number of ordinary (positional) arguments in the low byte and the number of keyword arguments in the high byte.
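To make the misreading concrete, here is a minimal sketch that decodes raw bytes using the Python 2.7 instruction layout: one opcode byte, followed by a two-byte little-endian oparg only when the opcode is at least HAVE_ARGUMENT (90). It is written for Python 3 so it runs anywhere, which means it cannot import Python 2.7's opcode tables: PY27_OPNAMES is a hand-copied subset covering only the opcodes used here, and decode_py27 and split_call_function_oparg are hypothetical helper names. The byte string for the compiled expression, and its co_names and co_consts, are assumed values consistent with the offsets and opargs described in the answer, not output captured from a 2.7 interpreter.

```python
# Sketch of the Python 2.7 bytecode layout (assumption: 2.7 rules only;
# Python 3.6+ uses a different, fixed-width format).
HAVE_ARGUMENT = 90  # in 2.7, opcodes >= 90 carry a 2-byte oparg

# Hypothetical hand-copied subset of Python 2.7's opcode.opname.
PY27_OPNAMES = {
    40: "STORE_SLICE+0", 41: "STORE_SLICE+1", 51: "DELETE_SLICE+1",
    83: "RETURN_VALUE", 97: "STORE_GLOBAL", 100: "LOAD_CONST",
    101: "LOAD_NAME", 102: "BUILD_TUPLE", 104: "BUILD_SET",
    106: "LOAD_ATTR", 110: "JUMP_FORWARD", 112: "JUMP_IF_TRUE_OR_POP",
    113: "JUMP_ABSOLUTE", 131: "CALL_FUNCTION",
}


def decode_py27(code):
    """Return (offset, opname, oparg) triples; oparg is None for 1-byte instructions."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        name = PY27_OPNAMES.get(op, "<%d>" % op)  # unassigned opcodes print as <n>
        if op >= HAVE_ARGUMENT:
            oparg = code[i + 1] | (code[i + 2] << 8)  # little-endian: low byte first
            out.append((i, name, oparg))
            i += 3
        else:
            out.append((i, name, None))
            i += 1
    return out


def split_call_function_oparg(oparg):
    """CALL_FUNCTION oparg: positional count in the low byte, keyword count in the high byte."""
    return oparg & 0xFF, oparg >> 8


# 1) Source text misread as byte code: 'h' is 104 (BUILD_SET), and the bytes
#    'e', 'a' become the oparg 97 * 256 + 101 = 24933.
garbage = decode_py27(b"heapq.nsmallest(d,3)")

# 2) Byte code assumed to be what Python 2.7 compiles heapq.nsmallest(d,3)
#    to in "eval" mode, plus its assumed co_names and co_consts.
real = decode_py27(bytes.fromhex("650000 6a0100 650200 640000 830200 53"))
co_names = ("heapq", "nsmallest", "d")
co_consts = (3,)

for offset, name, arg in real:
    if name in ("LOAD_NAME", "LOAD_ATTR"):
        detail = co_names[arg]  # oparg indexes co_names
    elif name == "LOAD_CONST":
        detail = co_consts[arg]  # oparg indexes co_consts
    elif name == "CALL_FUNCTION":
        detail = "%d positional, %d keyword" % split_call_function_oparg(arg)
    else:
        detail = ""
    print(offset, name, "" if arg is None else arg, detail)
```

Fed the question's source strings, this decoder reproduces both listings, including the <44> entry: the comma is byte 44, which is not assigned to any opcode in Python 2.7, so dis prints its number in angle brackets.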
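On Python 3, dis.get_instructions (added in 3.4) exposes the same columns as named fields of each instruction, with argval already resolved through co_names or co_consts. A minimal sketch, assuming a recent Python 3; the opcode used for the call itself (CALL_FUNCTION, CALL, PRECALL) and the raw arg of some instructions differ between 3.x releases, so only the resolved names are worth relying on.

```python
import dis
import opcode

# Compile explicitly (required on Python 2); Python 3's dis.dis would also accept the string.
code = compile("heapq.nsmallest(d,3)", "<none>", "eval")

# co_names holds the identifiers that name-loading opargs refer to.
print(code.co_names)

for ins in dis.get_instructions(code):
    # offset: position in the byte code; arg: raw oparg; argval: resolved value
    print(ins.offset, ins.opname, ins.arg, ins.argval)

# opname maps opcode -> name, and opmap maps name -> opcode.
load_name = opcode.opmap["LOAD_NAME"]
print(load_name, opcode.opname[load_name])
```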