Please refers to the edit portion for my explanation.
This is a bit long and difficult to illustrate. But I appreciate taking your time to read this. Please bear with me.
Suppose I have this:
.data
str1: .asciiz "A"
str2: .asciiz "1"
myInt:
.word 42 # allocate an integer word: 42
myChar:
.word 'Q' # allocate a char word
.text
.align 2
.globl main
main:
lw $t0, myInt # load myInt into register $t0
lw $t3, str1 # load str1 into register $t3
lw $t4, str2 #load str2 into register $t4
la $a0, str1 # load address str1
la $a1, str2 # load address str2
Then in SPIM, the user text segment is
User Text Segment [00400000]..[00440000]
[00400000] 8fa40000 lw $4, 0($29) ; 183: lw $a0 0($sp) # argc
[00400004] 27a50004 addiu $5, $29, 4 ; 184: addiu $a1 $sp 4 # argv
[00400008] 24a60004 addiu $6, $5, 4 ; 185: addiu $a2 $a1 4 # envp
[0040000c] 00041080 sll $2, $4, 2 ; 186: sll $v0 $a0 2
[00400010] 00c23021 addu $6, $6, $2 ; 187: addu $a2 $a2 $v0
[00400014] 0c100009 jal 0x00400024 [main] ; 188: jal main
[00400018] 00000000 nop ; 189: nop
[0040001c] 3402000a ori $2, $0, 10 ; 191: li $v0 10
[00400020] 0000000c syscall ; 192: syscall # syscall 10 (exit)
[00400024] 3c011001 lui $1, 4097 ; 23: lw $t0, myInt # load myInt into register $t0
[00400028] 8c280004 lw $8, 4($1)
[0040002c] 3c011001 lui $1, 4097 ; 25: lw $t3, str1 # load str1 into register $t3
[00400030] 8c2b0000 lw $11, 0($1)
[00400034] 3c011001 lui $1, 4097 ; 27: lw $t4, str2 #load str2 into register $t4
[00400038] 8c2c0002 lw $12, 2($1)
[0040003c] 3c041001 lui $4, 4097 [str1] ; 29: la $a0, str1 # load address str1
[00400040] 3c011001 lui $1, 4097 [str2] ; 31: la $a1, str2 # load address str2
[00400044] 34250002 ori $5, $1, 2 [str2]
I understand that lw is a pseudocode so it needs to be broken down into two instructions. I understand this part. We use the entry address of data segment as a “base pointer” and relatively accessing other data (including the first data).
I also observed that loading address of str1 and str2 used two different registers: $4 and $1. $4 is $a0.
Why is that?
If I swap the last two instructions, on SPIM I see
...
[0040003c] 3c011001 lui $1, 4097 [str2] ; 31: la $a1, str2 # load address str2
[00400040] 34250002 ori $5, $1, 2 [str2]
[00400044] 3c041001 lui $4, 4097 [str1] ; 32: la $a0, str1 # load address str1
So why is load address so strange? Why did str2 use $1 ???
How can I go about explaining how lui $1, 4097 [str2] and lui $4, 4097 [str1] are different?
PS: Can someone also explain to me why we need the bracket [str2] ?
lui, $1, 4097, [str2] only pushes the entry address of data segment to register $1. That is, 0x10010000 .
Thank you very much!
EDIT
I rewrote the entire script to simplify the situation.
Script: http://pastebin.com/BHh89iqt
Text Segment: http://pastebin.com/t2eDEs1f
Let me remind you that we write in pseudo instructions, rather than true MIPS machine code. That is, “lw”, “jal”, “addi”, etc are all pseudo instructions.
For example, lw (load word) is broken down into two machine instructions (look at the text segement):
lui $1, 4097 ; 23: lw $t0, myInt # load myInt into register $t0
lw $8, 4($1)
MIPS is 32-bit, we therefore break it down into two instructions. The total of addressing a 32-bit address will result in 43 bits instruction set.. this is why we break down into 2 parts.
A label is a memory address pointing at the thing we assigned to.
MIPS can only read instructions of the form lw $rt, offset($rs). So most of the load instructions follow this approach and use $at to convert pseudoinstructions that involve labels to MIPS machine instructions.
For lw it’s pretty easy. For la load address it’s a bit tricky.
Pay attention to the last four load address instructions. MIPS translates them into this:
[0040003c] 3c041001 lui $4, 4097 [str1] ; 27: la $a0, str1 # load address str2
[00400040] 3c011001 lui $1, 4097 [str2] ; 28: la $a0, str2 # load address str1
[00400044] 34240002 ori $4, $1, 2 [str2]
[00400048] 3c011001 lui $1, 4097 [str2] ; 30: la $a0, str2 # load address str2
[0040004c] 34240002 ori $4, $1, 2 [str2]
[00400050] 3c041001 lui $4, 4097 [str1] ; 31: la $a0, str1 # load address str1
$4 refers to $a0. If you look at the instructions, I swapped the first two load instructions and the result is the last two instructions.
I purposely did this to illustrate the strange behavior: before swapping, lui uses $4 to store the address of str1, but if I want to load the address of str2, I will use $at and then apply offset.
I couldn’t figure out why last night, and just now, I realized that this is done because the compiler is smart enough to know that str1 is the first data in the data segement, so there is no need to convert anything.
Yet this is also strange because how does the compiler know at what byte to stop printing the string? (if we want to print a string…)
My guess: Null character to terminate print.
Anyhow. I guess this is just a convention that the MIPS uses.
Second Edit
In fact if you just add a new data on top of str1, you will see that
my explanation is correct.
New script: http://pastebin.com/8DuzFrk0
New Text Segment: http://pastebin.com/YXbvzc4z
I only added myCharB to the top of the data segment.
[0040003c] 3c011001 lui $1, 4097 [str1] ; 29: la $a0, str1 #
load address str2
[00400040] 34240004 ori $4, $1, 4 [str1]
[00400044] 3c011001 lui $1, 4097 [str2] ; 30: la $a0, str2 #
load address str1
[00400048] 34240006 ori $4, $1, 6 [str2]
Well, who cares? xD It’s internal SPIM implementation and it’s free to use any register as long as it does not break MIPS ABI. I just suggest you not relying too much on pseudo-instructions to make sure what registers have changed/what values they hold. Also usually LW is not a pseudo-instruction, but in the way you’re using it is.
You don’t need any brackets. That’s just a SPIM information to the programmer to show this instruction is loading the str2 address. It’s not part of the assembly.
Well actually it only load upper half-word of $1. It just happens that the lower half-word is plain zeroes. Keep in mind LUI does not modify lower half-word, so you have to make sure it holds the value you want (reseting register or using LI).
Null-terminated, as you guessed right.
This is way older than MIPS. And MIPS doesn’t define anything about this, either does any other architecture. This is data handling and it’s defined on a upper layer like OS. In this case it is SPIM convention on its own syscalls. Anyway null-terminated strings are pretty common. C programming language uses so for strings.