I have some C code to parse a text file, first line by line and then into tokens.
This is the function that parses it line by line:
int parseFile(char *filename) {
    //Open file
    FILE *file = fopen(filename, "r");
    //Line, max is 200 chars
    int pos = 0;
    while (!feof(file)) {
        char *line = (char*) malloc(200*sizeof(char));
        //Get line
        line = fgets(line, 200, file);
        line = removeNewLine(line);
        //Parse line into instruction
        Instruction *instr = malloc(sizeof(instr));
        instr = parseInstruction(line, instr);
        //Print for clarification
        printf("%i: Instr is %s arg1 is %s arg2 is %s\n",
               pos,
               instr->instr,
               instr->arg1,
               instr->arg2);
        //Add to end of instruction list
        addInstruction(instr, pos);
        pos++;
        //Free line
        free(line);
    }
    return 0;
}
And this is the function that parses each line into some tokens and eventually puts it into an Instruction struct:
Instruction *parseInstruction(char line[], Instruction *instr) {
    //Parse instruction and 2 arguments
    char *tok = (char*) malloc(sizeof(tok));
    tok = strtok(line, " ");
    printf("Line at %i tok at %i\n", (int) line, (int) tok);
    instr->instr = tok;
    tok = strtok(NULL, " ");
    if (tok) {
        instr->arg1 = tok;
        tok = strtok(NULL, " ");
        if(tok) {
            instr->arg2 = tok;
        }
    }
    return instr;
}
The line printf("Line at %i tok at %i\n", (int) line, (int) tok); in parseInstruction always prints the same two values. Why are these pointer addresses never changing? I have confirmed that parseInstruction returns a unique pointer value each time, but each instruction has the same pointer in its instr slot.
Just for clarity, Instruction is defined like this:
typedef struct Instruction {
    char *instr;
    char *arg1;
    char *arg2;
} Instruction;
What am I doing wrong?
That is how strtok works: it actually modifies the string that it’s operating on, replacing separator characters with '\0' and returning pointers into that string. (See the “BUGS” section in the strtok(3) manual page, though it’s not really a bug, just a behavior that people don’t usually expect.) So your initial tok will always point to the first character of line.
By the way, the first two lines of parseInstruction:
char *tok = (char*) malloc(sizeof(tok));
tok = strtok(line, " ");
first set tok to point at the return value of malloc, then reassign it to point at the return value of strtok, thereby completely discarding the return value of malloc. It’s just like assigning the result of some_function() to a variable and then immediately overwriting that variable, which completely discards the return value of some_function(), except that it’s even worse, because discarding the return value of malloc results in a memory leak.
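To see the aliasing directly, here is a small sketch that tokenizes a buffer with strtok and records where each token starts relative to the buffer. The helper name token_offsets and the sample input "MOV R1 R2" are made up for illustration; they are not from your code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenize buf in place with strtok and record where
 * each token starts, as an offset from the start of buf. Returns the number
 * of tokens recorded (at most max). buf is modified: each separator that
 * ends a token is overwritten with '\0'. */
size_t token_offsets(char *buf, ptrdiff_t offsets[], size_t max) {
    size_t n = 0;
    for (char *tok = strtok(buf, " "); tok != NULL && n < max;
         tok = strtok(NULL, " ")) {
        /* strtok never allocates: tok always points somewhere inside buf */
        offsets[n++] = tok - buf;
    }
    return n;
}
```

On a buffer holding "MOV R1 R2" it reports offsets 0, 4 and 7, and afterwards the buffer itself reads as just "MOV", because the spaces were overwritten with '\0'. The offset of 0 for the first token is why your printf shows tok at the same address as line.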
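If each Instruction is supposed to own its strings, one option is to copy the tokens out of the line buffer. This matters here because parseFile frees line at the end of every iteration (and the next malloc of the same size will often hand back the very same block, which is likely why line itself never seems to change), so pointers stored straight from strtok end up pointing into freed memory. A minimal sketch, assuming the Instruction definition from the question; copy_token and freeInstruction are hypothetical helpers, and missing arguments are set to NULL instead of being left uninitialized:

```c
#include <stdlib.h>
#include <string.h>

typedef struct Instruction {
    char *instr;
    char *arg1;
    char *arg2;
} Instruction;

/* Hypothetical helper: return a heap-allocated copy of s, or NULL if s is
 * NULL (or if malloc fails). */
static char *copy_token(const char *s) {
    if (s == NULL) return NULL;
    size_t len = strlen(s) + 1;          /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

/* Split line in place with strtok, then store copies of the tokens so the
 * Instruction stays valid after the caller frees or reuses line.
 * No malloc for tok: strtok returns a pointer into line. */
Instruction *parseInstruction(char line[], Instruction *instr) {
    char *tok = strtok(line, " ");
    instr->instr = copy_token(tok);
    tok = (tok != NULL) ? strtok(NULL, " ") : NULL;
    instr->arg1 = copy_token(tok);
    tok = (tok != NULL) ? strtok(NULL, " ") : NULL;
    instr->arg2 = copy_token(tok);
    return instr;
}

/* Hypothetical helper: release the copies made by parseInstruction. */
void freeInstruction(Instruction *instr) {
    free(instr->instr);
    free(instr->arg1);
    free(instr->arg2);
}
```

With this version the caller should still allocate the struct itself with malloc(sizeof *instr): malloc(sizeof(instr)) only reserves room for one pointer, not for the three pointers inside an Instruction. On POSIX systems, strdup could be used in place of copy_token.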