I am facing certain problems with strlen right now (there are many cases where I read files and the string is not zero-terminated). So I was thinking of writing an assembly routine to calculate the length of my strings. What I would do is go backwards from the end of the buffer until I encounter the first valid character and then calculate the length of the string from that position. In fact, I already have such a routine that I wrote some time ago when I was writing assembly programs.
Now, I would like to know, is there any reason why I shouldn’t do this? Any particular advantages that I would be losing out on?
Another alternative would be to set every element of my character array to null before filling it. I could do this in assembly 4 bytes at a time, or even with a simple for loop.
Keep in mind that I am talking about arrays of considerable size (64k). Considerable in the sense that the processing has to be really quick, since I need to display the file as soon as the user selects it.
EDIT:
To clarify, when I say that I know the length of the string, I mean:
char* buffer = new char[length];
I know length. But when I fill this buffer, I do not know the exact length up to which it holds ASCII characters. When I use strlen, it does not give me the correct length. Basically, the buffer length can be 500, but there may be only 5 valid characters inside it, and the remaining 495 could be garbage values.
Yes. If you already have the end byte of the string and its beginning, then you know its length: it is the distance between the two pointers, plus one.
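As a minimal sketch of that calculation, assuming two hypothetical pointers named `begin` and `end` (the names are illustrative and not taken from the question) and a single-byte-per-character encoding:

```cpp
#include <cstddef>

// Length when `end` points AT the last valid byte (inclusive bound).
// `begin` and `end` are hypothetical names used only for illustration.
std::size_t length_inclusive(const char* begin, const char* end) {
    return static_cast<std::size_t>(end - begin) + 1;
}

// Length when `end` points one PAST the last valid byte (exclusive bound).
std::size_t length_exclusive(const char* begin, const char* end) {
    return static_cast<std::size_t>(end - begin);
}
```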
The +1 is because end points to the last byte of the string. If end were one past the end, you wouldn't need the +1. There's no need for any routine to calculate something you already know.
Note that this assumes that the string is ASCII or some other single-byte-per-character encoding. If you're using a Unicode encoding of some kind (UTF-8, UTF-16, etc.), then you'll have to scan the string to figure out how many codepoints it contains.
Of course, if it is a Unicode encoding, then the question of what you mean exactly by “length” needs to be addressed. The “length” could be “number of codepoints,” “number of distinct graphemes,” or even “number of code units in the encoding.”
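To illustrate the difference between code units and codepoints, here is a rough sketch for UTF-8 only. The function name `utf8_code_points` is hypothetical, and the code assumes the input is already valid UTF-8; it does not validate the data and does not attempt to count graphemes, which would generally need a Unicode library.

```cpp
#include <cstddef>

// Counts codepoints in a valid UTF-8 buffer of `size` bytes.
// The number of code units (bytes) is simply `size`; the number of
// codepoints is the number of bytes that are NOT continuation bytes,
// i.e. bytes whose top two bits are not 10.
std::size_t utf8_code_points(const char* data, std::size_t size) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        unsigned char byte = static_cast<unsigned char>(data[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new codepoint
        }
    }
    return count;
}
```

For a plain ASCII buffer the two counts coincide, so the pointer arithmetic above is enough; the scan only matters once multi-byte sequences can appear.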