See also C Tokenizer
Here is a quick substr() for C that I wrote (yes, for C89 the variable declarations need to be moved to the start of the function, etc., but you get the idea).
I have seen many “smart” implementations of substr() that are simple one-liner calls to strncpy()!
They are all wrong: strncpy() does not guarantee null termination, so the call might NOT produce a correct substring!
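The strncpy() problem is easy to reproduce: when the source has at least n characters, strncpy(dst, src, n) copies exactly n bytes and writes no '\0'. A minimal sketch of the usual fix is to terminate the buffer explicitly; copy_prefix is a hypothetical helper name, not something from this post.

```c
#include <string.h>

/* Hypothetical helper (not from the post): copy at most n characters of src
   into dst and always terminate the result. dst must have room for n + 1
   bytes. strncpy() alone writes no '\0' when strlen(src) >= n. */
char *copy_prefix(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);   /* may leave dst unterminated */
    dst[n] = '\0';          /* supply the terminator ourselves */
    return dst;
}
```

For example, with char buf[3], copy_prefix(buf, "-2--4--6", 2) leaves the string "-2" in buf, whereas strncpy(buf, "-2--4--6", 2) on its own leaves buf[2] untouched.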
Here is something maybe better?
Bring out the bugs!
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strdup() is POSIX (standard C only since C23) */

char* substr(const char* text, int nStartingPos, int nRun)
{
    char* emptyString = strdup(""); /* C'mon! This cannot fail */
    if(text == NULL) return emptyString;
    int textLen = strlen(text);
    --nStartingPos; /* positions are 1-based; convert to a 0-based index */
    if((nStartingPos < 0) || (nRun <= 0) || (textLen == 0) || (textLen < nStartingPos)) return emptyString;
    char* returnString = (char *)calloc((1 + nRun), sizeof(char));
    if(returnString == NULL) return emptyString;
    strncat(returnString, (nStartingPos + text), nRun);
    /* We do not need emptyString anymore from this point onwards */
    free(emptyString);
    emptyString = NULL;
    return returnString;
}

int main(void)
{
    const char *text = "-2--4--6-7-8-9-10-11-";
    char *p = substr(text, -1, 2);
    printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
    free(p);
    p = substr(text, 1, 2);
    printf("[*]'%s' (-2)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);
    p = substr(text, 3, 2);
    printf("[*]'%s' (--)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);
    p = substr(text, 16, 2);
    printf("[*]'%s' (10)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);
    p = substr(text, 16, 20);
    printf("[*]'%s' (10-11-)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);
    p = substr(text, 100, 2);
    printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
    free(p);
    p = substr(text, 1, 0);
    printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
    free(p);
    return 0;
}
```
Output:

```
[*]'' (")
[*]'-2' (-2)
[*]'--' (--)
[*]'10' (10)
[*]'10-11-' (10-11-)
[*]'' (")
[*]'' (")
```
I would say return NULL if the input isn’t valid rather than a malloc()ed empty string. That way you can test whether the function failed with if(p) rather than if(*p == 0).

Also, I think your function leaks memory because emptyString is only free()d in one conditional. You should make sure you free() it unconditionally, i.e. right before the return.

As to your comment on strncpy() not NUL-terminating the string (which is true): if you use calloc() rather than malloc() to allocate the string, this won’t be a problem as long as you allocate one byte more than you copy, since calloc() automatically sets every byte (including, in this case, the last one) to 0.

I would give you more notes, but I hate reading camelCase code. Not that there’s anything wrong with it.
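A minimal sketch of the suggestions above combined: return NULL on invalid input, and allocate with calloc() one byte more than the copy so strncpy() cannot leave the result unterminated. It assumes the original 1-based nStartingPos convention; substr_or_null is a hypothetical name, and unlike the original it also treats a start position past the last character as invalid.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of substr(): returns NULL on invalid input instead of
   a malloc()ed "", so callers can simply test if (p).
   nStartingPos is 1-based, as in the original. */
char *substr_or_null(const char *text, int nStartingPos, int nRun)
{
    if (text == NULL || nStartingPos < 1 || nRun <= 0)
        return NULL;

    size_t textLen = strlen(text);
    size_t start = (size_t)nStartingPos - 1;   /* convert to a 0-based index */
    if (start >= textLen)
        return NULL;                           /* starts past the last char */

    /* calloc() zeroes all nRun + 1 bytes, including the final one. */
    char *result = calloc((size_t)nRun + 1, 1);
    if (result == NULL)
        return NULL;

    /* strncpy() stops at the source's '\0' and zero-pads the rest; it never
       touches result[nRun], which calloc() already set to '\0'. */
    strncpy(result, text + start, (size_t)nRun);
    return result;
}
```

Callers then write something like p = substr_or_null(text, 16, 2); if (p) { puts(p); free(p); }, with no special empty-string case to check.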
EDIT: With regards to your updates:
Be aware that the C standard defines sizeof(char) to be 1 regardless of your system. If you’re using a computer with 9-bit bytes (god forbid), sizeof(char) is still going to be 1. Not that there’s anything wrong with writing sizeof(char): it clearly shows your intention and provides symmetry with calls to calloc() or malloc() for other types. But sizeof(int) is actually useful, because an int can be a different size on 16-bit, 32-bit, and these newfangled 64-bit computers. The more you know.

I’d also like to reiterate that the convention in most other C code is to return NULL on an error rather than "". I know many functions (like strcmp()) will probably do bad things if you pass them NULL; this is to be expected. But the C standard library (and many other C APIs) takes the approach of “It’s the caller’s responsibility to check for NULL, not the function’s responsibility to baby him/her if (s)he doesn’t.” If you want to do it the other way, that’s cool, but it goes against one of the stronger trends in C interface design.

Also, I would use strncpy() (or memcpy()) rather than strncat(). Using strncat() (and strcat()) obscures your intent: it makes someone reading your code think you want to append to the end of the string (which you do, because after calloc() the end is the beginning), when what you actually want to do is set the string. strncat() makes it look like you’re adding to a string, while strcpy() (or another copy routine) makes your intent clearer. The following three lines all do the same thing in this context; pick whichever one you think looks nicest:

Plus, strncpy() and memcpy() will probably be a (wee little) bit faster than strncat(), which first has to scan for the end of the destination string. text + nStartingPos is the same as nStartingPos + text. I would put the char * first, as I think that’s clearer, but the order is up to you. Also, the parentheses around them are unnecessary (but nice), since the + binds before the comma that separates the arguments.

EDIT 2: The three lines of code don’t do the same thing, but in this context they will all produce the same result. Thanks for catching me on that.
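As a sketch of the copy alternatives compared above (the helper and enum names are hypothetical, and a C11 compiler is assumed for _Static_assert), here are strncat(), strncpy(), and memcpy() each filling a calloc()ed buffer. It also illustrates the EDIT 2 caveat: strncat() and strncpy() stop at the source's '\0', but memcpy() does not, so its length has to be clamped first or it would read past the end of a short source.

```c
#include <stdlib.h>
#include <string.h>

/* ISO C defines sizeof(char) as 1 regardless of CHAR_BIT (C11 assertion). */
_Static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

/* Hypothetical names, for illustration only. */
enum copy_method { USE_STRNCAT, USE_STRNCPY, USE_MEMCPY };

/* Return a new, terminated string holding at most n characters of src,
   built with the chosen copy routine, or NULL if allocation fails. */
char *copy_with(enum copy_method method, const char *src, size_t n)
{
    char *dst = calloc(n + 1, 1);   /* all zeros, so dst starts as "" */
    if (dst == NULL)
        return NULL;

    switch (method) {
    case USE_STRNCAT:
        /* Appends to "", i.e. writes at the beginning; always terminates. */
        strncat(dst, src, n);
        break;
    case USE_STRNCPY:
        /* Stops at src's '\0' and zero-pads; dst[n] is still 0 from calloc(). */
        strncpy(dst, src, n);
        break;
    case USE_MEMCPY: {
        /* memcpy() ignores '\0', so clamp to the source length first. */
        size_t len = strlen(src);
        memcpy(dst, src, len < n ? len : n);
        break;
    }
    }
    return dst;
}
```

With the clamp in place, all three give "10-11-" for 20 characters starting at text + 15 in the test string above; without it, memcpy() would copy 20 bytes from a remainder that is only 7 bytes long, including its '\0'.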