I’m curious about a clever way to implement a printf() analogue in a dynamic language. The problem is that the argument list can contain deeply nested data types, so I can’t easily know how much memory I should allocate for the final buffer. The obvious way to do this is to make two passes through the arguments: one to estimate the buffer size and another to actually format the string. Is there a better way to do this?
CLARIFICATION: I’m thinking about writing a C function for Erlang. Erlang’s data types are deeply boxed, so to use asprintf-like functions I would need to unbox them all (and possibly rewrite the format string), and that is expensive.
If it’s printf that you’re trying to emulate, then you don’t have a problem, because you don’t need a buffer: you write each token to the console as you find it.
If you’re trying to emulate sprintf, then you need to update your question.
For sprintf . . . use an extensible string buffer.
If you have to roll your own, start off with a reasonable buffer of 512 bytes. When you reach the limit of this, allocate another buffer of twice the previous size (so 1024 the first time, 2048 the second time, etc.), copy buffer1 to buffer2, swap your new buffer for the old one, and free the first buffer.
Then when you’re finished, you allocate a string of the correct length, copy your buffer to the string and return it.
The last step can be ignored if you don’t mind passing back the buffer as the result, even though it is technically too large and is probably mostly unused.
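The steps above can be sketched in C roughly as follows. This is a minimal illustration, not a reference implementation: the `strbuf` type and the `strbuf_init`, `strbuf_append`, and `strbuf_finish` names are hypothetical, and a real Erlang extension would more likely use the allocator its runtime provides than plain `malloc`/`realloc`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; all names here are illustrative only. */
typedef struct {
    char  *data;  /* heap storage, not NUL-terminated while growing */
    size_t len;   /* bytes used so far */
    size_t cap;   /* bytes allocated */
} strbuf;

/* Allocate the initial buffer (for example 512 bytes). Returns 0 on success. */
int strbuf_init(strbuf *sb, size_t initial_cap) {
    assert(sb != NULL && initial_cap > 0);
    sb->data = malloc(initial_cap);
    sb->len = 0;
    sb->cap = sb->data ? initial_cap : 0;
    return sb->data ? 0 : -1;
}

/* Append n bytes, doubling the capacity until they fit. Returns 0 on success. */
int strbuf_append(strbuf *sb, const char *src, size_t n) {
    if (sb->data == NULL) return -1;
    if (n > sb->cap - sb->len) {
        size_t new_cap = sb->cap;
        while (n > new_cap - sb->len) {
            if (new_cap > (size_t)-1 / 2) return -1;  /* would overflow */
            new_cap *= 2;
        }
        /* realloc does the "allocate, copy, free the old one" step for us. */
        char *p = realloc(sb->data, new_cap);
        if (p == NULL) return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    memcpy(sb->data + sb->len, src, n);
    sb->len += n;
    return 0;
}

/* Copy the contents into an exactly sized, NUL-terminated string and
   release the working buffer. The caller frees the returned string. */
char *strbuf_finish(strbuf *sb) {
    char *out = malloc(sb->len + 1);
    if (out != NULL) {
        memcpy(out, sb->data, sb->len);
        out[sb->len] = '\0';
    }
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
    return out;
}
```

A formatter would call `strbuf_append` once per formatted token and `strbuf_finish` at the end. If handing back the oversized buffer is acceptable, the final copy in `strbuf_finish` could be skipped.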
Update
Feels like sub-optimal solution because of reallocation. Am I wrong?
In a word, yes.
This is how dynamic lists and arrays are implemented in major frameworks like the C++ STL and the .NET Framework. If you consider the likelihood that a formatted string might break 512 bytes, how likely is it to break 1024, or 2048? That’s three extra copies, if the string ends up being that long. You can probably apply an 80/20 rule: 80% of the time you’ll never hit the first 512-byte limit (you could probably drop the first allocation to 64 bytes and still apply the 80/20 rule).
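To put rough numbers on that, here is a small sketch that counts how many times a doubling buffer has to grow, and how many bytes get copied in the worst case (assuming the buffer is full each time it grows). The `growth_cost` type and `doubling_cost` function are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how often the buffer grew, and how many
   bytes were copied in total, assuming it was full at every growth. */
typedef struct {
    size_t reallocs;
    size_t bytes_copied;
} growth_cost;

/* Simulate a buffer that starts at `initial` bytes and doubles until it
   can hold `needed` bytes. Assumes `needed` is small enough that the
   doubling does not overflow size_t. */
growth_cost doubling_cost(size_t initial, size_t needed) {
    assert(initial > 0);
    growth_cost c = {0, 0};
    size_t cap = initial;
    while (cap < needed) {
        c.bytes_copied += cap;  /* old contents are copied to the new buffer */
        cap *= 2;
        c.reallocs++;
    }
    return c;
}
```

For a 3000-byte result starting from 512 bytes, this gives 3 growths and 512 + 1024 + 2048 = 3584 bytes copied. With doubling, the total copied always stays below twice the final capacity, which is why the reallocation cost is usually considered acceptable.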
Now consider your alternative, to make two passes over the items to be formatted.
If you have a 32-bit int, you pretty much have to convert it to a string to find out how long the string is going to be. You’ll do that one extra time for every item in the list, which means allocating a buffer for the conversion, spending the time to do the conversion, and then deallocating the string. And getting the length of an int is relatively straightforward compared to some other data types.
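For a plain C int, the two-pass approach looks something like the sketch below (`format_int_two_pass` is a hypothetical name). It relies on the C99 behaviour that `snprintf` with a NULL buffer and size 0 returns the number of characters that would have been written. That avoids the scratch allocation for the measuring pass, but the conversion work itself is still done twice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format one int using two passes: measure first, then write.
   Returns a malloc'd string the caller must free, or NULL on failure. */
char *format_int_two_pass(int value) {
    /* Pass 1: C99 snprintf with (NULL, 0) only reports the length. */
    int len = snprintf(NULL, 0, "%d", value);
    if (len < 0) return NULL;

    char *out = malloc((size_t)len + 1);
    if (out == NULL) return NULL;

    /* Pass 2: the same conversion again, this time into the buffer. */
    int written = snprintf(out, (size_t)len + 1, "%d", value);
    assert(written == len);
    return out;
}
```

For boxed or nested values the measuring pass is not this cheap, since each element has to be walked and converted just to learn its length.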
Also consider complex objects: if you’re getting the length of those, their representations are (possibly) built up by calling some .ToString-like method, which will concatenate the results of all its sub-objects’ ToString methods together, and again you’ll be doing this twice.
Given the toss-up between an extensible string buffer and building all the strings one extra time to get their lengths, I’d go for the buffer every time.