Question
Which of these is the quickest?
I’m using lighttpd’s mod_fastcgi, Python 2.7 and flup.server.fcgi.WSGIServer.
Should I yield strings directly in some_output_function, then return from app?

    from flup.server.fcgi import WSGIServer

    def app(env, start):
        start('200 OK', [('Content-Type', 'text/html')])
        return some_output_function()

    def some_output_function():
        # Each yield hands the server a separate piece of the body.
        yield function_that_returns_a_string()
        yield 'yada yada'
        yield another_function_that_returns_a_string()

    WSGIServer(app).run()

Should I return a list from some_output_function, then return from app?

    from flup.server.fcgi import WSGIServer

    def app(env, start):
        start('200 OK', [('Content-Type', 'text/html')])
        return some_output_function()

    def some_output_function():
        # Collect the pieces in a list; the server still iterates over it element by element.
        out = []
        out.append(function_that_returns_a_string())
        out.append('yada yada')
        out.append(another_function_that_returns_a_string())
        return out

    WSGIServer(app).run()

Should I yield a last-minute joined list from some_output_function, then return from app?

    from flup.server.fcgi import WSGIServer

    def app(env, start):
        start('200 OK', [('Content-Type', 'text/html')])
        return some_output_function()

    def some_output_function():
        out = []
        out.append(function_that_returns_a_string())
        out.append('yada yada')
        out.append(another_function_that_returns_a_string())
        # Join the pieces first, then yield a single string.
        yield ''.join(out)

    WSGIServer(app).run()

Should I return a last-minute joined list from some_output_function, then yield from app?

    from flup.server.fcgi import WSGIServer

    def app(env, start):
        # app is now a generator: start() runs when the server begins iterating over it.
        start('200 OK', [('Content-Type', 'text/html')])
        yield some_output_function()

    def some_output_function():
        out = []
        out.append(function_that_returns_a_string())
        out.append('yada yada')
        out.append(another_function_that_returns_a_string())
        # Return one joined string, which app yields as a single piece.
        return ''.join(out)

    WSGIServer(app).run()

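To see how the four variants differ from the server's point of view, they can be called directly with a stub start_response, without flup or lighttpd, and the pieces they hand back counted. This is a minimal sketch: `collect`, `fake_start_response`, `app_returning`, `app_yielding`, the `output_*` names and the two stand-in string functions are hypothetical substitutes for the placeholders in the question, and `b''` literals are used so it runs on both Python 2.7 and 3.

```python
def function_that_returns_a_string():
    # Hypothetical stand-in for the question's placeholder function.
    return b'<p>first</p>'


def another_function_that_returns_a_string():
    # Hypothetical stand-in for the question's placeholder function.
    return b'<p>last</p>'


def output_yield_pieces():
    # Variant 1: yield each piece separately.
    yield function_that_returns_a_string()
    yield b'yada yada'
    yield another_function_that_returns_a_string()


def output_list():
    # Variant 2: return a list of pieces.
    return [function_that_returns_a_string(), b'yada yada',
            another_function_that_returns_a_string()]


def output_yield_joined():
    # Variant 3: join the pieces, then yield one string.
    yield b''.join(output_list())


def output_joined():
    # Variant 4 (inner half): join the pieces and return one string.
    return b''.join(output_list())


def app_returning(output_function):
    # Variants 1-3: the app returns whatever the output function produces.
    def app(env, start):
        start('200 OK', [('Content-Type', 'text/html')])
        return output_function()
    return app


def app_yielding(env, start):
    # Variant 4 (outer half): the app itself is a generator.
    start('200 OK', [('Content-Type', 'text/html')])
    yield output_joined()


def collect(app):
    """Call a WSGI app roughly the way a server would; return its body pieces."""
    def fake_start_response(status, headers):
        pass  # A real server would send the status line and headers here.
    return list(app({}, fake_start_response))


variants = [
    ('1: yield pieces', app_returning(output_yield_pieces)),
    ('2: return list', app_returning(output_list)),
    ('3: yield joined', app_returning(output_yield_joined)),
    ('4: app yields joined', app_yielding),
]
for name, app in variants:
    chunks = collect(app)
    print('%s -> %d chunk(s), %d bytes' % (name, len(chunks), len(b''.join(chunks))))
```

This reports three chunks for variants 1 and 2 and one chunk for variants 3 and 4, with an identical body in every case. PEP 3333 notes that servers must not hold back a block while waiting for the next one, and that applications generally get the best throughput by buffering modestly sized output and sending it all at once, so each extra chunk plausibly costs the server an extra write.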
Test results
I built a simple test application whose output function produces the result of one function call, then sixteen ‘yada yada’ strings, then the result of another function call, and measured average request times with ApacheBench. The results surprised me:

    sudo ab -n10000 -c128 localhost/testapp/

- 44 ms to yield strings directly in some_output_function, then return from app
- 44 ms to return a list from some_output_function, then return from app
- 30 ms to yield a last-minute joined list from some_output_function, then return from app
- 30 ms to return a last-minute joined list from some_output_function, then yield from app
Even more interesting: when I increased the number of ‘yada yada’ output strings eight-fold, to 128, these were the results:
- 146 ms to yield strings directly in some_output_function, then return from app
- 146 ms to return a list from some_output_function, then return from app
- 30 ms to yield a last-minute joined list from some_output_function, then return from app
- 30 ms to return a last-minute joined list from some_output_function, then yield from app
It appears that what saves time is building a list of strings and joining it just before leaving the inner output function, instead of yielding each piece. Whether you yield inside and return outside, or return inside and yield outside, doesn’t appear to change anything.
So the only remaining question is: should I yield inside or outside?
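If you prefer writing the output function as a generator, one option is to keep yielding small pieces internally and join them once at the boundary, which is what the two faster variants effectively do. A sketch under stated assumptions: `single_chunk` is a hypothetical helper (not part of flup or the WSGI spec), and the two stand-in string functions replace the question's placeholders.

```python
import functools


def single_chunk(output_function):
    """Hypothetical decorator: join a piece-yielding function's output into one chunk."""
    @functools.wraps(output_function)
    def wrapper(*args, **kwargs):
        # Drain the generator here so the server only sees a one-element list.
        return [b''.join(output_function(*args, **kwargs))]
    return wrapper


def function_that_returns_a_string():
    return b'<h1>Title</h1>'  # hypothetical stand-in


def another_function_that_returns_a_string():
    return b'<p>Footer</p>'  # hypothetical stand-in


@single_chunk
def some_output_function():
    yield function_that_returns_a_string()
    for _ in range(16):  # mirrors the sixteen 'yada yada' strings in the test
        yield b'yada yada'
    yield another_function_that_returns_a_string()


def app(env, start):
    start('200 OK', [('Content-Type', 'text/html')])
    return some_output_function()
```

The inner code stays a generator, while `app` returns a single joined string wrapped in a list, matching the shape of the faster variants in the test results.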
Answer
As a general rule, generators are more efficient than lists, particularly in memory use, when dealing with a lot of data. A list has less overhead when the number of elements is small (e.g. only three elements in your example).
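To make the memory side of this concrete, `sys.getsizeof` can compare a fully built list with a generator that produces the same strings lazily. A minimal sketch: `pieces` is a hypothetical output function, and exact byte counts depend on the Python version and platform, so none are quoted here.

```python
import sys


def pieces(n):
    # Hypothetical output function: yields n small strings lazily.
    for _ in range(n):
        yield b'yada yada'


N = 100000
as_list = list(pieces(N))  # every element reference exists in memory at once
as_gen = pieces(N)         # nothing has been produced yet

# sys.getsizeof measures the container object itself, not the strings it
# refers to; the list grows with N, the generator object does not.
print('list:      %d bytes' % sys.getsizeof(as_list))
print('generator: %d bytes' % sys.getsizeof(as_gen))
```

For three pieces, as in the question, the list's up-front cost is negligible; the gap only matters once the element count grows large.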
Whichever method you choose, its cost will most likely be dwarfed by the time spent fetching data from the cache or the datastore (dozens to hundreds of milliseconds). Shaving 10 ms off the response time is probably not worth worrying about.
The reason to use generators is not speed: large responses are streamed to the client piece by piece, which uses less memory and frees the server to handle more requests. This is especially beneficial with an async server (e.g. gunicorn with eventlet workers, or Tornado).
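A sketch of the streaming case described above: a large body is read and yielded in fixed-size blocks, so only one block needs to be in memory at a time. `stream_file`, `make_download_app` and the 64 KiB `CHUNK_SIZE` are hypothetical names and values chosen for illustration, not part of flup.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical block size; tune for your workload


def stream_file(fileobj, chunk_size=CHUNK_SIZE):
    """Yield a binary file-like object's contents in fixed-size blocks."""
    try:
        while True:
            block = fileobj.read(chunk_size)
            if not block:
                break
            yield block
    finally:
        # Runs when the body is exhausted, or when the server calls close()
        # on the generator, which PEP 3333 requires servers to do.
        fileobj.close()


def make_download_app(open_body):
    """Build a WSGI app that streams whatever open_body() returns.

    open_body is a hypothetical zero-argument callable returning a binary
    file-like object, e.g. lambda: open('big_report.html', 'rb').
    """
    def app(env, start):
        start('200 OK', [('Content-Type', 'text/html')])
        return stream_file(open_body())
    return app


# Usage with an in-memory body standing in for a large file:
data = b'x' * (3 * CHUNK_SIZE + 10)
demo_app = make_download_app(lambda: io.BytesIO(data))
chunks = list(demo_app({}, lambda status, headers: None))
print('%d chunks, %d bytes' % (len(chunks), len(b''.join(chunks))))
```

Yielding large blocks rather than many tiny strings keeps the per-chunk overhead small relative to the data sent. For plain files, PEP 3333 also defines an optional `wsgi.file_wrapper` that a server may provide in the environ.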
To answer your remaining question, whether to yield inside or outside: in practice it should not make any difference.