I’ve been working on improving my OpenGL ES 2.0 render performance by introducing batching; specifically one creates a RenderBatch, specifying a texture and a shader (for now) upon creation. This sets the state into a VAO to allow for inexpensive state switching. I started the implementation looking something like this:
batch = RenderBatch.new "SpriteSheet" "FlatShader"
batch.begin GL_TRIANGLE_STRIP
batch.addGeometry Geometry.newFromFile "Billboard"
batch.end
batch.render renderEngine
But then it hit me: my Billboard file has vertices that are meant to be scaled and translated for specific instance usage. So I added a transform argument to the addGeometry call.
batch.addGeometry(Geometry.newFromFile("Billboard"), myObject.transform)
This solves the problem of scaling, translating, and rotating the vertices, but it does so by looking up the vertex information, transforming it by the transform matrix on the CPU, and then inserting it into the batch data. While this works, it seems inefficient: it is CPU-intensive and doesn’t take advantage of the GPU’s transformation power. Still, it works, so it’s not that big of a deal. (It would be nice to have a better way to do this, though.)
However, I’ve run into a roadblock: texture coordinates may need to be different for each instance as well. That means I would also have to pass in a texture transformation matrix, and now this is feeling hacky.
Is there an easier way to handle this kind of transformation to existing data using shaders that does not limit the geometry/models given and is easily extensible to use normal maps, UV maps, and other fancy tricks? Thanks!
It seems to me that what you are talking about are shader uniforms. Normally you would set up the vertex data and attributes for each batch in a VBO and a VAO. Then, in your render method, you switch to the correct VAO and set up the shader uniforms. These normally include a model-view-projection matrix to transform vertices into clip space (which will change nearly every frame), the correct texture to use, and so on. This is efficient because the unchanging vertex data is held in GPU memory, the VAO takes care of cheap state switching, and only the uniforms, which generally change often, are sent to the GPU on each render call.
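The same idea covers the per-instance texture coordinates from the question. Here is a minimal sketch, assuming the sprite sheet is a regular grid of equally sized cells (that layout is my assumption, not something stated in the question). The shared quad keeps texture coordinates in [0,1], and a per-object uniform remaps them to one cell. All names here (u_mvp, u_uvRect, a_position, a_texCoord, SpriteSheet, sprite_uv_rect) are hypothetical.

```c
/* Hypothetical vertex shader (GLSL ES 1.00). The quad's own texture
   coordinates stay in [0,1]; u_uvRect remaps them for each object. */
static const char *kSpriteVertexShaderSrc =
    "uniform mat4 u_mvp;\n"
    "uniform vec4 u_uvRect;   // xy = offset, zw = scale\n"
    "attribute vec4 a_position;\n"
    "attribute vec2 a_texCoord;\n"
    "varying vec2 v_texCoord;\n"
    "void main() {\n"
    "    v_texCoord = u_uvRect.xy + a_texCoord * u_uvRect.zw;\n"
    "    gl_Position = u_mvp * a_position;\n"
    "}\n";

/* Assumed layout: a sheet of cols x rows equally sized cells. */
typedef struct { int cols, rows; } SpriteSheet;

/* Writes {offsetU, offsetV, scaleU, scaleV} for the cell at (col, row). */
static void sprite_uv_rect(SpriteSheet sheet, int col, int row, float out[4])
{
    out[2] = 1.0f / (float)sheet.cols;   /* width of one cell in UV space  */
    out[3] = 1.0f / (float)sheet.rows;   /* height of one cell in UV space */
    out[0] = (float)col * out[2];        /* left edge of the cell          */
    out[1] = (float)row * out[3];        /* first edge along V             */
}
```

The four floats would be uploaded with glUniform4fv(location, 1, rect) right before the object's draw call, so the vertex data in the VBO never has to change.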
If you are batching multiple objects that require separate model-view-projection matrices, then you have a few options:
1. Perform a separate draw call for each batch that requires a separate model-view-projection matrix.
2. Use an array of model-view-projection matrices as a uniform, and give each object's vertices an attribute that holds the index of the matrix to use.
3. Transform the vertices on the CPU and refill the VBO with the updated data.
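The matrix-array option could look roughly like the sketch below. It assumes a cap of 16 objects per batch (16 mat4 uniforms take 64 of the at least 128 vec4 vertex uniform slots that ES 2.0 guarantees), and it stores the index as a float because GLSL ES 1.00 does not allow integer attributes. Batch, BatchVertex, batch_add, a_objectIndex and the other names are hypothetical.

```c
#include <stddef.h>

#define MAX_BATCH_OBJECTS 16   /* assumed cap on objects per batch */

/* Hypothetical vertex shader: each vertex picks its matrix by index. */
static const char *kBatchVertexShaderSrc =
    "uniform mat4 u_mvp[16];\n"
    "attribute vec4 a_position;\n"
    "attribute vec2 a_texCoord;\n"
    "attribute float a_objectIndex;\n"
    "varying vec2 v_texCoord;\n"
    "void main() {\n"
    "    v_texCoord = a_texCoord;\n"
    "    gl_Position = u_mvp[int(a_objectIndex)] * a_position;\n"
    "}\n";

typedef struct { float x, y, z, u, v; } Vertex;
typedef struct { float x, y, z, u, v, objectIndex; } BatchVertex;

typedef struct {
    BatchVertex *vertices;   /* caller-provided storage       */
    size_t capacity;         /* number of vertices it can hold */
    size_t count;            /* vertices written so far        */
    int objectCount;         /* matrix slots handed out so far */
} Batch;

/* Appends one object's untransformed vertices, tagging each with the
   object's slot in the matrix array. Returns the slot, or -1 if full. */
static int batch_add(Batch *b, const Vertex *src, size_t n)
{
    if (b->objectCount >= MAX_BATCH_OBJECTS || b->count + n > b->capacity)
        return -1;
    int slot = b->objectCount++;
    for (size_t i = 0; i < n; ++i) {
        BatchVertex *dst = &b->vertices[b->count++];
        dst->x = src[i].x;  dst->y = src[i].y;  dst->z = src[i].z;
        dst->u = src[i].u;  dst->v = src[i].v;
        dst->objectIndex = (float)slot;
    }
    return slot;
}
```

At render time all matrices would go up in one call, glUniformMatrix4fv(location, objectCount, GL_FALSE, matrices), followed by a single draw call. Note that joining several objects into one GL_TRIANGLE_STRIP additionally needs degenerate triangles between them, or a switch to GL_TRIANGLES.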
The first method is the preferred solution: it is efficient and simple. The slow part of issuing lots of draw calls is generally getting the data from the CPU to the GPU; if you already have the vertex data in VBOs, then the overhead of a draw call per object is not going to be a big deal. This also solves the problem of how to provide different uniforms per object based on object properties: in each object's render method, the relevant properties are set up as uniforms before the draw call is made. If each object requires different data sent to the GPU, then how else could this work?
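As a rough sketch of the draw-call-per-object approach: the Billboard quad stays untransformed in its VBO, and each object only contributes a matrix. This assumes a 4-vertex quad drawn as a triangle strip and a simple scale-plus-translate transform. Mat4, SpriteTransform, model_matrix, mat4_mul, draw_sprites and the HAVE_GLES2 build flag are all hypothetical names; the GL part is guarded so the matrix code can be built without a GL context.

```c
/* Column-major 4x4 matrix, the layout glUniformMatrix4fv expects when
   its transpose argument is GL_FALSE (the only value ES 2.0 accepts). */
typedef struct { float m[16]; } Mat4;

/* Hypothetical per-object state: position plus a 2D scale. */
typedef struct { float x, y, z, sx, sy; } SpriteTransform;

/* Builds translate * scale, so vertices are scaled first, then moved. */
static Mat4 model_matrix(SpriteTransform t)
{
    Mat4 r = {{ t.sx, 0.0f, 0.0f, 0.0f,     /* column 0 */
                0.0f, t.sy, 0.0f, 0.0f,     /* column 1 */
                0.0f, 0.0f, 1.0f, 0.0f,     /* column 2 */
                t.x,  t.y,  t.z,  1.0f }};  /* column 3 */
    return r;
}

/* Returns a * b for column-major matrices. */
static Mat4 mat4_mul(Mat4 a, Mat4 b)
{
    Mat4 r;
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.m[k * 4 + row] * b.m[c * 4 + k];
            r.m[c * 4 + row] = sum;
        }
    return r;
}

#ifdef HAVE_GLES2   /* hypothetical flag; requires a current GL context */
#include <GLES2/gl2.h>

/* Assumes the quad's vertex buffer and attribute pointers (or its VAO)
   and the texture are already bound by the caller. */
static void draw_sprites(GLuint program, GLint mvpLoc, Mat4 viewProj,
                         const SpriteTransform *sprites, int count)
{
    glUseProgram(program);
    for (int i = 0; i < count; ++i) {
        Mat4 mvp = mat4_mul(viewProj, model_matrix(sprites[i]));
        glUniformMatrix4fv(mvpLoc, 1, GL_FALSE, mvp.m);  /* per-object uniform */
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);           /* same quad each time */
    }
}
#endif
```

Only 16 floats per object cross from CPU to GPU each frame; the vertex data itself is uploaded once.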