I recently designed an H.323/SIP-compliant video server (in code, at least), complete with a sockets-based API for a .NET SDK, a web server, and all of that stuff. I chose OPAL for my call stack and based my architecture loosely on the design of Ekiga. I even repurposed the serial ports for digital I/O, with two outputs and three inputs.
Everything works great on my Linux PC. I built my own Linux distribution specifically for the new boards, which use Intel Atom processors and have 2 GB of RAM. The problem? The Atom processors can’t handle the load of the encoders. The maximum frame rate I ever get is about 7 FPS at NTSC resolution, regardless of bitrate. I know I don’t have any memory leaks; however, the CPU load rises to about 130% across the two cores, which is really only about 65% of total capacity. I really don’t want to have to change stacks, but I don’t know what I need to do. Are there some lighter-weight encoders I could convert into PWLib plugins?
The problem happens regardless of the video encoder: H.261, Theora, H.263+, and so on. What should my next plan of attack be?
Update:
OK, so I think my next move is going to be to find a very low-profile, OpenGL-compatible PCIe GPU; it needs to lie parallel to the motherboard. How can I do that? Also, am I barking up the wrong tree? I am just a programmer, so please pardon my ignorance.
Additional question:
Assuming I get another board with a GPU, how do I make sure that the encoding is done on the GPU and not on the CPU? Is this managed by the OS and the driver, or do I need to write special code to do so? Also, it seems to me that the GPU’s main function is rendering and output; does it also handle the actual transforms and encoding? A good book recommendation would be nice.
More information:
I now suspect that the GPU is not the problem. I think it may have something to do with the temporal/spatial tradeoff. I mounted the flash on my overclocked i7 950 and had exactly the same problem. I discovered that the frame rate drops when there is motion, but that if there is no motion I can keep a high frame rate. I also talked to one of the architects at OPAL VoIP, and they also doubt that the GPU is the problem. What else could the problem be?
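One plausible reading of "fast when still, slow when moving" is that encoder cost is data-dependent: many encoders first check whether a block changed and only do expensive work, such as motion search, when it did. The sketch below is a hypothetical, simplified illustration of that idea (exhaustive block matching with a zero-motion early exit), not OPAL's or any particular codec plugin's code; `sad`, `search_block`, and `make_frame` are names invented for this example. It counts how many sum-of-absolute-differences (SAD) evaluations a single 8×8 block costs when the picture is static versus when it has moved.

```python
# Hypothetical illustration: exhaustive block matching with a zero-motion early exit.
# Frames are lists of rows of 8-bit luma values.

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def search_block(cur, ref, bx, by, n, r, threshold=0):
    """Return (best_vector, sads_evaluated) for one block.

    The zero vector is tried first; if its SAD is within `threshold` (the block
    did not change), the search stops. Otherwise every displacement in
    [-r, r] x [-r, r] that stays inside the reference frame is tested."""
    height, width = len(ref), len(ref[0])
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    best_vec, evaluated = (0, 0), 1
    if best_cost <= threshold:
        return best_vec, evaluated          # static block: one SAD and done
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if (dx, dy) == (0, 0):
                continue                    # already evaluated above
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue                    # candidate falls outside the frame
            evaluated += 1
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_vec, evaluated


def make_frame(size, shift_x=0, shift_y=0):
    """Synthetic texture; non-zero shifts sample it displaced, mimicking motion."""
    def pixel(x, y):
        return (3 * x * x + 5 * y * y + x * y) % 251
    return [[pixel(x + shift_x, y + shift_y) for x in range(size)]
            for y in range(size)]


ref = make_frame(32)
static = make_frame(32)
moving = make_frame(32, shift_x=2, shift_y=1)

print(search_block(static, ref, 8, 8, n=8, r=4))   # ((0, 0), 1)
print(search_block(moving, ref, 8, 8, n=8, r=4))   # ((2, 1), 81)
```

A static block costs one SAD; a block that moved costs 81 with a ±4 search window, and that count grows as (2r+1)² for larger windows. Real encoders use smarter search patterns than this, but the same kind of data dependence would produce a frame rate that collapses only when there is motion, on any CPU that is short of headroom.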
This is the time for micro-optimisation: look carefully at your inner loops.
You need to figure out which inner loops matter (for example, by profiling), and then look carefully at how you can get the most throughput out of them. You can also do a sanity check: can the machine really do what you want it to do? For example, if you need to do n multiply/accumulate operations and you only have n/3 cycles, there is a basic problem and you need to do something else.
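The sanity check can be made concrete with a back-of-the-envelope cycle budget. The figures below are assumptions for illustration, not measurements from the question: a dual-core Atom at 1.6 GHz, NTSC-sized 720×480 frames at 30 FPS, and an exhaustive motion search over ±16 pixels. The helper names `cycles_per_pixel` and `full_search_ops_per_pixel` are hypothetical.

```python
def cycles_per_pixel(clock_hz, cores, width, height, fps):
    """Cycles available per pixel per frame if every core did nothing but encode."""
    return clock_hz * cores / (width * height * fps)


def full_search_ops_per_pixel(search_range):
    """Absolute-difference/accumulate operations per pixel for exhaustive block
    matching over [-search_range, +search_range] in both x and y."""
    return (2 * search_range + 1) ** 2


# Assumed figures (see the lead-in); substitute your own board's numbers.
budget = cycles_per_pixel(clock_hz=1.6e9, cores=2, width=720, height=480, fps=30)
needed = full_search_ops_per_pixel(16)

print(f"budget: {budget:.1f} cycles/pixel")    # budget: 308.6 cycles/pixel
print(f"needed: {needed} ops/pixel")           # needed: 1089 ops/pixel
print(f"shortfall: {needed / budget:.1f}x")    # shortfall: 3.5x
```

Under these assumptions, an inner loop doing one operation per cycle would need roughly 3.5 times the CPU available, before counting the transform, quantisation, entropy coding, or the network stack; that is the n-versus-n/3 situation described above. Closing the gap means doing fewer operations (a smarter search pattern, a smaller search range, a lower resolution) or doing more work per cycle, for example with SIMD instructions such as SSE2's `PSADBW`, which computes sums of absolute differences over 16 byte pairs in a single instruction.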