My goal is to achieve the highest performance available for copying a block of data from the database into a C-function to be processed and returned as the result of a query.
I am new to PostgreSQL and I am currently researching possible ways to move the data. Specifically, I am looking for nuances or keywords related specifically to PostgreSQL to move big data fast.
NOTE:
My ultimate goal is speed, so I am willing to accept answers outside of the exact question I have posed as long as it gets big performance results. For example, I have come across the COPY keyword (PostgreSQL only), which moves data from tables to files quickly; and vice versa. I am trying to stay away from processing that is external to the database, but if it provides a performance improvement that out-weighs the obvious drawback of external processing, then so be it.
It sounds like you probably want to use the server programming interface (SPI) to implement a stored procedure as a C language function running inside the PostgreSQL back-end.
Use
SPI_connectto set up the SPI.Now
SPI_prepare_cursora query, thenSPI_cursor_openit.SPI_cursor_fetchrows from it andSPI_cursor_closeit when done. Note thatSPI_cursor_fetchallows you to fetch batches of rows.SPI_finishto clean up when done.You can return the result rows into a tuplestore as you generate them, avoiding the need to build the whole table in memory. See examples in any of the set-returning functions in the PostgreSQL source code. You might also want to look at the
SPI_returntuplehelper function.See also: C language functions and extending SQL.
If maximum speed is of interest, your client may want to use the libpq binary protocol via libpqtypes so it receives the data produced by your server-side SPI-using procedure with minimal overhead.