I have a couple of million rows in a PostgreSQL table. Up to 20 processes write to that table (a few hundred inserts/updates per second), and a few processes read from it at the same time (once in a while a big SELECT). This results in many failures (Stream Closed, Input/Output Error) on both sides, reading and writing.
I am now thinking about splitting that table into multiple tables. I would split by the “type” of object, a field that has only 20 possible values, which are roughly evenly distributed.
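If the table is split by type, PostgreSQL 10 and later can do the routing itself with declarative list partitioning: the parent table is declared with `PARTITION BY LIST (type)`, each type value gets its own partition, and inserts and queries keep targeting the parent. The sketch below only generates the per-type `CREATE TABLE ... PARTITION OF` statements; the `type` column, the partition naming scheme, and the `PartitionDdl` class are assumptions for illustration. One caveat: primary keys and unique constraints on a partitioned table (supported from PostgreSQL 11) must include the partition key, so `PRIMARY KEY (id)` and `UNIQUE (md5sum)` would become e.g. `(id, type)` and `(md5sum, type)`, which only enforces uniqueness within each type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PartitionDdl {
    // Returns one "CREATE TABLE <parent>_<type> PARTITION OF <parent> ..." statement per
    // type value. Assumes the parent was created with PARTITION BY LIST (type).
    static List<String> partitionStatements(String parent, List<String> types) {
        List<String> statements = new ArrayList<>();
        for (String type : types) {
            // Identifier-safe suffix: lower-case, anything outside [a-z0-9] becomes '_'
            String suffix = type.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "_");
            // Double any single quotes so the value is a valid SQL string literal
            String literal = type.replace("'", "''");
            statements.add("CREATE TABLE " + parent + "_" + suffix
                    + " PARTITION OF " + parent
                    + " FOR VALUES IN ('" + literal + "');");
        }
        return statements;
    }

    public static void main(String[] args) {
        // Hypothetical type values; the real ones would be the 20 values of the type field
        for (String ddl : partitionStatements("rev", List.of("news", "blog", "forum"))) {
            System.out.println(ddl);
        }
    }
}
```

A query that filters on `type` can then skip the non-matching partitions, while one that doesn't still sees every row through the parent.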
The question is: should I use multiple tables, multiple schemas, or multiple databases to guarantee non-blocking access to the data? Or should I use a completely different setup, such as another database? Maybe HTable?
The integrity of the data is not that important. It has to be there eventually, but I don't really need a particular isolation level or transactions. I just need a fast system that multiple processes can write to and read from without a performance impact, and that supports queries based on field values.
Right now I use JDBC with isolation level TRANSACTION_READ_UNCOMMITTED and one connection per process.
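Since transactions and isolation don't matter much here, a common way to raise write throughput over JDBC is to send rows in batches and commit once per batch rather than once per row. A minimal sketch, assuming a subset of the `rev` columns from the schema in the update and a caller-chosen batch size; `BatchWriter` and `Rev` are hypothetical names. `ON CONFLICT (md5sum) DO NOTHING` (PostgreSQL 9.5+) makes a row with a duplicate `md5sum` get skipped instead of aborting the whole batch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchWriter {
    // Hypothetical holder for a few of the rev columns
    record Rev(int id, String title, String md5sum) {}

    // Splits rows into consecutive chunks of at most `size` elements
    static <T> List<List<T>> chunks(List<T> rows, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += size) {
            out.add(rows.subList(i, Math.min(i + size, rows.size())));
        }
        return out;
    }

    // Sends each chunk as one JDBC batch and commits once per chunk.
    // Rows whose md5sum already exists are skipped by ON CONFLICT instead of failing the batch.
    static void insertAll(Connection conn, List<Rev> rows, int batchSize) throws SQLException {
        String sql = "INSERT INTO rev (id, title, md5sum) VALUES (?, ?, ?) "
                + "ON CONFLICT (md5sum) DO NOTHING";
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<Rev> chunk : chunks(rows, batchSize)) {
                for (Rev r : chunk) {
                    ps.setInt(1, r.id());
                    ps.setString(2, r.title());
                    ps.setString(3, r.md5sum());
                    ps.addBatch();
                }
                ps.executeBatch();
                conn.commit();
            }
        } catch (SQLException e) {
            conn.rollback(); // discards only the failing chunk; earlier chunks are already committed
            throw e;
        }
    }
}
```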
UPDATE:
The schema looks as follows:
CREATE TABLE rev
(
id integer NOT NULL,
source text,
date timestamp with time zone,
title text,
summary text,
md5sum text,
author text,
content text,
CONSTRAINT rev_id_pk PRIMARY KEY (id),
CONSTRAINT rev_md5sum_un UNIQUE (md5sum)
);
CREATE TABLE resp
(
id integer NOT NULL,
parent_id integer, -- used by the join in the sample query
source text,
date timestamp with time zone,
title text,
summary text,
md5sum text,
author text,
content text,
CONSTRAINT resp_id_pk PRIMARY KEY (id),
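CONSTRAINT resp_md5sum_un UNIQUE (md5sum) -- must differ from rev's: each UNIQUE constraint creates an index, and index names are unique per schema
);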
And I have a few indexes on some of the fields.
A sample query looks like:
SELECT * FROM rev LEFT JOIN resp ON rev.id = resp.parent_id WHERE rev.date > ? LIMIT 1000 OFFSET ?
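With `OFFSET`, PostgreSQL still has to compute and discard all the skipped rows, so later pages of this query get progressively slower; and without an `ORDER BY`, consecutive pages are not guaranteed to be disjoint. A common alternative is keyset pagination: order by a unique key and start each page strictly after the last key seen. The sketch below assumes paging over `rev` by `(date, id)` (with an index on those columns), fetches 1000 `rev` rows per query together with their `resp` rows rather than 1000 joined rows, and emulates the SQL with an in-memory list so it runs without a database; `KeysetPaging`, `Row`, and the epoch-millisecond `date` are stand-ins.

```java
import java.util.Comparator;
import java.util.List;

public class KeysetPaging {
    // Keyset form of the sample query: page over rev by (date, id), then join resp.
    // The first page passes (startDate, Integer.MAX_VALUE), which is equivalent to
    // the original "rev.date > ?" because id is an integer column.
    static final String PAGE_SQL =
            "SELECT * FROM (SELECT * FROM rev WHERE (date, id) > (?, ?) "
          + "ORDER BY date, id LIMIT 1000) r "
          + "LEFT JOIN resp ON r.id = resp.parent_id ORDER BY r.date, r.id";

    // In-memory stand-in for a rev row; date is epoch millis here
    record Row(long date, int id) {}

    static final Comparator<Row> BY_KEY =
            Comparator.comparingLong(Row::date).thenComparingInt(Row::id);

    // Rows strictly after (afterDate, afterId) in (date, id) order, at most `limit` of them
    static List<Row> page(List<Row> rows, long afterDate, int afterId, int limit) {
        Row after = new Row(afterDate, afterId);
        return rows.stream()
                .filter(r -> BY_KEY.compare(r, after) > 0)
                .sorted(BY_KEY)
                .limit(limit)
                .toList();
    }

    public static void main(String[] args) {
        List<Row> table = List.of(new Row(5, 3), new Row(5, 1), new Row(7, 2), new Row(5, 2), new Row(9, 4));
        long lastDate = 4;               // first page: everything with date > 4
        int lastId = Integer.MAX_VALUE;
        while (true) {
            List<Row> p = page(table, lastDate, lastId, 2);
            if (p.isEmpty()) break;
            System.out.println(p);
            Row last = p.get(p.size() - 1); // the next page starts after this key
            lastDate = last.date();
            lastId = last.id();
        }
    }
}
```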
The resp table is much smaller, but it too gets updates and is queried in the joins.
What kind of failures? Reading and writing on the same table should not be a problem at all in PostgreSQL, MVCC works fine.
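To illustrate why readers and writers don't block each other under MVCC, here is a deliberately simplified toy model, not PostgreSQL's actual implementation: an update never overwrites a row in place, it stamps the old version as superseded and appends a new one, and each reader only sees versions whose creating transaction had committed when its snapshot was taken. The field names borrow PostgreSQL's `xmin`/`xmax` system columns; real snapshots also track in-progress transactions, concurrent writers to the same row still wait for each other, and this toy assumes every transaction eventually commits. `ToyMvcc` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ToyMvcc {
    // One physical version of a row: created by transaction xmin, superseded by xmax (0 = still current)
    static final class Version {
        final int key;
        final String value;
        final int xmin;
        int xmax;

        Version(int key, String value, int xmin) {
            this.key = key;
            this.value = value;
            this.xmin = xmin;
        }
    }

    private final List<Version> versions = new ArrayList<>();
    private final Set<Integer> committed = new HashSet<>();

    void insert(int tx, int key, String value) {
        versions.add(new Version(key, value, tx));
    }

    // Updates never overwrite in place: stamp the current version, append a new one
    void update(int tx, int key, String value) {
        for (Version v : versions) {
            if (v.key == key && v.xmax == 0) {
                v.xmax = tx;
            }
        }
        versions.add(new Version(key, value, tx));
    }

    void commit(int tx) {
        committed.add(tx);
    }

    // A snapshot here is simply the set of transactions committed when it is taken
    Set<Integer> snapshot() {
        return new HashSet<>(committed);
    }

    // Visible if created by a transaction in the snapshot and not superseded by one
    String read(Set<Integer> snapshot, int key) {
        for (Version v : versions) {
            if (v.key == key && snapshot.contains(v.xmin)
                    && (v.xmax == 0 || !snapshot.contains(v.xmax))) {
                return v.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyMvcc db = new ToyMvcc();
        db.insert(1, 42, "old");
        db.commit(1);
        Set<Integer> readerSnapshot = db.snapshot(); // a long-running reader starts here
        db.update(2, 42, "new");                     // the writer proceeds without waiting
        db.commit(2);
        System.out.println(db.read(readerSnapshot, 42)); // old
        System.out.println(db.read(db.snapshot(), 42));  // new
    }
}
```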
It's hard to tell you how to fix your problems without more information about the system and what the processes are doing. Could you tell us more about it and show the database schema?
READ UNCOMMITTED doesn’t exist in PostgreSQL; it is treated as READ COMMITTED:
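In JDBC terms, requesting TRANSACTION_READ_UNCOMMITTED therefore buys nothing over READ COMMITTED. A small sketch that encodes the level a PostgreSQL session effectively gets for each requested JDBC level, following the PostgreSQL documentation's statement that READ UNCOMMITTED behaves like READ COMMITTED, plus a hypothetical `configure` helper that asks for READ COMMITTED explicitly so the code states what actually happens.

```java
import java.sql.Connection;
import java.sql.SQLException;

public class PgIsolation {
    // The isolation level PostgreSQL actually applies for a requested JDBC level.
    // READ UNCOMMITTED is accepted but behaves exactly like READ COMMITTED.
    static int effectiveLevel(int requested) {
        return switch (requested) {
            case Connection.TRANSACTION_READ_UNCOMMITTED,
                 Connection.TRANSACTION_READ_COMMITTED -> Connection.TRANSACTION_READ_COMMITTED;
            case Connection.TRANSACTION_REPEATABLE_READ,
                 Connection.TRANSACTION_SERIALIZABLE -> requested;
            default -> throw new IllegalArgumentException("unsupported isolation level: " + requested);
        };
    }

    // Hypothetical helper: request the level PostgreSQL will use anyway
    static void configure(Connection conn) throws SQLException {
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }

    public static void main(String[] args) {
        System.out.println(effectiveLevel(Connection.TRANSACTION_READ_UNCOMMITTED)
                == Connection.TRANSACTION_READ_COMMITTED); // true
    }
}
```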