My web application needs to scale at both the web and DB tiers.
I have the following components:
- N Web servers (Tomcat)
- M DB servers sharded by username as the shard key (PostgreSQL)
Our sharding strategy is as follows:
- The strategy is lookup-table based: we have an index table (username, shardId) and a shard table (shardId, connectionString, loading).
- We will monitor the shard DBs periodically and update the loading field.
- When creating a new user, we always pick the shard with the lowest loading and store the mapping in the index table.
- DB shards will be added or removed dynamically.
I have to implement an API like getDBConnection(username) to get one JDBC connection according to the shard key (in this case the login user).
The questions are:
1. How could I implement this API in a way that works with a connection pool? Suppose each shard supports 500 connections; how could I do this in Java code?
I probably wouldn’t use connection pooling in the application at all; I’d just use server-side connection pooling with pgpool-II or pgbouncer.

If I were to use in-application pooling, I’d create per-shard connection pools and pick a connection out of the appropriate pool for the shard. This will work with any connection pool implementation that lets you create pools programmatically rather than declaratively. Each pool should close inactive connections quite aggressively. It looks like org.apache.tomcat.jdbc.pool.DataSource is suitable for this; see the Tomcat JDBC pool docs.

Since this can leave large numbers of connections sitting around and cause lots of connects and disconnects, it is very important to also run a server-side connection pool to limit and share connections to each shard. Use something like pgbouncer or pgpool-II in transaction pooling mode on each shard to share a relatively small number of real PostgreSQL connections among the larger number of connections from web workers.

I’d say you’ll want a pattern where each app pool connects to a pgbouncer on the shard server corresponding to that pool, and only that shard server’s pgbouncer ever talks to the shard’s PostgreSQL instance.
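
To make the per-shard pool idea concrete, here is a rough sketch of what getDBConnection(username) could look like. It only uses JDK interfaces, so it is not tied to any particular pool library. The class name ShardRouter and the two functions passed to it (shardLookup for the index-table query, poolFactory for building a pool from a shard's connectionString) are made up for illustration; how you implement them depends on your schema and pool library.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import javax.sql.DataSource;

/** Routes a username to a pooled connection on the shard that owns it. */
final class ShardRouter {
    // One pool per shard, created lazily the first time the shard is used.
    private final Map<Integer, DataSource> pools = new ConcurrentHashMap<>();

    // Hypothetical: resolves username -> shardId from the index table
    // (returns null when the user has no shard assigned).
    private final Function<String, Integer> shardLookup;

    // Hypothetical: builds a pool for a shard from its connectionString.
    private final Function<Integer, DataSource> poolFactory;

    ShardRouter(Function<String, Integer> shardLookup,
                Function<Integer, DataSource> poolFactory) {
        this.shardLookup = shardLookup;
        this.poolFactory = poolFactory;
    }

    /** Borrows a connection from the pool of the shard that holds this user. */
    Connection getDBConnection(String username) throws SQLException {
        Integer shardId = shardLookup.apply(username);
        if (shardId == null) {
            throw new SQLException("No shard mapped for user: " + username);
        }
        // computeIfAbsent creates at most one pool per shard, even under concurrency.
        return pools.computeIfAbsent(shardId, poolFactory).getConnection();
    }

    /** Forgets a shard's pool when the shard is removed; the caller closes it. */
    DataSource removeShard(int shardId) {
        return pools.remove(shardId);
    }

    /** Number of shard pools currently held. */
    int poolCount() {
        return pools.size();
    }
}
```

With the Tomcat JDBC pool, poolFactory would presumably build an org.apache.tomcat.jdbc.pool.DataSource from a PoolProperties object, pointing the URL at the shard's pgbouncer and keeping minIdle low with a short minEvictableIdleTimeMillis so idle connections get closed quickly. Callers must still close() each connection they borrow so it goes back to the pool.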