I’m trying to come up with a PostgreSQL schema for host data that’s currently in an LDAP store. Part of that data is the list of hostnames a machine can have, and that attribute is generally the key that most people use to find the host records.
One thing I’d like to get out of moving this data to an RDBMS is the ability to set a uniqueness constraint on the hostname column so that duplicate hostnames can’t be assigned. This would be easy if hosts could only have one name, but since they can have more than one it’s more complicated.
I realize that the fully-normalized way to do this would be to have a hostnames table with a foreign key pointing back to the hosts table, but I’d like to avoid having everybody need to do joins for even the simplest query:
select hostnames.name,hosts.*
from hostnames,hosts
where hostnames.name = 'foobar'
and hostnames.host_id = hosts.id;
I figured using PostgreSQL arrays could work for this, and they certainly make the simple queries simple:
select * from hosts where names @> '{foobar}';
When I set a uniqueness constraint on the hostnames attribute, though, it of course treats the entire list of names as the unique value instead of each name. Is there a way to make each name unique across every row instead?
If not, does anyone know of another data-modeling approach that would make more sense?
The righteous path
You might want to reconsider normalizing your schema. It is not necessary for everyone to "join for even the simplest query". Create a VIEW for that.
The table could look like this:
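A minimal sketch of such a layout, assuming a host table keyed by host_id and a hostname table that references it. The column types and the cascading behavior of the foreign key are assumptions for illustration, not requirements:

```sql
-- Parent table: one row per machine.
CREATE TABLE host (
   host_id serial PRIMARY KEY
   -- other host attributes would go here
);

-- Child table: one row per name; UNIQUE keeps any name from being assigned twice.
CREATE TABLE hostname (
   hostname_id serial PRIMARY KEY,            -- optional surrogate key
   host_id     integer NOT NULL
               REFERENCES host (host_id)
               ON UPDATE CASCADE ON DELETE CASCADE,
   hostname    text NOT NULL UNIQUE
);
```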
The surrogate primary key hostname_id is optional. I prefer to have one. In your case hostname could be the primary key. But many operations are faster with a simple, small integer key. Create a foreign key constraint to link to the table host.
Create a view like this:
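One way such a view might look, assuming the host and hostname tables described above. The view name v_host is hypothetical:

```sql
-- One row per host, with all of its names aggregated into an array.
CREATE VIEW v_host AS
SELECT h.*
     , array_agg(hn.hostname) AS hostnames
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;   -- grouping by the primary key is enough in 9.1+
```

Note that the plain JOIN drops hosts without any name. A LEFT JOIN would keep them, at the cost of an array holding a single NULL for those rows.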
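Starting with pg 9.1, the primary key in the GROUP BY covers all columns of that table in the SELECT list. The release notes for version 9.1: "Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause".
Queries can use the view like a table. Searching for a hostname will be much faster this way: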
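A sketch of that kind of lookup, assuming it means joining the two base tables directly instead of filtering the aggregated array in the view:

```sql
-- Find the host that owns a given name via the unique index on hostname.hostname.
SELECT h.*
FROM   hostname hn
JOIN   host h USING (host_id)
WHERE  hn.hostname = 'foobar';
```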
This assumes you have an index on host(host_id), which should be the case as it should be the primary key. Plus, the UNIQUE constraint on hostname(hostname) implements the other needed index automatically.
In Postgres 9.2+ a multicolumn index would be even better if you can get an index-only scan out of it:
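For instance, an index along these lines. The index name is made up, and whether the planner actually chooses an index-only scan depends on the query and on the visibility map being reasonably current:

```sql
-- Both columns needed for a lookup by name live in the index itself.
CREATE INDEX hostname_hostname_host_id_idx ON hostname (hostname, host_id);
```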
Starting with Postgres 9.3, you could use a MATERIALIZED VIEW, circumstances permitting, especially if you read much more often than you write to the table.
The dark side (what you actually asked)
If I can’t convince you of the righteous path, here is some assistance for the dark side:
Here is a demo of how to enforce uniqueness of hostnames. I use a table hostname to collect hostnames and a trigger on the table host to keep it up to date. Unique violations raise an exception and abort the operation.
Trigger function:
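A sketch of how this could be written, assuming a stripped-down demo schema that is separate from the normalized design: host carries the array column hostnames, and hostname holds nothing but the names. The function name trg_host_sync_hostname is hypothetical:

```sql
-- Demo tables (assumed layout).
CREATE TABLE host (hostnames text[]);
CREATE TABLE hostname (hostname text PRIMARY KEY);  -- the PK enforces uniqueness

CREATE OR REPLACE FUNCTION trg_host_sync_hostname()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- An UPDATE that leaves the array unchanged needs no work.
   IF TG_OP = 'UPDATE' THEN
      IF OLD.hostnames IS NOT DISTINCT FROM NEW.hostnames THEN
         RETURN NEW;
      END IF;
   END IF;

   -- DELETE or UPDATE: release the names held by the old row.
   IF TG_OP IN ('DELETE', 'UPDATE') THEN
      DELETE FROM hostname hn
      USING  unnest(OLD.hostnames) AS o(hostname)
      WHERE  hn.hostname = o.hostname;

      IF TG_OP = 'DELETE' THEN
         RETURN OLD;
      END IF;
   END IF;

   -- INSERT or UPDATE: claim the new names.
   -- A duplicate raises a unique violation and aborts the statement.
   INSERT INTO hostname (hostname)
   SELECT n.hostname
   FROM   unnest(NEW.hostnames) AS n(hostname);

   RETURN NEW;
END
$func$;
```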
Trigger:
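A matching trigger definition might look like this. It assumes a trigger function named trg_host_sync_hostname() already exists; both that name and the trigger name are hypothetical:

```sql
-- Fire per row, before the change, and only when hostnames is targeted by an UPDATE.
CREATE TRIGGER host_sync_hostname
BEFORE INSERT OR UPDATE OF hostnames OR DELETE ON host
FOR EACH ROW EXECUTE PROCEDURE trg_host_sync_hostname();
```

With this in place, inserting a second host whose array contains an already registered name should fail with a unique violation on the hostname table.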
SQL Fiddle with test run.
Use a GIN index on the array column host.hostnames and array operators to work with it:
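For example, assuming host.hostnames is of type text[] (the index name is made up):

```sql
-- GIN index supporting the array operators @>, <@, && and =.
CREATE INDEX host_hostnames_gin_idx ON host USING gin (hostnames);

-- Hosts whose list of names contains 'foobar'.
SELECT * FROM host WHERE hostnames @> '{foobar}';

-- Hosts that have at least one of the given names.
SELECT * FROM host WHERE hostnames && '{foo,bar}';
```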