As a simplified example, imagine that I’m selling widgets. I sell them nationwide (in both the U.S. and Canada) but there are some that can only be sold in certain areas (one or more U.S. states or Canadian provinces).
I’d like a good way to store this information, coupled with a fast way to query for the widgets that are available to a given user. “U.S., 50 states and D.C.” is the most common value, so I’d rather not insert 51 rows.
MySQL doesn’t support bitmap indexes, so that’s ruled out.
Here are some combinations:
- U.S. 50 states and D.C.
- U.S. 50 states, D.C., Canada, but not Quebec.
- U.S. 48 contiguous states and D.C.
- U.S. 50 states and D.C., but not Colorado.
- U.S. 50 states, D.C., and territories (Puerto Rico, etc.).
My user will have given me one value for their state/province and country.
Can you suggest a schema that provides good storage and fast matching?
Thanks!
This is a job for the MySQL SET type, assuming that you can keep your list down to 64 members (or use multiple SET columns, split on some other condition).
I thought I would expand on my answer, because I think some people just don’t understand the power of the set. Example table:
Note that we use a single SET field for states. More complex uses will likely require multiple sets, but a slightly wider row (a SET with up to 64 members takes 8 bytes, one qword, per record) may be cheaper than adding a large number of extra joins against a lookup table that could easily reach a huge number of records on its own.
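As a rough sketch of what a SET column holds, the Python below models it as an integer with one bit per list member. The shortened member list `STATES` and the helpers `encode` and `decode` are illustrative assumptions, not MySQL APIs; MySQL does this mapping itself, based on the order of the members in the column definition.

```python
# Hypothetical, shortened member list. Order defines bit position,
# mirroring how MySQL assigns the values 1, 2, 4, 8, ... to SET members.
STATES = ["AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC"]

# Map each code to its single-bit value (first member -> 1, second -> 2, ...).
BIT = {code: 1 << i for i, code in enumerate(STATES)}

def encode(codes):
    """Pack a collection of state codes into one integer bitmask."""
    mask = 0
    for code in codes:
        mask |= BIT[code]
    return mask

def decode(mask):
    """Unpack a bitmask into the codes it contains, in list order."""
    return [code for code in STATES if mask & BIT[code]]

# "Everywhere" is one value, not one row per state.
everywhere = encode(STATES)
# "Everywhere but Colorado" just clears a single bit.
all_but_colorado = everywhere & ~BIT["CO"]
```

With this layout, a combination such as "all states but Colorado" is still a single stored value per widget, which is the point of avoiding 51 rows.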
Below are 3 (functionally) equivalent pulls. Note that the bitmask is very much the fastest way to pull this data:
For test #1, we use binary 1000 (decimal 8) as the bitmask, because this corresponds to item #4 in our list (AZ). This is by far the fastest method, and few ways of storing this data will give you faster lookups.
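A minimal sketch of the bitmask lookup, using Python's built-in `sqlite3` as a stand-in. SQLite has no SET type, so an INTEGER column holds the same bit pattern here. The table name `widgets`, its columns, and the sample rows are made up for illustration; in MySQL the same `&` test can be applied to the SET column directly, since a SET evaluates to its integer value in a numeric context.

```python
import sqlite3

# Bit values for the first four list members (assumed order: AK, AL, AR, AZ).
AK, AL, AR = 0b0001, 0b0010, 0b0100
AZ = 0b1000  # item #4 in the list -> bit 3 -> decimal 8

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT, states INTEGER)"
)
conn.executemany(
    "INSERT INTO widgets (name, states) VALUES (?, ?)",
    [
        ("sprocket", AK | AL | AR | AZ),  # sold in all four states
        ("gizmo", AK | AL),               # not sold in AZ
        ("doohickey", AZ),                # sold only in AZ
    ],
)

def available_in(mask):
    """Return names of widgets whose states value shares a bit with mask."""
    rows = conn.execute(
        "SELECT name FROM widgets WHERE (states & ?) <> 0 ORDER BY id",
        (mask,),
    )
    return [name for (name,) in rows]

in_arizona = available_in(AZ)
```

The user's single state maps to a single bit, so the lookup is one bitwise AND per row rather than a join against a widget-to-state table.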
This method can use indexes, but will be somewhat slow because of the fuzzy match.
This method will be faster than the fuzzy match, but its nature will pretty much require the use of a temporary table in most real-world uses.