I’m having trouble crafting a query to return the correct data, and I’m becoming less confident that it’s even possible with a single query.
I have log records stored in a MySQL database in much the same way that printf() works, except that I must keep the format strings stored separately from the replacement values. What I’d like to do is return this data as efficiently as possible, given a search for certain values.
Here’s the table setup:
CREATE TABLE `log` (
`log_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`message` varchar(255) NOT NULL,
`num_variables` int(10) unsigned NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`log_id`)
);
CREATE TABLE `variable` (
`log_id` int(10) unsigned NOT NULL,
`order` int(10) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`value_id` int(10) unsigned NOT NULL,
KEY `log_id` (`log_id`),
KEY `value_id` (`value_id`)
);
CREATE TABLE `value` (
`value_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` varchar(255) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`value_id`),
UNIQUE KEY `value` (`value`)
);
And here’s an example usage:
log('user %email% invited %num% new players', 'him@example.com', 2);
which would lead to the following queries:
-- create the log record (resulting PK would be 1)
INSERT INTO log
(message, num_variables)
VALUES
('user %email% invited %num% new players', 2);
-- create the first value record (resulting PK would be 1)
INSERT INTO value
(value)
VALUES
('him@example.com');
-- create the first variable record (resulting PK would be 1)
INSERT INTO variable
(log_id, `order`, name, value_id)
VALUES
(1, 0, 'email', 1);
-- create the second value record (resulting PK would be 2)
INSERT INTO value
(value)
VALUES
('2');
-- create the second variable record (resulting PK would be 2)
INSERT INTO variable
(log_id, `order`, name, value_id)
VALUES
(1, 1, 'num', 2);
Now I want to be able to pull log records back out of the database, with their associated variables and values. Specifically, I need the log message and all its associated values:
SELECT log.log_id, log.message,
variable.`order`, variable.name,
value.value_id, value.value
FROM log
LEFT JOIN variable ON (log.log_id = variable.log_id)
LEFT JOIN value ON (variable.value_id = value.value_id)
This works fine if I want ALL log records (ignoring the fact that log.log_id and log.message are returned redundantly for any logs with multiple variables). But I want more specificity.
To borrow from the example above, I want to be able to specify that I only want log records containing an “email” of “him@example.com”, let’s say. When I add that into my query…
SELECT log.log_id, log.message,
variable.`order`, variable.name,
value.value_id, value.value
FROM log
LEFT JOIN variable ON (log.log_id = variable.log_id)
LEFT JOIN value ON (variable.value_id = value.value_id)
WHERE (variable.name = 'email' AND value.value = 'him@example.com')
It will return that log/variable/value record, but it will NOT return the associated “num = 2” record (which is required to fully reconstruct the log). Additionally, suppose I wanted to specify a second constraint, say, where “action” = “logged out”. I could (incorrectly) alter my WHERE clause to look like this:
-- won't return anything
WHERE (variable.name = 'email' AND value.value = 'him@example.com')
AND (variable.name = 'action' AND value.value = 'logged out')
or this:
-- will also return logs containing only ONE of the given constraints
WHERE (variable.name = 'email' AND value.value = 'him@example.com')
OR (variable.name = 'action' AND value.value = 'logged out')
but in either case, you can see that it misses the mark, and doesn’t return the exact result set I’m looking for.
Are my tables poorly (or under- or over-) designed? Am I approaching the query the wrong way? Would storing a field of derived data somewhere give me what I need? Is there some JOIN I’ve failed to use that would solve the problem?
UPDATE 1:
variable.order and variable.name are just two different methods for ensuring that the values are interpolated back into log.message correctly.
UPDATE 2:
Based on comments, it’s worth noting that these tables are a contrived example to simplify the post; the actual table structure is slightly more complex than presented. I’ve merely reduced that complexity down to the very kernel of the issue. Simple use-a-single-table-and-serialize-the-value techniques won’t work for me. Aside from that, we need to be able to look up these logs based on values pretty quickly, and such a solution wouldn’t provide us the proper indexing capabilities.
How about:
Without seeing a bigger sample of data, I can’t really comment on the table design. At this point I do question splitting variables and values into separate tables, unless there is a one-to-many relationship from variables to values.
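On the query side, here is a rough sketch of one approach, untested against your real schema and assuming only the simplified tables from the question. The idea is to pick the matching log_ids in a subquery and leave the outer joins unfiltered, so every variable of a matching log comes back, not just the one that matched. For several constraints, GROUP BY with HAVING COUNT keeps only the logs that satisfy all of them. The aliases (l, v, val, fv, fval) are mine, and the HAVING check assumes a given name appears at most once per log.

```sql
-- Outer query: return every variable/value row for the logs that qualify.
SELECT l.log_id, l.message,
       v.`order`, v.name,
       val.value_id, val.value
FROM log AS l
LEFT JOIN variable AS v ON v.log_id = l.log_id
LEFT JOIN `value` AS val ON val.value_id = v.value_id
WHERE l.log_id IN (
    -- Subquery: find the logs that satisfy ALL of the constraints.
    SELECT fv.log_id
    FROM variable AS fv
    INNER JOIN `value` AS fval ON fval.value_id = fv.value_id
    WHERE (fv.name = 'email'  AND fval.value = 'him@example.com')
       OR (fv.name = 'action' AND fval.value = 'logged out')
    GROUP BY fv.log_id
    -- 2 = number of constraints listed above; adjust to match.
    HAVING COUNT(DISTINCT fv.name) = 2
)
ORDER BY l.log_id, v.`order`;
```

For a single constraint, drop the OR branch along with the GROUP BY and HAVING lines. The existing keys on variable.log_id, variable.value_id and the unique key on value.value should give the subquery something to work with, but I’d check EXPLAIN on your real data before relying on it.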