I have several different tables in my database and I’m trying to use Sphinx to do fast full-text searches. For ease of discussion, let’s say the main records of interest are packing slips, one of which is included when an order ships. How do I use Sphinx to execute complex queries across all of these tables without completely denormalizing the database?
Each packing slip lists the order number, shipper, recipient, and the tracking number of each box included with the shipment. A separate table contains information about the order items. An additional table contains the customer address information. So, orders contain boxes and boxes contain items. (Example schema listed at the bottom of this question).
I would like to be able to query Sphinx to answers to questions like:
- How many people who live on a street named “Maple” ordered an item with “large” in the description?
- Which orders contain include the word “blue” in either the box description or order items’ description?
To answer these types of questions, I need to refer to several tables. Since Sphinx doesn’t have JOINs, one option is to denormalize the database. Denormalizing using a view, so that each row represents an order item–plus all of the data of it’s parent box and order, would result in billions of very wide rows. So I’ve been creating a separate index for each table instead. But that doesn’t allow me to query across tables as a SQL JOIN would. Is there another solution?
Example database
CREATE TABLE orders (
id integer PRIMARY KEY,
date_ordered date,
customer_po varchar
);
INSERT INTO orders VALUES (1, '2012-12-13', NULL);
INSERT INTO orders VALUES (2, '2012-12-14', 'DF312442');
CREATE TABLE parties (
id integer PRIMARY KEY,
order_id integer NOT NULL REFERENCES orders(id),
party_type varchar,
company varchar,
city varchar,
state char(2)
);
INSERT INTO parties VALUES (1, 1, 'shipper', 'ACME, Inc.', 'New York', 'NY');
INSERT INTO parties VALUES (2, 1, 'recipient', 'Wylie Coyote Corp.', 'Flagstaff', 'AZ');
INSERT INTO parties VALUES (3, 2, 'shipper', 'Cyberdyne', 'Las Vegas', 'NV');
-- Please disregard the fact that this design permits multiple shippers and multiple recipients
-- per order. This is a vastly simplified version of the system I'm working on.
CREATE TABLE boxes (
id integer PRIMARY KEY,
order_id integer NOT NULL REFERENCES orders(id),
tracking_num varchar NOT NULL,
description varchar NOT NULL,
);
INSERT INTO boxes VALUES (1, 1, '1234567890', 'household goods');
INSERT INTO boxes VALUES (2, 1, '0987654321', 'kitchen appliances');
INSERT INTO boxes VALUES (3, 2, 'ABCDE12345', 'audio equipment');
CREATE TABLE box_contents (
id integer PRIMARY KEY,
order_id integer NOT NULL REFERENCES orders(id),
box integer NOT NULL REFERENCES boxes(id),
qty_units integer,
description varchar
);
INSERT INTO box_contents VALUES (1, 1, 1, 4, 'cookbook');
INSERT INTO box_contents VALUES (2, 1, 1, 2, 'baby bottle');
INSERT INTO box_contents VALUES (3, 1, 2, 1, 'television');
INSERT INTO box_contents VALUES (4, 2, 3, 2, 'lamp');
You put the JOIN in the sql_query that builds the index. The tables remain normalized, but you denormalize when building the index.
Its only a basic example, but your query would be something like.. .
Update: alternatively can use sql_joined_field to do the same but avoid actual sql_query joins. Sphinx then does the join process for you