INFORMIX-SQL 4.10.DC1 (SE Engine), on DOS 6.22, under Microsoft Virtual PC 2007, in Windows 7!
EDIT: Looking for Jonathan Leffler's wisdom on this one!
I have a child table called transaction and a parent table called customer.
These tables are joined by customer.pk_id SERIAL = transaction.fk_id INTEGER.
The transaction table has a clustered index on fk_id so that each customers transactions are physically grouped together with the data matching the index. The reason I chose the cluster index on fk_id is because most queries are done on customer name and this quickly returns all of the transactions belonging to the queried customer.
The transaction table also has a column called transaction_number SERIAL with a unique index.
When a new transaction is added for any customer, a new row is added at EOF with the highest transaction_number.. Note: Informix does not maintain the physical clustering in the datafile after the index is created.
So when I want to print a customer receipt of the last transaction entered I do the following query:
SELECT *
FROM customer c,
transaction t
WHERE c.pk_id = t.fk_id
AND t.trx_num = (SELECT MAX(trx_num)
FROM transaction)
EDIT: The following INDEXES exist:
UNIQUE INDEX ON customer(pk_id) {SERIAL}
UNIQUE CLUSTER INDEX ON customer(last_name,sur_name,first_name,middle_name) {CHAR(30's)}
UNIQUE INDEX ON customer(ident_type,ident_num,ident_state,ident_country) {CHARS(n)}
UNIQUE CLUSTER INDEX ON transaction(fk_id) {INT}
UNIQUE INDEX ON transaction(trx_num) {SERIAL}
EDIT: EXPLAIN results for the above query:
QUERY:
------
SELECT *
FROM customer c,
transaction t,
WHERE c.pk_id = t.fk_id
AND t.trx_num = (SELECT MAX(trx_num)
FROM transaction)
Estimated Cost: 14
Estimated # of Rows Returned: 2
1) f.transaction: INDEX PATH
(1) Index Keys: trx_num
Lower Index Filter: f.transaction.trx_num = <subquery>
2) f.customer: INDEX PATH
(1) Index Keys: pk_id
Lower Index Filter: f.customer.pk_id = f.transaction.fk_id
Subquery:
---------
Estimated Cost: 3
Estimated # of Rows Returned: 1
1) f.trx_num: INDEX PATH
(1) Index Keys: trx_num
It seems like the query optimizer is doing a full table scan on transaction in order to locate the highest transaction number, whose new row is always placed at EOF , but since transactions are grouped by fk_id, the rest of the transaction numbers are scattered throughout the table.
Is there a better way for obtaining a quicker response time on the query?
Would CREATE UNIQUE INDEX trxnumidx ON transaction(transaction_number) DESCENDING help in quickly locating the MAX transaction_number?
EDIT: I do have a similar query which quickly retrieves the desired transaction and customer information so that the user can print the last entered transaction, or any previously entered transaction, but it requires the user to input the transaction number:
SELECT *
FROM customer c,
transaction t
WHERE c.pk_id = t.fk_id
AND t.trx_num = $trxnum {ace input variable, type INT}
What baffles me about this query, where the user manually inputs the transaction number, is that retrieval is instantaneous (total cost = 2), where as when automatic with MAX (total cost = 14).. The reason why I chose to make this query automatic is that in the past, when users manually input the transaction number, sometimes they inadvertently entered the wrong transaction number which happened to be a valid number and without realizing it, they signed and gave the customer a receipt with the wrong information on it!
EDIT: Would DBINFO('sqlca.sqlerrd1') be a more efficient way of locating the most recently inserted row?
Since you are wanting the maximum transaction number for a
fk_id, I think I would expect to create an (optionally UNIQUE) index:The index can be unique since transaction number alone is unique, but the point of this index is to speed up lookups for a given
fk_idvalue.I would then expect to use a query like the one that oers suggested:
This is still a correlated sub-query, but given that the customer number is used, there will only be one row of data to return and it is blatant to the optimizer that this is the case. Depending on the programming language I was using, I might well arrange for the sub-query to use:
Or even:
You could use placeholders or create the SQL from a string as suits.
Even better, I’d avoid the problem by capturing the inserted serial value immediately after the insert so that (a) if the program ever went multi-user, it would still work and (b) the whole MAX issue would be avoided, and the query could become:
Note that with the
c.pk_id = 1234567for the customer specified as a parameter to the query, you might not even need the index I suggested. And wherever I used a number, it would be fine with me to use a placeholder for the value to be supplied when the query is executed.Remember, you are using software that is old enough to drink, let alone vote; the optimizer is not as good as the one you’d find in a more modern DBMS.