In my system Users can add/edit/view Customers. I would like to add a feature allowing the user to see “Recently Viewed Customers”. This would show them the last 20 customers which they have seen (which includes add/edit).
Users will view customers very often as they skip between different web pages and this needs to be very efficient. I would like to persist this across sessions so it needs to be saved to the database. There are about 16,000 users and 600,000 customers.
Here is what I’m thinking as the design.
Create a new table:
- Columns are (UserId, CustomerId, DateViewed)
- Primary key is (UserId, CustomerId)
- Index-organised
- Separate indexes on foreign keys UserId and CustomerId
- DateViewed column only exists to allow ordering of the records
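For concreteness, the table described above might be declared roughly as follows. This is only a sketch: the table name RecentlyViewedCustomer, the constraint and index names, the NUMBER key types, and the parent tables AppUser(UserId) and Customer(CustomerId) are all assumptions made for illustration.

```sql
-- Sketch only. Table, constraint and parent-table names are hypothetical.
CREATE TABLE RecentlyViewedCustomer (
    UserId      NUMBER NOT NULL,
    CustomerId  NUMBER NOT NULL,
    DateViewed  DATE   DEFAULT SYSDATE NOT NULL,
    CONSTRAINT RVC_PK PRIMARY KEY (UserId, CustomerId),
    CONSTRAINT RVC_User_FK FOREIGN KEY (UserId)
        REFERENCES AppUser (UserId) ON DELETE CASCADE,
    CONSTRAINT RVC_Customer_FK FOREIGN KEY (CustomerId)
        REFERENCES Customer (CustomerId) ON DELETE CASCADE
)
ORGANIZATION INDEX;  -- rows are stored in the primary key index itself

-- Secondary index supporting the cascade delete from Customer.
CREATE INDEX RVC_Customer_IX ON RecentlyViewedCustomer (CustomerId);
```

The primary key already leads with UserId, so it can probably serve as the index for that foreign key; a separate UserId index may turn out to be redundant and is worth checking before adding it.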
Create a PL/SQL procedure, taking UserId and CustomerId as parameters, that records that the user viewed the customer. In the PL/SQL procedure I would:
- Use MERGE to insert or update a row with the given UserId and CustomerId, setting DateViewed to SYSDATE
- If a row was inserted by the merge, then use an analytic query to delete any rows with a row_number() > 20
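A rough sketch of such a procedure is below. One caveat: MERGE reports only the total number of rows affected (SQL%ROWCOUNT), not whether the insert or the update branch fired, so this sketch swaps in an UPDATE followed by a conditional INSERT to make "a row was inserted" detectable. The procedure name RecordCustomerView and the table name RecentlyViewedCustomer are assumptions.

```sql
-- Sketch only. Procedure and table names are hypothetical.
CREATE OR REPLACE PROCEDURE RecordCustomerView (
    p_UserId     IN NUMBER,
    p_CustomerId IN NUMBER
) AS
BEGIN
    -- Common case: the user has viewed this customer before.
    UPDATE RecentlyViewedCustomer
       SET DateViewed = SYSDATE
     WHERE UserId = p_UserId
       AND CustomerId = p_CustomerId;

    IF SQL%ROWCOUNT = 0 THEN
        BEGIN
            INSERT INTO RecentlyViewedCustomer (UserId, CustomerId, DateViewed)
            VALUES (p_UserId, p_CustomerId, SYSDATE);
        EXCEPTION
            WHEN DUP_VAL_ON_INDEX THEN
                -- Another session inserted the same pair first;
                -- just refresh its timestamp.
                UPDATE RecentlyViewedCustomer
                   SET DateViewed = SYSDATE
                 WHERE UserId = p_UserId
                   AND CustomerId = p_CustomerId;
        END;

        -- The list may have grown, so trim this user back to 20 rows.
        -- CustomerId breaks ties between rows with the same DateViewed.
        DELETE FROM RecentlyViewedCustomer
         WHERE UserId = p_UserId
           AND CustomerId IN (
                   SELECT CustomerId
                     FROM (SELECT CustomerId,
                                  ROW_NUMBER() OVER (
                                      ORDER BY DateViewed DESC, CustomerId
                                  ) AS rn
                             FROM RecentlyViewedCustomer
                            WHERE UserId = p_UserId)
                    WHERE rn > 20);
    END IF;
    -- No COMMIT here: the caller decides when the transaction ends.
END RecordCustomerView;
/
```

Because UserId is the leading primary key column, both the update and the pruning delete should touch only that user's handful of rows.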
The “Recently Viewed Customers” page then becomes a basic join between this new table and the customer table, ordered by DateViewed and limited to 20 records just in case. No need to include DateViewed in any index as it’s only a 20 row sort.
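The page query could look something like this sketch. The table name RecentlyViewedCustomer, the bind variable :user_id, and the Customer column CustomerName are assumptions standing in for whatever the page actually displays.

```sql
-- Sketch only. Table name, bind variable and CustomerName are hypothetical.
SELECT *
  FROM (SELECT c.CustomerId,
               c.CustomerName,
               rv.DateViewed
          FROM RecentlyViewedCustomer rv
          JOIN Customer c
            ON c.CustomerId = rv.CustomerId
         WHERE rv.UserId = :user_id
         ORDER BY rv.DateViewed DESC)
 WHERE ROWNUM <= 20;  -- ROWNUM is applied after the inner ORDER BY
-- On Oracle 12c or later the wrapper can be replaced by appending
-- FETCH FIRST 20 ROWS ONLY to the inner query.
```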
Say, once a month, delete any records whose DateViewed is more than a year old. This would be a full scan. Cascade deletes from Customer and User to the new table.
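The monthly purge could be set up as a scheduler job along these lines. This is a sketch: the job name, the schedule (2 a.m. on the first of each month) and the table name RecentlyViewedCustomer are assumptions.

```sql
-- Sketch only. Job name, schedule and table name are hypothetical.
BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'PURGE_OLD_CUSTOMER_VIEWS',
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN
                                DELETE FROM RecentlyViewedCustomer
                                 WHERE DateViewed < ADD_MONTHS(SYSDATE, -12);
                                COMMIT;
                            END;',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=MONTHLY;BYMONTHDAY=1;BYHOUR=2;BYMINUTE=0;BYSECOND=0',
        enabled         => TRUE);
END;
/
```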
Does anyone have suggestions for improvement or other ideas that are worth profiling?
(The other idea I had was to denormalise into a table with 20 columns for the different CustomerIds and shuffle values down from CustomerId1 -> CustomerId2 -> CustomerId3. This would require different updates depending on where the CustomerId already appeared in the list.)
I believe you’ve thought through the problem pretty thoroughly.
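One thing I would suggest trying is to defer pruning of the 21st and later most recently viewed customers for a user. If you did this, your selection query would have to limit its result to the top 20 rows (Oracle has no TOP keyword, so that means ROWNUM or, from 12c, FETCH FIRST 20 ROWS ONLY).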
There will be some time required to complete the pruning operation (whether it’s done with each new view or later). There will also be some incremental time involved in picking the top 20 from a list of more than 20.
Depending on exactly how frequently a customer add/edit/view is done, it may be that pruning each time a record is inserted is more expensive than sorting and selecting the TOP 20. You could perform the pruning as a scheduled background task, say once per hour or even once per day.
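If pruning is deferred, the background task could trim every user's list in a single statement, roughly as sketched here. The table name RecentlyViewedCustomer is an assumption, and the statement would need to be scheduled by whatever mechanism you prefer (DBMS_SCHEDULER, for example).

```sql
-- Sketch only. Table name is hypothetical.
-- Keeps the 20 most recent rows per user and deletes the rest.
DELETE FROM RecentlyViewedCustomer
 WHERE (UserId, CustomerId) IN (
           SELECT UserId, CustomerId
             FROM (SELECT UserId,
                          CustomerId,
                          ROW_NUMBER() OVER (
                              PARTITION BY UserId
                              ORDER BY DateViewed DESC, CustomerId
                          ) AS rn
                     FROM RecentlyViewedCustomer)
            WHERE rn > 20);
```

This reads the whole table on every run, so how often to run it is a trade-off between that scan and how far each user's list is allowed to grow past 20 between runs.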
It is also possible, depending on the actual usage, that performance is not an issue and you should instead be optimizing for maintainability, in which case you should do the simplest thing with the least code.
Regarding your other idea (20 denormalized columns): This is not recommended!