For a website implemented in Django/Python we have the following requirement:
A view page shows 15 messages per page. When two or more consecutive messages on the page come from the same source, they should be grouped together.
That may not be clear, so here is an example (with 5 messages per page this time):
Message1 Source1, Message2 Source2, Message3 Source2, Message4 Source1, Message5 Source3, ...
This should be shown as:
Message1 Source1, Message2 Source2 (click here to see 1 more message from Source2), Message4 Source1, Message5 Source3, Message6 Source2
So each page shows a fixed number of items, some of which are groups of messages.
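To pin down the grouping rule, here is a plain-Python sketch (not a query) that collapses runs of consecutive messages with the same source using itertools.groupby and then paginates the resulting groups, counting each group as one item. It assumes messages are hypothetical (name, source) tuples already sorted oldest first, as in the example. Because it loads every message into memory, it only specifies the desired output; it does not solve the efficiency problem.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical representation: (name, source) pairs, already sorted by time.
Message = Tuple[str, str]
Group = Tuple[str, List[str]]

EXAMPLE: List[Message] = [
    ("Message1", "Source1"), ("Message2", "Source2"), ("Message3", "Source2"),
    ("Message4", "Source1"), ("Message5", "Source3"), ("Message6", "Source2"),
]


def group_consecutive(messages: List[Message]) -> List[Group]:
    """Collapse each run of consecutive messages sharing a source into one group."""
    # groupby only merges *adjacent* items with equal keys, which is exactly the rule.
    return [(source, [name for name, _ in run])
            for source, run in groupby(messages, key=lambda m: m[1])]


def page_of_groups(messages: List[Message], page: int, per_page: int = 15) -> List[Group]:
    """Return the groups shown on a 1-based page; each group counts as one item."""
    start = (page - 1) * per_page
    return group_consecutive(messages)[start:start + per_page]


for source, names in page_of_groups(EXAMPLE, page=1, per_page=5):
    extra = len(names) - 1
    suffix = f" (click here to see {extra} more from {source})" if extra else ""
    print(f"{names[0]} {source}{suffix}")
```

Run on the example with per_page=5, this prints Message1, Message2 (with 1 more from Source2), Message4, Message5 and Message6, matching the desired output above.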
How can we write a Django or MySQL query that retrieves this data efficiently and simply? Note that the results are paginated and the messages are sorted by time.
PS: I don’t think there is a simple solution, given the nature of SQL, but sometimes complex problems turn out to be easy to solve.
I don’t see any great way to do what you’re trying to do directly. If you’re willing to accept a little denormalization, I would recommend a pre-save signal that marks each message at the head of a group, i.e. a message whose source differs from that of the message just before it.
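The head-marking rule can be sketched in plain Python as below. In Django the check would run inside a receiver connected to django.db.models.signals.pre_save for the message model, looking up the previous message with a query such as Message.objects.order_by('-created').first(); here an in-memory list stands in for the table so the logic runs on its own. The fields (source, created, is_head) are hypothetical, and the sketch assumes new messages are always saved in time order, so the previous message is simply the newest saved one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Message:
    # Hypothetical fields; in Django these would be model fields,
    # with is_head as the denormalized BooleanField.
    source: str
    created: datetime
    is_head: bool = False


def mark_head(new: Message, previous: Optional[Message]) -> None:
    """The pre-save step: a message heads a group unless it continues the previous source."""
    new.is_head = previous is None or previous.source != new.source


def save(table: List[Message], new: Message) -> None:
    """Stand-in for Model.save(): run the pre-save step, then insert.

    Assumes messages arrive in time order, so the previous message is the last row.
    """
    previous = table[-1] if table else None
    mark_head(new, previous)
    table.append(new)


table: List[Message] = []
start = datetime(2024, 1, 1)
for minute, source in enumerate(["Source1", "Source2", "Source2", "Source1", "Source3"]):
    save(table, Message(source, start.replace(minute=minute)))
print([m.is_head for m in table])  # [True, True, False, True, True]
```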
Then your query becomes magically simple:
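With such a flag, each page is just a slice of the head messages, which SQL can paginate directly. The plain-Python sketch below mirrors a Django query along the lines of Message.objects.filter(is_head=True).order_by('created')[start:start + 15], plus a helper that counts how many non-head messages are folded into a head's group for the "click here to see N more" link. The field names are hypothetical, and oldest-first ordering follows the example in the question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    # Hypothetical fields: created is any sortable timestamp,
    # is_head is the flag set by the pre-save listener.
    name: str
    source: str
    created: int
    is_head: bool


def page_of_heads(messages: List[Message], page: int, per_page: int = 15) -> List[Message]:
    """Plain-Python equivalent of
    Message.objects.filter(is_head=True).order_by("created")[start:start + per_page]."""
    heads = sorted((m for m in messages if m.is_head), key=lambda m: m.created)
    start = (page - 1) * per_page
    return heads[start:start + per_page]


def hidden_count(messages: List[Message], head: Message) -> int:
    """Count the messages folded into head's group: those after it, up to the next head."""
    later = sorted((m for m in messages if m.created > head.created), key=lambda m: m.created)
    count = 0
    for m in later:
        if m.is_head:
            break
        count += 1
    return count


rows = [
    Message("Message1", "Source1", 1, True),
    Message("Message2", "Source2", 2, True),
    Message("Message3", "Source2", 3, False),
    Message("Message4", "Source1", 4, True),
    Message("Message5", "Source3", 5, True),
    Message("Message6", "Source2", 6, True),
]
for head in page_of_heads(rows, page=1, per_page=5):
    print(head.name, head.source, hidden_count(rows, head))
```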
To be quite honest, the signal listener would have to be a bit more complicated than the one I wrote. My approach has inherent synchronization problems (lost updates, for example), and the solutions will vary depending on your server: if it runs a single multi-threaded process, a Python Lock object should get you by, but if it runs multiple processes, you will really need to implement locking based on files or database objects. You will certainly also have to write a corresponding delete signal listener.

Obviously this solution adds some database hits, but they happen on edit rather than on view, which might be worthwhile for you. Otherwise, consider a cruder approach: grab 30 stories, loop through them in the view, knock out the ones you won’t display, and if you have 15 left, display them; otherwise, repeat. The worst case is awful, but the average case is perhaps not terrible.
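The cruder approach could be sketched as follows. The fetch callable is a hypothetical stand-in for a sliced queryset such as Message.objects.order_by('created')[offset:offset + limit]; the function keeps pulling batches (30 by default), collapsing consecutive same-source messages, until it has enough groups for the requested page and knows the page's last group is complete. For page N it has to regroup everything before that page as well, which is where the bad worst case comes from.

```python
from typing import Callable, List, Sequence, Tuple

Message = Tuple[str, str]                        # hypothetical (name, source), oldest first
Fetch = Callable[[int, int], Sequence[Message]]  # (offset, limit) -> rows
Group = Tuple[Message, int]                      # (first message, number of hidden messages)


def grouped_page(fetch: Fetch, page: int, per_page: int = 15, batch: int = 30) -> List[Group]:
    """Fetch in batches until the 1-based page can be filled with collapsed groups."""
    if batch < 1:
        raise ValueError("batch must be positive")
    needed = page * per_page
    groups: List[Group] = []
    offset = 0
    while True:
        rows = fetch(offset, batch)
        for msg in rows:
            if groups and groups[-1][0][1] == msg[1]:
                first, hidden = groups[-1]
                groups[-1] = (first, hidden + 1)  # same source: fold into the run
            else:
                groups.append((msg, 0))           # new source: start a new group
        offset += len(rows)
        # Stop once a group past the page has started (so the page's last group
        # is complete) or a short batch shows the data is exhausted.
        if len(groups) > needed or len(rows) < batch:
            break
    start = (page - 1) * per_page
    return groups[start:needed]


data = [("Message1", "Source1"), ("Message2", "Source2"), ("Message3", "Source2"),
        ("Message4", "Source1"), ("Message5", "Source3"), ("Message6", "Source2")]
print(grouped_page(lambda off, lim: data[off:off + lim], page=1, per_page=5, batch=2))
```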
If your server runs a single multi-threaded process, a Lock or RLock should do the trick. Here’s a possible implementation with a non-reentrant lock:
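A minimal sketch of that idea, with an in-memory list standing in for the messages table (the MessageStore class and its methods are hypothetical). It illustrates that the lock has to cover both reading the previous message and inserting the new one: if the lock were held only while computing the flag, two threads could still compare against the same previous row. A threading.Lock only coordinates threads inside one process.

```python
import threading
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    source: str
    is_head: bool = False


class MessageStore:
    """Hypothetical in-memory stand-in for the messages table."""

    def __init__(self) -> None:
        self._rows: List[Message] = []
        # Non-reentrant lock: protects threads within this one process only.
        self._lock = threading.Lock()

    def save(self, new: Message) -> None:
        # Look up the previous message and insert the new one atomically,
        # so two threads cannot both compare against the same previous row.
        with self._lock:
            previous = self._rows[-1] if self._rows else None
            new.is_head = previous is None or previous.source != new.source
            self._rows.append(new)

    def rows(self) -> List[Message]:
        with self._lock:
            return list(self._rows)


store = MessageStore()
for source in ["Source1", "Source2", "Source2"]:
    store.save(Message(source))
print([m.is_head for m in store.rows()])
```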
Again, a corresponding delete signal listener is critical.
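A sketch of what that delete listener has to do, again over a hypothetical in-memory list kept oldest first: after a message is removed, the message that followed it must be re-checked against the message that preceded it, since it may now start a group or join the previous one. In Django this logic would go in a receiver connected to django.db.models.signals.post_delete, and it needs the same locking as the save path.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    source: str
    is_head: bool = False


def delete_at(rows: List[Message], index: int) -> None:
    """Remove rows[index] (rows are oldest first) and repair the follower's is_head flag."""
    del rows[index]
    if index < len(rows):
        follower = rows[index]
        previous = rows[index - 1] if index > 0 else None
        follower.is_head = previous is None or previous.source != follower.source


rows = [Message("Source1", True), Message("Source2", True),
        Message("Source1", True), Message("Source1", False)]
delete_at(rows, 1)  # removing Source2 merges the two Source1 runs
print([(m.source, m.is_head) for m in rows])
```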
EDIT: Many, perhaps most, server configurations (such as Apache) prefork, meaning several worker processes run at once. The lock-based code above will be useless in that case, because a threading lock only coordinates threads within a single process. See this page for ideas on how to get started synchronizing across forked processes.
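For the multi-process case, one file-based option is an exclusive fcntl.flock() lock held around the save, sketched below. This is Unix-only and coordinates processes on a single machine only (several servers would need a database-level lock instead); the file_lock helper and the lock-file path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def file_lock(path: str) -> Iterator[None]:
    """Hold an exclusive flock() on path for the duration of the block (Unix only)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

In use, the lookup of the previous message and the insert would both sit inside a block such as with file_lock("/tmp/messages.lock"):, so every worker process serializes on the same file.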