How would one implement shuffle for the “Celestial Jukebox”?
More precisely, at each time t, return a uniform random number in 0..n(t) such that no number repeats anywhere in the sequence, where n() increases over time.
For a concrete example, assume a flat-rate music service which allows playing any song in the catalog by a 0-based index number. Every so often, new songs are added, which increases the range of index numbers. The goal is to play a new song each time (assuming no duplicates in the catalog).
An ideal solution would be feasible on existing hardware: how would I shoehorn a list of six million songs into 8MB of DRAM? Similarly, the high song count exacerbates O(n) selection timings.
— For an LCG, given a partially exhausted LCG on 0..N0, can it be translated to a different LCG on 0..N1 (where N1 > N0) that doesn’t repeat the exhausted sequence?
— Checking if a particular song has already been played seems to rapidly grow out of hand, although this might be the only way? Is there an efficient data structure for this?
The way that I like to do that kind of non-repeating random selection is to have a list, and each time I select an item at random in [0, N), I remove it from that list. In your case, as new items get added to the catalog, they would also be added to the not-yet-selected list. Once you get to the end, simply reload all the songs back into the list.
EDIT:
If you take v3’s suggestion into account, this can be done in basically O(1) time after the O(N) initialization step. It guarantees non-repeating random selection.
Here is the recap:
1. Pick an item i at random (from the set of [0,N))
2. Retrieve item i
3. Replace i with the Nth item (or null if i == Nth) and decrement N
4. For new items, append them to the end of the list and increment N as necessary
Since you are trying to deal with rather large sets, I would recommend the use of a DB. A simple table with basically two fields: id and “pointer” (where “pointer” is what tells you the song to play, which could be a GUID, a file name, etc., depending on how you want to do it). Have an index on id and you should get very decent performance, with persistence between application runs.
EDIT for 8MB limit:
Umm, this does make it a bit harder… In 8 MB, you can store a maximum of ~2M entries using 32-bit keys.
So what I would recommend is to pre-select the next 2M entries. If the user plays through 2M songs in a lifetime, damn! To pre-select them, do a pre-init step using the above algorithm. The one change I would make is that as you add new songs, roll the dice and see if you want to randomly add that song to the mix. If yes, then pick a random index and replace it with the new song’s index.
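To make both ideas concrete, here is a minimal Python sketch of the list-based selection from the recap (pick a random slot, overwrite it with the last item, shrink the list, append new songs) together with the pre-selected queue. The names ShufflePool, preselect and maybe_mix_in are made up for illustration, and the acceptance probability used for the dice roll is my assumption, since the answer does not specify one.

```python
import random


class ShufflePool:
    """Not-yet-played song indices with O(1) random removal."""

    def __init__(self, catalog_size, rng=None):
        self.rng = rng or random.Random()
        self.catalog_size = catalog_size
        self.remaining = list(range(catalog_size))  # O(N) initialization

    def add_song(self):
        """A new song joins the catalog and the not-yet-played list."""
        index = self.catalog_size
        self.catalog_size += 1
        self.remaining.append(index)
        return index

    def pick(self):
        """Return a random unplayed index; reload once all were played."""
        if not self.remaining:
            self.remaining = list(range(self.catalog_size))
        if not self.remaining:
            raise LookupError("catalog is empty")
        i = self.rng.randrange(len(self.remaining))
        song = self.remaining[i]
        # Overwrite slot i with the last item, then shrink the list by one.
        self.remaining[i] = self.remaining[-1]
        self.remaining.pop()
        return song


def preselect(pool, count):
    """Pre-init step: draw up to `count` upcoming songs from the pool."""
    return [pool.pick() for _ in range(min(count, len(pool.remaining)))]


def maybe_mix_in(queue, new_index, catalog_size, rng):
    """Roll the dice for a new song; on success overwrite a random slot.

    Assumption: accept with probability len(queue) / catalog_size, so a
    new song is about as likely to be queued as any existing one.
    """
    if queue and rng.random() < len(queue) / catalog_size:
        queue[rng.randrange(len(queue))] = new_index
        return True
    return False
```

In this sketch the pre-init step still needs the full list once, so it would presumably run somewhere with more memory than the 8MB device, which then only stores the pre-selected queue. Overwriting a slot with the last item instead of deleting from the middle is what keeps each pick at O(1).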