I have many tasks in .txt files spread across multiple subfolders. I am trying to pick a total of 10 tasks at random across these folders, the files they contain, and finally the text lines within those files. The selected line should be deleted or marked so that it will not be picked in the next execution. This may be too broad a question, but I'd appreciate any input or direction.
Here’s the code I have so far:
```python
#!/usr/bin/python
import random

# Read every line of a single task file and pick 10 of them at random
with open('C:\\Tasks\\file.txt') as f:
    lines = random.sample(f.readlines(), 10)
print(lines)
```
To get a proper random distribution across all these files, you'd need to view them as one big set of lines and pick 10 at random. In other words, you'll have to read all these files at least once to figure out how many lines you have.
You do not need to hold all the lines in memory, however. You can do this in two phases: index your files to count the number of lines in each, then pick 10 random lines to be read from those files.
First, the indexing:
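A minimal sketch of this indexing step, assuming every `*.txt` file below one root folder (such as `C:\Tasks` from the question) holds tasks, one per line. `txt_files` and `build_index` are hypothetical helper names; `file_indices` is the name this answer uses for the offsets. Instead of a dict, the sketch keeps two parallel lists, the starting line number of each file in ascending order and the matching paths, so the file containing any global line number can later be found with `bisect`.

```python
import os


def txt_files(directory):
    """Yield the path of every .txt file under `directory`, in a stable order."""
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # sorting in place makes os.walk visit subfolders in order
        for name in sorted(files):
            if name.lower().endswith('.txt'):
                yield os.path.join(root, name)


def build_index(directory):
    """Count lines per file without keeping any of them in memory.

    Returns (file_indices, filenames, total_lines), where file_indices[i]
    is the global line number of the first line of filenames[i].
    """
    file_indices = []
    filenames = []
    total_lines = 0
    for path in txt_files(directory):
        with open(path) as f:
            count = sum(1 for _ in f)  # stream through the file once
        if count == 0:
            continue  # empty files contribute no lines
        file_indices.append(total_lines)
        filenames.append(path)
        total_lines += count
    return file_indices, filenames, total_lines
```

Usage would be `file_indices, filenames, total_lines = build_index(r'C:\Tasks')`, followed by `tasks = list(range(total_lines))` to get the list of still-available task numbers.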
Now we have a mapping of offsets pointing to filenames, and a total line count. Next, we pick ten random indices and read those lines from your files:
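A minimal sketch of this second phase, assuming the index is held as two parallel lists, `file_indices` (each file's starting line number, ascending) and `filenames`, plus a `tasks` list of still-available global line numbers, initially `list(range(total_lines))`. `read_line` and `pick_tasks` are hypothetical names. `bisect.bisect_right` finds the file whose range contains a given line number, and `itertools.islice` skips to that line without holding earlier lines in memory.

```python
import bisect
import random
from itertools import islice


def read_line(file_indices, filenames, index):
    """Return the task at global line number `index`."""
    # Last file whose starting offset is <= index
    pos = bisect.bisect_right(file_indices, index) - 1
    local = index - file_indices[pos]  # line number within that file
    with open(filenames[pos]) as f:
        return next(islice(f, local, None)).rstrip('\n')


def pick_tasks(file_indices, filenames, tasks, k=10):
    """Pick up to k random entries from `tasks`, remove them, return their lines."""
    chosen = random.sample(tasks, min(k, len(tasks)))
    picked = set(chosen)
    # Drop the chosen indices in place so they cannot be picked again
    tasks[:] = [t for t in tasks if t not in picked]
    return [read_line(file_indices, filenames, i) for i in chosen]
```

Each call opens at most `k` files, one per picked line, and never rescans the whole set.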
Note that you only need to do the indexing once; you can store the result somewhere and update it only when your files change.
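One way to store the result, sketched under the assumption that a JSON file is acceptable and that the index is kept as two lists (`file_indices` and `filenames`) plus a `tasks` list of unused line numbers. `save_state`, `load_state` and the state-file path are hypothetical. Saving `tasks` alongside the index is what lets a task removed in one run stay removed in the next.

```python
import json


def save_state(path, file_indices, filenames, tasks):
    """Write the index and the still-available task numbers to a JSON file."""
    state = {'file_indices': file_indices, 'filenames': filenames, 'tasks': tasks}
    with open(path, 'w') as f:
        json.dump(state, f)


def load_state(path):
    """Read back what save_state wrote, as (file_indices, filenames, tasks)."""
    with open(path) as f:
        state = json.load(f)
    return state['file_indices'], state['filenames'], state['tasks']
```

A run would then load the state, pick its tasks, and save the shortened `tasks` list before exiting.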
Also note that your tasks are now ‘stored’ in the `tasks` list; these are indices to lines in your files, and I remove each index from that list when printing the selected task. The next time you call `random.sample()` on that list (provided you keep or save it between runs), previously picked tasks are no longer available. This structure will need updating if your files ever change, as the indexes have to be recalculated. The `file_indices` list will help you with that task, but that is outside the scope of this answer. 🙂

If you need only one 10-item sample, use Blckknght's solution instead, as it goes through the files only once, while mine requires 10 extra file openings. If you need multiple samples, this solution requires only 10 extra file openings each time you need a sample; it won't scan through all the files again. If you have fewer than 10 files, still use Blckknght's answer. 🙂