I have a very large sequence of strings. Each string is 50 characters long and contains only characters from the English ABC. What is the best (fastest) way to sort this sequence?
If I had to code that, I’d probably make one pass that splits the input into many output files depending on the first couple of characters or so; the goal being to make each output file small enough to fit in main memory. Then I would open each file in order of its prefix, sort it in memory, and append it to the output. The first pass is O(n), the second is more or less O(n log n), and you have to do disk I/O four times per record (read the input, write the intermediate file, read it back, write the output). It might be possible to do better with some arcane algorithm, but probably not by much, and this is easy to understand and code.
If the system limits how many files you can have open at once, you might have to split up the first pass. If the strings aren’t well-distributed, some intermediate files might be too large.
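One way to cope with an intermediate file that turns out too large is to split it again on the next character and recurse. The following is only a sketch of that idea, not something from the answer above: the name `sort_bucket`, the `max_bytes` threshold, and the one-record-per-line ASCII layout are all assumptions made for illustration.

```python
import os
import shutil
import tempfile


def sort_bucket(path, out, work_dir, depth, max_bytes):
    """Append the records of `path`, in sorted order, to the open file `out`.

    Assumes every record in `path` shares its first `depth` characters.
    """
    if os.path.getsize(path) <= max_bytes:
        # Small enough: sort in memory.
        with open(path, "r", encoding="ascii") as f:
            records = f.read().splitlines()
        records.sort()
        out.writelines(r + "\n" for r in records)
        return

    # Too large: split again on the character at index `depth`.
    sub_dir = tempfile.mkdtemp(dir=work_dir)
    handles = {}
    with open(path, "r", encoding="ascii") as f:
        for line in f:
            record = line.rstrip("\n")
            key = record[depth:depth + 1]  # "" once the record is exhausted
            if key not in handles:
                # Hex file names avoid clashes on case-insensitive file systems.
                name = key.encode("ascii").hex() or "end"
                handles[key] = open(os.path.join(sub_dir, name), "w", encoding="ascii")
            handles[key].write(record + "\n")
    for handle in handles.values():
        handle.close()

    for key in sorted(handles):
        sub_path = handles[key].name
        if key == "":
            # Records that end here are all identical, so just copy them.
            with open(sub_path, "r", encoding="ascii") as f:
                shutil.copyfileobj(f, out)
        else:
            sort_bucket(sub_path, out, work_dir, depth + 1, max_bytes)
    shutil.rmtree(sub_dir)
```

Because each level splits on a single character, at most one handle per distinct character is open at a time, which also keeps the sketch under typical open-file limits.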
In pseudocode:
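A minimal Python sketch of the two-pass approach described above. It assumes one record per line in an ASCII text file; the function name `external_sort` and the `prefix_len` parameter are made up for this illustration, and it does not handle the open-file limit or oversized intermediate files.

```python
import os
import tempfile


def external_sort(src_path, dst_path, prefix_len=2):
    """Sort a file of newline-terminated strings by bucketing on a prefix."""
    with tempfile.TemporaryDirectory() as work_dir:
        buckets = {}  # prefix -> open file handle (one handle per bucket)

        # Pass 1: route each record to the file for its prefix.
        with open(src_path, "r", encoding="ascii") as src:
            for line in src:
                record = line.rstrip("\n")
                if not record:
                    continue  # skip blank lines
                key = record[:prefix_len]
                handle = buckets.get(key)
                if handle is None:
                    # Hex file names avoid clashes on case-insensitive file systems.
                    name = key.encode("ascii").hex() + ".bucket"
                    handle = open(os.path.join(work_dir, name), "w", encoding="ascii")
                    buckets[key] = handle
                handle.write(record + "\n")
        for handle in buckets.values():
            handle.close()

        # Pass 2: visit buckets in prefix order, sort each in memory, append.
        with open(dst_path, "w", encoding="ascii") as dst:
            for key in sorted(buckets):
                with open(buckets[key].name, "r", encoding="ascii") as bucket:
                    records = bucket.read().splitlines()
                records.sort()
                dst.writelines(r + "\n" for r in records)
```

Calling `external_sort("input.txt", "sorted.txt", prefix_len=2)` (both paths are placeholders) would write the sorted records to the second file.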
EDIT: Wait, do you mean the records only contain the characters A, B, and C? No other letters? In that case you would probably have to split on an initial substring longer than 2. Splitting on the first 3 characters would divide it into 3 × 3 × 3 = 27 files, each of size 370 MB on average.
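The arithmetic behind picking a prefix length can be sketched as follows. The total of 10 GB used in the example is an assumption backed out from 27 × 370 MB; the memory budget is hypothetical, and the calculation assumes the strings are evenly distributed over the prefixes.

```python
def prefix_length(total_bytes, memory_budget, alphabet_size):
    """Smallest prefix length k with total_bytes / alphabet_size**k <= memory_budget.

    Assumes records are spread evenly across all possible prefixes.
    """
    if alphabet_size < 2 or memory_budget <= 0:
        raise ValueError("need alphabet_size >= 2 and memory_budget > 0")
    k = 0
    buckets = 1
    while total_bytes / buckets > memory_budget:
        k += 1
        buckets *= alphabet_size
    return k


# Assumed figures: 10 GB of input, 512 MB available per in-memory sort.
k = prefix_length(10 * 10**9, 512 * 10**6, alphabet_size=3)
average_mb = 10 * 10**9 / 3**k / 10**6
print(k, round(average_mb))
```

With a three-letter alphabet and these assumed figures, two characters give only 9 files of over 1 GB each, which is why a longer prefix is needed.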