Editorial Team
Asked: May 11, 2026


This is a long text. Please bear with me. Boiled down, the question is: Is there a workable in-place radix sort algorithm?


Preliminary

I’ve got a huge number of small fixed-length strings that only use the letters “A”, “C”, “G” and “T” (yes, you’ve guessed it: DNA) that I want to sort.

At the moment, I use std::sort which uses introsort in all common implementations of the STL. This works quite well. However, I’m convinced that radix sort fits my problem set perfectly and should work much better in practice.

Details

I’ve tested this assumption with a very naive implementation and for relatively small inputs (on the order of 10,000) this was true (well, at least more than twice as fast). However, runtime degrades abysmally when the problem size becomes larger (N > 5,000,000).

The reason is obvious: radix sort requires copying the whole data (more than once in my naive implementation, actually). This means that I’ve put ~ 4 GiB into my main memory which obviously kills performance. Even if it didn’t, I can’t afford to use this much memory since the problem sizes actually become even larger.

Use Cases

Ideally, this algorithm should work with any string length between 2 and 100, for DNA as well as DNA5 (which allows an additional wildcard character “N”), or even DNA with IUPAC ambiguity codes (resulting in 16 distinct values). However, I realize that all these cases cannot be covered, so I’m happy with any speed improvement I get. The code can decide dynamically which algorithm to dispatch to.

Research

Unfortunately, the Wikipedia article on radix sort is useless. The section about an in-place variant is complete rubbish. The NIST-DADS section on radix sort is next to nonexistent. There’s a promising-sounding paper called Efficient Adaptive In-Place Radix Sorting which describes the algorithm “MSL”. Unfortunately, this paper, too, is disappointing.

In particular, there are the following things.

First, the algorithm contains several mistakes and leaves a lot unexplained. In particular, it doesn’t detail the recursion call (I simply assume that it increments or reduces some pointer to calculate the current shift and mask values). Also, it uses the functions dest_group and dest_address without giving definitions. I fail to see how to implement these efficiently (that is, in O(1); at least dest_address isn’t trivial).

Last but not least, the algorithm achieves in-place-ness by swapping array indices with elements inside the input array. This obviously only works on numerical arrays. I need to use it on strings. Of course, I could just screw strong typing and go ahead assuming that the memory will tolerate my storing an index where it doesn’t belong. But this only works as long as I can squeeze my strings into 32 bits of memory (assuming 32 bit integers). That’s only 16 characters (let’s ignore for the moment that 16 > log(5,000,000)).
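To make the 16-character limit concrete, here is a minimal C++ sketch of the 2-bits-per-base packing that the estimate above implies. The function name `pack_dna` and the code assignment (A=0, C=1, G=2, T=3) are assumptions chosen for illustration; they are not taken from the MSL paper.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: pack up to 16 bases into one 32-bit word,
// 2 bits per base. The first base ends up in the highest used bits,
// so for strings of equal length the integer order matches the
// lexicographic order of the original strings.
inline std::uint32_t pack_dna(const std::string& s) {
    if (s.size() > 16) {
        throw std::length_error("at most 16 bases fit in 32 bits");
    }
    std::uint32_t word = 0;
    for (char c : s) {
        std::uint32_t code = 0;
        switch (c) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: throw std::invalid_argument("not a plain DNA base");
        }
        word = (word << 2) | code;  // append the next 2-bit code
    }
    return word;
}
```

With 64-bit words the same scheme would cover 32 bases, which is still well short of the 100-character strings mentioned under Use Cases.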

Another paper by one of the authors gives no accurate description at all, but it gives MSL’s runtime as sub-linear which is flat out wrong.

To recap: Is there any hope of finding a working reference implementation or at least a good pseudocode/description of a working in-place radix sort that works on DNA strings?

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Added an answer on May 11, 2026 at 2:03 am

    Well, here’s a simple implementation of an MSD radix sort for DNA. It’s written in D because that’s the language that I use most and therefore am least likely to make silly mistakes in, but it could easily be translated to some other language. It’s in-place but makes two passes over the array per character position, i.e. 2 * seqs[0].length passes in total.

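    import std.algorithm : swap;

    void radixSort(string[] seqs, size_t base = 0) {
        if(seqs.length == 0)
            return;

        // First pass: move every 'A' to the front and every 'T' to the back.
        size_t TPos = seqs.length, APos = 0;
        size_t i = 0;
        while(i < TPos) {
            if(seqs[i][base] == 'A') {
                swap(seqs[i], seqs[APos++]);
                i++;
            }
            else if(seqs[i][base] == 'T') {
                // Do not advance i: the element swapped in is still unexamined.
                swap(seqs[i], seqs[--TPos]);
            } else i++;
        }

        // Second pass: within the middle block, move every 'C' ahead of the 'G's.
        i = APos;
        size_t CPos = APos;
        while(i < TPos) {
            if(seqs[i][base] == 'C') {
                swap(seqs[i], seqs[CPos++]);
            }
            i++;
        }

        // Recurse into the four buckets on the next character position.
        if(base < seqs[0].length - 1) {
            radixSort(seqs[0..APos], base + 1);
            radixSort(seqs[APos..CPos], base + 1);
            radixSort(seqs[CPos..TPos], base + 1);
            radixSort(seqs[TPos..seqs.length], base + 1);
        }
    }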

    Obviously, this is kind of specific to DNA, as opposed to being general, but it should be fast.

    Edit:

    I got curious whether this code actually works, so I tested/debugged it while waiting for my own bioinformatics code to run. The version above now is actually tested and works. For 10 million sequences of 5 bases each, it’s about 3x faster than an optimized introsort.
