What I have here works fine for small data, for example 10 rows x 10 columns of bytes = 100 elements.
But now I have tried it on 256 rows x 256 columns of bytes = 65536 elements, and it takes about 30 minutes to sort the rows into proper lexicographical order. Is there any way to optimize this function so that it completes in maybe 5 seconds at most?
I know I have to use some other sorting algorithm, but I cannot really figure out what to do.
Function SortArrayOfArraysLexicoGraphically(ByRef data() As Byte) As Byte()
    Dim lexicoGraphicalIndexes() As Byte
    Dim dataSize As Long
    dataSize = UBound(data) + 1

    Dim squareRootMinusOne As Integer
    Dim squareRoot As Integer
    squareRoot = Sqr(dataSize)
    squareRootMinusOne = squareRoot - 1
    ReDim lexicoGraphicalIndexes(squareRootMinusOne)

    Dim columnStart As Long
    Dim row As Long
    Dim column As Long
    Dim rowSwapped As Boolean

    For columnStart = 0 To UBound(lexicoGraphicalIndexes)
        lexicoGraphicalIndexes(columnStart) = columnStart
    Next columnStart

    'start column from the last element from the row and go backwards to first element in that row.
    For columnStart = squareRootMinusOne To 0 Step -1
        Do
            rowSwapped = False
            Do
                If data((row * squareRoot) + columnStart) > data(((row + 1) * squareRoot) + columnStart) Then
                    'Swaps a full row byte by byte.
                    For column = 0 To squareRootMinusOne
                        Call SwapBytes(data, (row * squareRoot) + column, ((row + 1) * squareRoot) + column)
                    Next column
                    Call SwapBytes(lexicoGraphicalIndexes, row, row + 1)
                    rowSwapped = True
                End If
                row = row + 1
            Loop Until row > squareRootMinusOne - 1
            row = 0
        Loop Until rowSwapped = False
    Next columnStart

    'returns a byte array of sorted indexes.
    SortArrayOfArraysLexicoGraphically = lexicoGraphicalIndexes
End Function

Public Sub SwapBytes(data() As Byte, firstIndex As Long, secondIndex As Long)
    Dim tmpFirstByte As Byte
    tmpFirstByte = data(firstIndex)
    data(firstIndex) = data(secondIndex)
    data(secondIndex) = tmpFirstByte
End Sub

The slow step here is the copying, byte by byte, in a loop. I would take advantage of the RtlMoveMemory API call (commonly declared under the alias CopyMemory). It does a block memory copy, which is a lot faster. I also declare a module-level array to act as the temporary buffer in the row swap. You could probably just merge the two procedures below to make it self-contained: