
Editorial Team
Asked: June 12, 2026

I have a record type and a dynamic array made up of that record


I have a record type and a dynamic array made up of that record type. I pass it to a mergesort routine and try to set one of its fields, which is a Boolean, to True, but the change does not seem to take effect.

I looked into sorting an array of records by other means (see this quicksort for a custom record array: http://en.wikibooks.org/wiki/Algorithm_Implementation/Sorting/Quicksort#Delphi) or here: Best way to sort an array (I could not get any of these suggestions to work, mostly because of writing a comparison function).
This question, Sorting of Arrays Alphabetically?, was helpful and works, but that sorting is excruciatingly slow.

CODE:

type
  TCustomRecord = record
    fLine       : AnsiString; // full line
    fsubLine    : AnsiString; // part of full line
    isDuplicate : Boolean;    // is that subline duplicated in another line
    isRefrence  : Boolean;    // is this line from a reference file or the one being deduped
    fIndex      : Cardinal;   // original order the line was loaded in
  end;
  TCustomRecordArray = array of TCustomRecord;

function Merge2(var Vals: array of TCustomRecord ):Integer;
var
  AVals: array of TCustomRecord;

   //returns index of the last valid element
  function Merge(I0, I1, J0, J1: Integer):Integer;
  var
    i, j, k, LC:Integer;
  begin
    LC := I1 - I0;
    //copy lower half of Vals into temporary array AVals
    for i := 0 to LC do
      AVals[i]:=Vals[i + I0];

    k := I0;
    i := 0;
    j := J0;
    while ((i <= LC) and (j <= J1)) do
    if (AVals[i].fsubLine < Vals[j].fsubLine) then
    begin
      Vals[k] := AVals[i];
      if Vals[k].isRefrence = False then
        Vals[k].isDuplicate := False;
      inc(i);
      inc(k);
    end
    else if (AVals[i].fsubLine > Vals[j].fsubLine) then
    begin
      Vals[k]:=Vals[j];
      if Vals[k].isRefrence = False then
        Vals[k].isDuplicate := False;
      inc(k);
      inc(j);
    end else
    begin //duplicate
      Vals[k] := AVals[i];
      if Vals[k].isRefrence = False then
        Vals[k].isDuplicate := True;
      inc(i);
      inc(j);
      inc(k);
    end;

    //copy the rest
    while i <= LC do begin
      Vals[k] := AVals[i];
      inc(i);
      inc(k);
    end;

    if k <> j then
      while j <= J1 do begin
        Vals[k]:=Vals[j];
        inc(k);
        inc(j);
      end;

    Result := k - 1;
  end;

  //returns index of the last valid element
  function PerformMergeSort(ALo, AHi:Integer): Integer;
  var
    AMid, I1, J1:Integer;
  begin

  //It would be wise to use Insertion Sort when (AHi - ALo) is small (about 32-100)
    if (ALo < AHi) then
    begin
      AMid:=(ALo + AHi) shr 1;
      I1 := PerformMergeSort(ALo, AMid);
      J1 := PerformMergeSort(AMid + 1, AHi);
      Result := Merge(ALo, I1, AMid + 1, J1);
    end else
      Result := ALo;
  end;

begin
  //SetLength(AVals, Length(Vals) + 1 div 2);
  SetLength(AVals, Length(Vals) div 2 + 1);
  Result := 1 + PerformMergeSort(0, High(Vals));
end;

QUESTION:
How can I efficiently sort this array of records, preferably using mergesort, and set some of its fields according to that sort? Thank you.

UPDATE:
I added a pointer type and did a modified mergesort on an array of pointers. This turned out to be a very fast way of sorting the array of records. I also added a compare routine that sets the flags I needed. The only part I am not able to do is flag duplicates based on whether they belong to file A or to the reference file.

CODE:

type
  PCustomRecord = ^TCustomRecord;
  TCustomRecord = record
    fLine       : AnsiString; // full line
    fsubLine    : AnsiString; // part of full line
    isDuplicate : Boolean;    // is that subline duplicated in another line
    isRefrence  : Boolean;    // line from a reference file or the one being deduped
    isUnique    : Boolean;    // flag to set if not reference and not dupe
    fIndex      : Cardinal;   // original order the line was loaded in
  end;
  TCustomRecordArray = array of TCustomRecord;
  PCustomRecordList = ^TCustomRecordArray;

//set up actual array
//set up pointer array to point at actual array
//sort by mergesort first
// then call compare function - this can be a procedure obviously

function Compare(var PRecords: array of PCustomRecord; iLength: int64): Integer;
var
  i : Integer;
begin
  Result := 0;
  //stop one short of the end: the loop body reads PRecords[i+1]
  for i := 0 to High(PRecords) - 1 do
  begin
    Result := AnsiCompareStr(PRecords[i]^.fsubline, PRecords[i+1]^.fsubline);
    if Result=0 then
    begin
      if (PRecords[i].isrefrence = False) then
        PRecords[i].isduplicate := True
      else if (PRecords[i+1].isrefrence = False) then
        PRecords[i+1].isduplicate := True;
    end;
  end;
end; 

procedure MergeSort(var Vals:array of PCustomRecord;ACount:Integer);
var AVals:array of PCustomRecord;

  procedure Merge(ALo,AMid,AHi:Integer);
  var i,j,k,m:Integer;
  begin
    i:=0;
    //copy lower half of Vals into temporary array AVals
    for j:=ALo to AMid do
    begin
      AVals[i]:=Vals[j];
      inc(i);
    end;

    i:=0;j:=AMid + 1;k:=ALo;//j could be undefined after the for loop!
    while ((k < j) and (j <= AHi)) do
    if (AVals[i].fsubline) <= (Vals[j].fsubline) then
    begin
      Vals[k]:=AVals[i];
      inc(i);inc(k);
    end
    else if (AVals[i].fsubline) > (Vals[j].fsubline) then
    begin
      Vals[k]:=Vals[j];
      inc(k);inc(j);
    end;

    {locate next greatest value in Vals or AVals and copy it to the
     right position.}

    for m:=k to j - 1 do
    begin
      Vals[m]:=AVals[i];
      inc(i);
    end;
    //copy back any remaining, unsorted, elements
  end;

  procedure PerformMergeSort(ALo,AHi:Integer);
  var AMid:Integer;
  begin
    if (ALo < AHi) then
    begin
      AMid:=(ALo + AHi) shr 1;
      PerformMergeSort(ALo,AMid);
      PerformMergeSort(AMid + 1,AHi);
      Merge(ALo,AMid,AHi);
    end;
  end;

begin
  SetLength(AVals, ACount div 2 + 1);
  PerformMergeSort(0,ACount - 1);
end;

This is all very fast on small files, taking less than one second. Deduping the items in the array that carry a duplicate flag and NOT a reference flag is quite challenging, though. Since mergesort is a stable sort, I tried re-sorting by the Boolean flag but did not get what I expected. I used a TStringList to check whether my earlier flags were being set correctly, and it works perfectly, but the time went up from 1 second to 6 seconds. I know there has to be an easy way to set the isUnique flag without a TStringList.

Here is what I tried:

function DeDupe(var PRecords: array of PCustomRecord; iLength: int64): Integer;
var
  i : Integer;
begin
  Result := 0;
  for i := 0 to High(PRecords) - 1 do
  begin
    if (PRecords[i]^.isrefrence = False) and (PRecords[i+1]^.isrefrence = false)then
    begin
      Result := Ord(PRecords[i]^.isduplicate) - Ord(PRecords[i+1]^.isduplicate);
      if Result = 0 then PRecords[i]^.isUnique := True;
    end
    else
    begin
      Continue;
    end;
  end;
end;

This doesn’t catch all the values, and I did not see a difference with it, as I still see lots of duplicates. I think the logic is wrong.

Thanks to all the great souls helping out. Please allow me the benefit of the doubt that I already know how to derive from TObject and how to use a TStringList, so that the focus stays on arrays.

QUESTION:
Help me write a function or procedure like the ones above that marks the repeated items with isRefrence = False and isDuplicate = True, and sets the isUnique flag accordingly.
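One way to do this without a TStringList is a single linear pass over the already-sorted pointer array, because equal sublines sit next to each other after the sort. The sketch below assumes a specific rule (a reading of the question, not something it states): within each run of equal fsubLine values, every non-reference record is a duplicate if the run contains a reference record; otherwise the first non-reference record is kept (isUnique = True) and the rest are duplicates. MarkDuplicates and SetRec are hypothetical names, and TCustomRecord is trimmed to the fields the pass needs.

```pascal
program MarkDuplicatesDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PCustomRecord = ^TCustomRecord;
  // Trimmed to the fields this pass uses (assumption for brevity).
  TCustomRecord = record
    fsubLine    : AnsiString;
    isDuplicate : Boolean;
    isRefrence  : Boolean;
    isUnique    : Boolean;
  end;

// Hypothetical helper. PRecords must already be sorted by fsubLine so that
// equal sublines are adjacent. For each run of equal sublines:
//  - reference records get isDuplicate = False and isUnique = False;
//  - if the run contains a reference record, every non-reference record is a duplicate;
//  - otherwise the first non-reference record is unique and the rest are duplicates.
procedure MarkDuplicates(const PRecords: array of PCustomRecord);
var
  RunStart, RunEnd, i: Integer;
  HasReference, KeptOne: Boolean;
begin
  RunStart := 0;
  while RunStart <= High(PRecords) do
  begin
    // Extend the run while the next subline equals the run's subline.
    RunEnd := RunStart;
    while (RunEnd < High(PRecords)) and
          (PRecords[RunEnd + 1]^.fsubLine = PRecords[RunStart]^.fsubLine) do
      Inc(RunEnd);

    HasReference := False;
    for i := RunStart to RunEnd do
      if PRecords[i]^.isRefrence then
        HasReference := True;

    KeptOne := False;
    for i := RunStart to RunEnd do
      if PRecords[i]^.isRefrence then
      begin
        PRecords[i]^.isDuplicate := False;
        PRecords[i]^.isUnique := False;
      end
      else
      begin
        PRecords[i]^.isDuplicate := HasReference or KeptOne;
        PRecords[i]^.isUnique := not PRecords[i]^.isDuplicate;
        KeptOne := True;
      end;

    RunStart := RunEnd + 1;
  end;
end;

// Hypothetical helper to fill a test record.
procedure SetRec(var R: TCustomRecord; const Sub: AnsiString; IsRef: Boolean);
begin
  R.fsubLine := Sub;
  R.isRefrence := IsRef;
end;

var
  Data: array[0..4] of TCustomRecord;
  Ptrs: array[0..4] of PCustomRecord;
  i: Integer;
begin
  // Already sorted by fsubLine.
  SetRec(Data[0], 'a', True);   // line from the reference file
  SetRec(Data[1], 'a', False);
  SetRec(Data[2], 'b', False);
  SetRec(Data[3], 'b', False);
  SetRec(Data[4], 'c', False);
  for i := 0 to 4 do
    Ptrs[i] := @Data[i];

  MarkDuplicates(Ptrs);

  for i := 0 to 4 do
    WriteLn(Data[i].fsubLine, ' dup=', Data[i].isDuplicate,
            ' unique=', Data[i].isUnique);
end.
```

With a stable sort such as the mergesort above, "first" in a run means the earliest-loaded line, because records with equal keys keep their original order.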

EDIT 3:
I was able to eliminate the duplicates through the use of Boolean flags. This helped keep the array stable without changing its size. I believe it is much, much faster than using a TList descendant or a TStringList. A basic container such as an array has limitations in ease of coding but is very efficient, so I would not pass on it. The pointers made the sorting a breeze. I'm not sure why, after I pointed the pointers at my array, I could use the pointer array exactly like my regular array, and it made no difference whether I dereferenced it or not. I set up the pointer array as follows:

  iLength := Length(Custom_array); //get length of actual array
  SetLength(pcustomRecords, iLength); // make the pointer array the same length

  for M := Low(Custom_array) to High(Custom_array) do //set up pointers
  begin
    pcustomRecords[M] := @Custom_array[M]; 
  end;
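The pointer array behaves like the record array because Delphi (and Free Pascal in Delphi mode) implicitly dereferences a pointer to a record when a field is accessed with `.`, so `P.Field` and `P^.Field` name the same field of the same record. A minimal sketch showing both forms writing into the underlying records; the trimmed TCustomRecord is an assumption for brevity.

```pascal
program AutoDerefDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PCustomRecord = ^TCustomRecord;
  // Trimmed record (assumption for brevity).
  TCustomRecord = record
    fsubLine    : AnsiString;
    isDuplicate : Boolean;
  end;

var
  Custom_array   : array of TCustomRecord;
  pcustomRecords : array of PCustomRecord;
  M              : Integer;
begin
  SetLength(Custom_array, 2);
  SetLength(pcustomRecords, Length(Custom_array));
  for M := Low(Custom_array) to High(Custom_array) do
    pcustomRecords[M] := @Custom_array[M];

  // Explicit dereference with ^ ...
  pcustomRecords[0]^.isDuplicate := True;
  // ... and implicit dereference: the compiler inserts the ^ for a record pointer.
  pcustomRecords[1].isDuplicate := True;

  // Both assignments changed the records in Custom_array, not copies of them.
  WriteLn(Custom_array[0].isDuplicate, ' ', Custom_array[1].isDuplicate);
end.
```

Pointers taken this way stay valid only while Custom_array is not resized: a later SetLength on it may reallocate the storage and leave the pointers dangling.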

I tried separating the sorting from the actual data being sorted as much as I could, but I'm sure it can be improved.

///////////////////////////////////////////////////////////////////
function Comparesubstring(Item1, Item2: PCustomRecord): Integer;
begin
  Result := AnsiCompareStr(item1^.fsubline, item2^.fsubline);
end;
///////////////////////////////////////////////////////////////////
function CompareLine(Item1, Item2: PCustomRecord): Integer;
begin
  Result := AnsiCompareStr(item1^.fLine, item2^.fLine);
end;
///////////////////////////////////////////////////////////////////
function Compare(var PRecords: array of PCustomRecord; iLength: int64): Integer;
var
  M, i : Integer;
begin
  Result := 0;
  M := Length(PRecords);
  for i := 1 to M-1 do
  begin
    Result := Comparesubstring(PRecords[i-1], PRecords[i]);
    if Result=0 then
    begin
      if (PRecords[i-1].isRefrence = False) then
        PRecords[i-1].isduplicate := True
      else if (PRecords[i].isRefrence = False) then
        PRecords[i].isduplicate := True;
    end;
  end;
end;
///////////////////////////////////////////////////////////////////
1 Answer

    Editorial Team
    Answered on June 12, 2026 at 6:46 pm

    1) Do not copy data! Work with pointers.
    Make a list/array of pointers to those data records and sort the pointers instead. After the sort is complete, just create a new array of data based on the pointer array. Moving a pointer is a single CPU instruction, while SizeOf(your record) is larger than SizeOf(Pointer) and, because the record holds AnsiString fields, every record assignment also has to update string reference counts, so moving records is MUCH slower.

    2) Mergesort rocks on HUGE amounts of data that do not fit into memory. If you have 10 gigabytes of data, you cannot sort it within the 2 GB of memory available to Win32 programs, so you have to sort it while it is on disk. That is the niche of mergesort. Why not use ready-made QuickSort routines instead, if all your data is in memory?

    So make a TList, fill it with pointers of type PCustomRecord = ^TCustomRecord, implement a proper comparison function, and call the well-tested quicksort through the TList.Sort method.

    http://docwiki.embarcadero.com/CodeExamples/XE2/en/TListSort_(Delphi)

    After the list is sorted, create and populate a new array of data.
    After the new array is created, free the list and remove the old source array.
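A minimal sketch of the workflow just described: fill a TList with pointers, sort it with TList.Sort and a function matching TListSortCompare (`function(Item1, Item2: Pointer): Integer`), copy the records into a new array, then free the list and drop the old array. CompareSubLine and SortedCopy are hypothetical names, and the record is trimmed to two fields as an assumption.

```pascal
program TListSortDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  PCustomRecord = ^TCustomRecord;
  // Trimmed record (assumption for brevity).
  TCustomRecord = record
    fsubLine : AnsiString;
    fIndex   : Cardinal;
  end;
  TCustomRecordArray = array of TCustomRecord;

// Matches TListSortCompare: the list hands back the stored pointers.
function CompareSubLine(Item1, Item2: Pointer): Integer;
begin
  Result := AnsiCompareStr(PCustomRecord(Item1)^.fsubLine,
                           PCustomRecord(Item2)^.fsubLine);
end;

// Hypothetical helper: sorts pointers to Source's records by fsubLine,
// then builds a new array with one record copy per element.
function SortedCopy(var Source: TCustomRecordArray): TCustomRecordArray;
var
  List: TList;
  i: Integer;
begin
  List := TList.Create;
  try
    List.Capacity := Length(Source);
    for i := 0 to High(Source) do
      List.Add(@Source[i]);
    List.Sort(CompareSubLine);  // only pointers move during the sort
    SetLength(Result, List.Count);
    for i := 0 to List.Count - 1 do
      Result[i] := PCustomRecord(List[i])^;
  finally
    List.Free;
  end;
end;

var
  Data, Sorted: TCustomRecordArray;
  i: Integer;
begin
  SetLength(Data, 3);
  Data[0].fsubLine := 'pear';  Data[0].fIndex := 0;
  Data[1].fsubLine := 'apple'; Data[1].fIndex := 1;
  Data[2].fsubLine := 'fig';   Data[2].fIndex := 2;

  Sorted := SortedCopy(Data);
  Data := nil;  // remove the old source array

  for i := 0 to High(Sorted) do
    WriteLn(Sorted[i].fsubLine, ' (loaded as #', Sorted[i].fIndex, ')');
end.
```

TList.Sort is a quicksort and is not guaranteed to be stable, so if records with equal sublines must keep their load order, compare fIndex as a tie-breaker when the sublines are equal.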


    If possible, check whether the data fits in memory. Only resort to on-disk sorting if memory is not enough; it would be slower, much slower.


    I did it in school… My mergesort was not recursive; it was a VERY basic loop (a bottom-up mergesort). I implemented it because of its simplicity. I still do not have a gut feeling for QuickSort to compare it with.

    In pseudocode it looks like this:

    FrameSize := 1;
    Copy the input data into TempMergedDataFile
    Loop start:
      Phase 1: splitting
         Loop until TempMergedDataFile is empty:
            Read record by record from TempMergedDataFile
                and write each of them into TempSplitDataFile-1
                up to FrameSize times
            Read record by record from TempMergedDataFile
                and write each of them into TempSplitDataFile-2
                up to FrameSize times
         Loop end
         Delete TempMergedDataFile
      Phase 2: sorting-merging
         Loop until TempSplitDataFile-1 and TempSplitDataFile-2 are both empty:
            Read record by record from both TempSplitDataFile-1 and TempSplitDataFile-2,
              up to FrameSize from each (up to 2 x FrameSize in total per iteration),
              and write them in merged, sorted order into TempMergedDataFile
         Loop end
         Delete TempSplitDataFile-1 and TempSplitDataFile-2
      Phase 3: update expectations
         FrameSize := FrameSize * 2
         If FrameSize >= actual number of records, exit the loop: sort complete
    Loop end
    

    Be careful with the Phase 2 implementation: compare against either the actual value, or nil if one of the files has already used up its frame. The idea is obvious and probably demonstrated somewhere; just be pedantic in this part. A finite-state-machine (FSM) implementation might be easy and good.
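The same non-recursive idea also works in memory. A sketch, assuming a pointer array and a comparison function shaped like the question's Comparesubstring: runs of width 1, 2, 4, ... are merged back and forth between the array and one scratch array, and the merge step explicitly handles a run that is exhausted early (the in-memory counterpart of the "nil if a frame is used up" case). MergeSortBottomUp, TPtrArray and TPtrCompare are hypothetical names, and the record is trimmed.

```pascal
program BottomUpMergeSortDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  PCustomRecord = ^TCustomRecord;
  // Trimmed record (assumption for brevity).
  TCustomRecord = record
    fsubLine : AnsiString;
    fIndex   : Cardinal;
  end;
  TPtrArray = array of PCustomRecord;
  TPtrCompare = function(Item1, Item2: PCustomRecord): Integer;

function Comparesubstring(Item1, Item2: PCustomRecord): Integer;
begin
  Result := AnsiCompareStr(Item1^.fsubLine, Item2^.fsubLine);
end;

// Iterative (bottom-up) merge sort: no recursion, one scratch array.
// Stable: on ties the element from the left run is taken first.
procedure MergeSortBottomUp(var Vals: TPtrArray; Compare: TPtrCompare);
var
  Src, Dst, Tmp: TPtrArray;
  N, Width, Lo, Mid, Hi, i, j, k: Integer;
begin
  N := Length(Vals);
  Src := Vals;             // shares Vals' storage (no copy)
  SetLength(Dst, N);
  Width := 1;
  while Width < N do
  begin
    Lo := 0;
    while Lo < N do
    begin
      Mid := Lo + Width;   if Mid > N then Mid := N;
      Hi  := Mid + Width;  if Hi > N then Hi := N;
      // Merge Src[Lo..Mid-1] and Src[Mid..Hi-1] into Dst[Lo..Hi-1].
      i := Lo; j := Mid; k := Lo;
      while k < Hi do
      begin
        // Take from the left run unless it is exhausted or its head is greater.
        if (i < Mid) and ((j >= Hi) or (Compare(Src[i], Src[j]) <= 0)) then
        begin
          Dst[k] := Src[i]; Inc(i);
        end
        else
        begin
          Dst[k] := Src[j]; Inc(j);
        end;
        Inc(k);
      end;
      Lo := Hi;
    end;
    Tmp := Src; Src := Dst; Dst := Tmp;  // swap roles for the next pass
    Width := Width * 2;
  end;
  Vals := Src;             // Src holds the fully merged result
end;

const
  Words: array[0..4] of AnsiString = ('delta', 'alpha', 'charlie', 'alpha', 'bravo');
var
  Data: array of TCustomRecord;
  Ptrs: TPtrArray;
  i: Integer;
begin
  SetLength(Data, Length(Words));
  SetLength(Ptrs, Length(Words));
  for i := 0 to High(Words) do
  begin
    Data[i].fsubLine := Words[i];
    Data[i].fIndex := i;
    Ptrs[i] := @Data[i];
  end;

  MergeSortBottomUp(Ptrs, Comparesubstring);

  for i := 0 to High(Ptrs) do
    WriteLn(Ptrs[i]^.fsubLine, ' #', Ptrs[i]^.fIndex);
end.
```

Taking from the left run on ties (`<= 0`) is what keeps the sort stable, which matters if you later re-sort by a flag and expect records with equal keys to keep their order.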

    Obvious optimizations:

    1. Place all files on different dedicated physical HDDs, so each HDD stays in linear reading/writing mode.
    2. Merge phases 1 and 2: make TempMergedDataFile virtual, actually consisting of TempSplitDataFile-3 and TempSplitDataFile-4, and split the data into next-size frames while you are writing it.
    3. If SSDs or flash cards are used for storage, all this data copying would wear out the hardware. It is better to sort some kind of “pointers” or “indexes” instead. There is also a small chance that, while the full data frames exceed RAM, the mere “array of indexes” would fit. However, with actual HDDs and without testing, I would rather stick with the naive “copy and copy and copy once again” approach.
Related Questions

I have a record that contains a dynamic array. It is normal that when
I have a F# record type and want one of the fields to be
I have a next code: type THead = packed record znmpc: byte; znmpcch: array
Lets say have this immutable record type: public class Record { public Record(int x,
I have a delphi programm which send record: type TMyNetworkPckg = record za: byte;
Say I have a query that fetches [type][show_name]. For all [type]==5 records, I need
I have a record that came from the follow linq query: using (var context
Previously i have static array for the matrix dataset design TMatrix = record row,
I have the below function defined that passes a dynamic id to the function
I have a Delphi DLL that contains the following types: type TStepModeType = (smSingle,

© 2021 The Archive Base. All Rights Reserved