I would like to normalize data in a DataTable (insertRows) that has no key. To do that, I need to identify and mark duplicate records by their ID (import_id); afterwards I will select only the distinct ones. The approach I am considering is to compare each row against all the other rows in that DataTable.
The columns in the DataTable are not known at design time, and there is no key. Performance-wise, the table could have as many as 10k to 20k records and about 40 columns.
How do I accomplish this without sacrificing performance too much?
I attempted to use LINQ, but I did not know how to dynamically specify the where criteria. Here I am comparing first and last names in a loop for each row:

    foreach (System.Data.DataRow lrows in importDataTable.Rows)
    {
        IEnumerable<System.Data.DataRow> insertRows = importDataTable.Rows.Cast<System.Data.DataRow>();
        var col_matches =
            from irows in insertRows
            where String.Compare(irows["fname"].ToString(), lrows["fname"].ToString(), true) == 0
               && String.Compare(irows["last_name"].ToString(), lrows["last_name"].ToString(), true) == 0
            select new { import_id = irows["import_id"].ToString() };
    }
Any ideas are welcome. (See also my similar question: How do I find similar column names using LINQ?)
The easiest way to get this done without O(n²) complexity is to use a data structure that efficiently implements set operations, specifically a Contains operation. Fortunately, .NET (as of 3.5) includes HashSet<T>, which does exactly this. In order to make use of it, you need a single object that encapsulates a row of your DataTable.
If DataRow won’t work as that object, I recommend converting the relevant column values to strings, concatenating them with a delimiter that cannot occur in the data (so that "ab" + "c" and "a" + "bc" do not collide), and placing the result in the HashSet. Before you insert a row, check whether the HashSet already contains its key (using Contains, or simply the return value of Add). If it does, you’ve found a duplicate.
Edit:
This method is O(n), since each HashSet lookup and insert is O(1) on average.
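A minimal sketch of this approach, assuming all columns participate in the duplicate check (the sample table, column names, and the "\u0001" delimiter are illustrative, not from the original post):

```csharp
using System;
using System.Collections.Generic;
using System.Data;

class DedupSketch
{
    // Builds a composite key from every column of a row, joined with a
    // delimiter that is assumed never to appear in the data, so that
    // adjacent values cannot run together and collide.
    static string RowKey(DataRow row)
    {
        var parts = new string[row.Table.Columns.Count];
        for (int i = 0; i < parts.Length; i++)
            parts[i] = Convert.ToString(row[i]);
        return string.Join("\u0001", parts);
    }

    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("fname");
        table.Columns.Add("last_name");
        table.Rows.Add("Ada", "Lovelace");
        table.Rows.Add("Alan", "Turing");
        table.Rows.Add("Ada", "Lovelace"); // duplicate

        // OrdinalIgnoreCase mirrors the case-insensitive String.Compare
        // used in the question's LINQ attempt.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (DataRow row in table.Rows)
        {
            // Add returns false when the key is already present,
            // i.e. the row is a duplicate.
            if (!seen.Add(RowKey(row)))
                Console.WriteLine("Duplicate: " + row["fname"] + " " + row["last_name"]);
        }
    }
}
```

Because the key is built by iterating the table's Columns collection, the same code works no matter which columns exist at run time.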