What is the best way to remove duplicate rows from a `DataTable` based on multiple columns? The code below only works for a single column:
```csharp
using System.Collections;
using System.Data;

public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
    Hashtable hTable = new Hashtable();
    ArrayList duplicateList = new ArrayList();

    // Add each unique value to the Hashtable (which stores key/value pairs)
    // and add every row whose value was already seen to the ArrayList.
    foreach (DataRow drow in dTable.Rows)
    {
        if (hTable.Contains(drow[colName]))
            duplicateList.Add(drow);
        else
            hTable.Add(drow[colName], string.Empty);
    }

    // Remove the duplicate rows from the DataTable.
    foreach (DataRow dRow in duplicateList)
        dTable.Rows.Remove(dRow);

    // Return the DataTable, which now contains only unique records.
    return dTable;
}
```
I tried changing the parameter to `string[] colName`, but it throws an error at `dTable.Rows.Remove(dRow);`.
Please suggest a solution.
The easiest and most readable approach is LINQ to DataTable.
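A minimal sketch of such a query, assuming the key columns are named `Column1` and `Column2` and both hold strings (adjust the `Field<T>` type arguments to the real column types). The class name `LinqDuplicateRemover` is hypothetical; on .NET Framework the project needs a reference to `System.Data.DataSetExtensions`, which provides `AsEnumerable`, `Field<T>` and `CopyToDataTable`:

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper class illustrating the LINQ to DataTable approach.
public static class LinqDuplicateRemover
{
    public static DataTable RemoveDuplicateRows(DataTable table)
    {
        // CopyToDataTable throws on an empty sequence,
        // so return an empty copy of the schema instead.
        if (table.Rows.Count == 0)
            return table.Clone();

        return table.AsEnumerable()
            // Group rows by an anonymous type; anonymous types compare by value,
            // so rows with equal Column1 and Column2 land in the same group.
            // Assumption: both key columns are strings.
            .GroupBy(r => new
            {
                Col1 = r.Field<string>("Column1"),
                Col2 = r.Field<string>("Column2")
            })
            // Keep the first row of each group.
            .Select(g => g.First())
            // Copy the distinct rows into a new DataTable with the same schema.
            .CopyToDataTable();
    }
}
```

`CopyToDataTable` throws an `InvalidOperationException` when the source contains no rows, which is why the sketch returns an empty clone for an empty table. The original table is left unchanged.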
- `Enumerable.GroupBy` groups the `DataRow`s by an anonymous type with two properties (`Col1` and `Col2`), which are initialized from the `DataRow` fields `Column1` and `Column2`. So you get groups of `IEnumerable<DataRow>`.
- `Enumerable.First()` returns the first `DataRow` of each group (you could also use other methods to select the row you want to keep, for example by ordering by a date field).
- `CopyToDataTable` then creates a new `DataTable` from the (now) distinct `DataRow`s.

Here's a possible implementation if you're using .NET 2, where LINQ is not available.
It consists of three parts: an implementation of a custom `IEqualityComparer<Object[]>` for the dictionary, your `RemoveDuplicateRows` method, and a test.
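A minimal sketch of that approach, assuming .NET 2.0 (generics are available, but not LINQ or `HashSet<T>`, so a `Dictionary` with dummy values stands in as the set). The class names `ObjectArrayComparer` and `DuplicateRemover` are hypothetical. For each row it builds an `object[]` key from the requested columns and, like the single-column version in the question, collects repeated rows first and removes them afterward:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical comparer: two object[] keys are equal when they have the same
// length and every element is equal according to Object.Equals.
public class ObjectArrayComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            // Object.Equals compares boxed values and DBNull.Value by value.
            if (!object.Equals(x[i], y[i])) return false;
        }
        return true;
    }

    public int GetHashCode(object[] obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // Combine the element hash codes so equal arrays get equal hashes.
            int hash = 17;
            foreach (object o in obj)
                hash = hash * 31 + (o == null ? 0 : o.GetHashCode());
            return hash;
        }
    }
}

// Hypothetical helper class holding the multi-column RemoveDuplicateRows.
public static class DuplicateRemover
{
    public static DataTable RemoveDuplicateRows(DataTable dTable, string[] colNames)
    {
        // Dictionary used as a set of composite keys (values are unused).
        Dictionary<object[], object> seen =
            new Dictionary<object[], object>(new ObjectArrayComparer());
        List<DataRow> duplicateList = new List<DataRow>();

        foreach (DataRow drow in dTable.Rows)
        {
            // Build the composite key from the requested columns.
            object[] key = new object[colNames.Length];
            for (int i = 0; i < colNames.Length; i++)
                key[i] = drow[colNames[i]];

            if (seen.ContainsKey(key))
                duplicateList.Add(drow);
            else
                seen.Add(key, null);
        }

        // Remove in a second pass: removing rows while enumerating
        // dTable.Rows would invalidate the enumerator.
        foreach (DataRow dRow in duplicateList)
            dTable.Rows.Remove(dRow);

        return dTable;
    }
}
```

Pass the key columns as a `string[]`, for example `DuplicateRemover.RemoveDuplicateRows(table, new string[] { "Column1", "Column2" })`. Unlike the LINQ version, this modifies and returns the original table instead of building a new one, and it keeps the first occurrence of each key.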