I’m reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions and I’m trying to solve the following question:
2.1 Write code to remove duplicates from an unsorted linked list.
FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?
I’m solving it in C#, so I made my own Node class:
    public class Node<T> where T : class
    {
        public Node<T> Next { get; set; }
        public T Value { get; set; }

        public Node(T value)
        {
            Next = null;
            Value = value;
        }
    }
My solution is to iterate through the list and, for each node, iterate through the remainder of the list and remove any duplicates of it (note that I haven’t actually compiled or tested this, as instructed by the book):
    public void RemoveDuplicates(Node<T> head)
    {
        // Iterate through the list
        Node<T> iter = head;
        while (iter != null)
        {
            // Iterate to the remaining nodes in the list
            Node<T> current = iter;
            while (current != null && current.Next != null)
            {
                if (iter.Value == current.Next.Value)
                {
                    current.Next = current.Next.Next;
                }
                current = current.Next;
            }
            iter = iter.Next;
        }
    }
Here is the solution from the book (the author wrote it in Java):
Without a buffer, we can iterate with two pointers: “current” does a normal iteration, while “runner” iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.
    public static void deleteDups2(LinkedListNode head)
    {
        if (head == null) return;
        LinkedListNode previous = head;
        LinkedListNode current = previous.next;
        while (current != null)
        {
            LinkedListNode runner = head;
            while (runner != current) { // Check for earlier dups
                if (runner.data == current.data)
                {
                    LinkedListNode tmp = current.next; // remove current
                    previous.next = tmp;
                    current = tmp; // update current to next node
                    break; // all other dups have already been removed
                }
                runner = runner.next;
            }
            if (runner == current) { // current not updated - update now
                previous = current;
                current = current.next;
            }
        }
    }
So my solution always looks for duplicates from the current node to the end of the list, while their solution looks for duplicates from the head up to the current node. I feel like both solutions would suffer performance issues depending on how many duplicates there are in the list and how they’re distributed (density and position). But in general: is my answer nearly as good as the one in the book, or is it significantly worse?
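As a rough check on that intuition, assume a list of n nodes with no duplicates at all, which is the most work either version can do because nothing is ever unlinked. Counting value comparisons, with i as the 1-based position of the node handled by the outer loop:

```latex
\begin{align*}
\text{forward scan (mine):}\quad  & \sum_{i=1}^{n} (n - i) = \frac{n(n-1)}{2} \\
\text{backward scan (book):}\quad & \sum_{i=1}^{n} (i - 1) = \frac{n(n-1)}{2}
\end{align*}
```

Under that assumption both come out at O(n^2) time with O(1) extra space, so any difference between them should only show up once duplicates are present.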
If you give a person a fish, they eat for a day. If you teach a person to fish…
My measures for the quality of an implementation are:
As for your implementation:
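One point that can be checked mechanically, offered as a sketch rather than a full review: in the posted inner loop, `current = current.Next` runs even right after a node was unlinked, so the node that has just moved into `current.Next` is never compared. A run such as a, a, a would then keep two copies. The version below is written in Java to sit alongside the book's code, and `ListNode`, `ForwardScanDedup`, `removeDuplicates`, `listOf`, and `render` are hypothetical names, not anything from the book or the post. It advances only when nothing was removed, and it compares values with `Objects.equals` rather than `==`.

```java
import java.util.Objects;

// Hypothetical node type for illustration; not the book's LinkedListNode.
final class ListNode<T> {
    T value;
    ListNode<T> next;

    ListNode(T value) {
        this.value = value;
    }
}

final class ForwardScanDedup {
    // For each node, unlink every later node that holds an equal value.
    // O(n^2) comparisons in the worst case, O(1) extra space.
    static <T> void removeDuplicates(ListNode<T> head) {
        for (ListNode<T> iter = head; iter != null; iter = iter.next) {
            ListNode<T> current = iter;
            while (current.next != null) {
                if (Objects.equals(iter.value, current.next.value)) {
                    // Unlink the duplicate and stay put: the node that just
                    // moved into current.next has not been compared yet.
                    current.next = current.next.next;
                } else {
                    current = current.next;
                }
            }
        }
    }

    // Helper for trying the sketch out: builds a list from the given values.
    static ListNode<String> listOf(String... values) {
        ListNode<String> head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            ListNode<String> node = new ListNode<>(values[i]);
            node.next = head;
            head = node;
        }
        return head;
    }

    // Helper: renders the list as "a -> b -> c" for easy inspection.
    static <T> String render(ListNode<T> head) {
        StringBuilder sb = new StringBuilder();
        for (ListNode<T> n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.value);
        }
        return sb.toString();
    }
}
```

The same two considerations likely carry over to the C# version: `==` on a type parameter constrained with `where T : class` compares references, so `Equals` or `EqualityComparer<T>.Default` is probably closer to what was intended.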