I have the following program to remove even numbers from a string vector, when


Asked: May 23, 2026 (2026-05-23T23:04:21+00:00)

I have the following program to remove even numbers from a string vector. When the vector grows larger, it might take a long time, so I thought of using threads. But using 10 threads is not faster than one thread, even though my PC has 6 cores and 12 threads. Why?

import java.util.*;

public class Test_Threads
{
  static boolean Use_Threads_To_Remove_Duplicates(Vector<String> Good_Email_Address_Vector,Vector<String> To_Be_Removed_Email_Address_Vector)
  {
    boolean Removed_Duplicates=false;
    int Threads_Count=10,Delay=5,Average_Size_For_Each_Thread=Good_Email_Address_Vector.size()/Threads_Count;

    Remove_Duplicate_From_Vector_Thread RDFVT[]=new Remove_Duplicate_From_Vector_Thread[Threads_Count];
    Remove_Duplicate_From_Vector_Thread.To_Be_Removed_Email_Address_Vector=To_Be_Removed_Email_Address_Vector;
    for (int i=0;i<Threads_Count;i++)
    {
      Vector<String> Target_Vector=new Vector<String>();
      if (i<Threads_Count-1) for (int j=i*Average_Size_For_Each_Thread;j<(i+1)*Average_Size_For_Each_Thread;j++) Target_Vector.add(Good_Email_Address_Vector.elementAt(j));
      else for (int j=i*Average_Size_For_Each_Thread;j<Good_Email_Address_Vector.size();j++) Target_Vector.add(Good_Email_Address_Vector.elementAt(j));
      RDFVT[i]=new Remove_Duplicate_From_Vector_Thread(Target_Vector,Delay);
    }

    try { for (int i=0;i<Threads_Count;i++) RDFVT[i].Remover_Thread.join(); }
    catch (Exception e) { e.printStackTrace(); }                                                   // Wait for all threads to finish

    for (int i=0;i<Threads_Count;i++) if (RDFVT[i].Changed) Removed_Duplicates=true;

    if (Removed_Duplicates)                                                                        // Collect results
    {
      Good_Email_Address_Vector.clear();
      for (int i=0;i<Threads_Count;i++) Good_Email_Address_Vector.addAll(RDFVT[i].Target_Vector);
    }

    return Removed_Duplicates;
  }

  public static void out(String message) { System.out.print(message); }
  public static void Out(String message) { System.out.println(message); }

  public static void main(String[] args)
  {
    long start=System.currentTimeMillis();

    Vector<String> Good_Email_Address_Vector=new Vector<String>(),To_Be_Removed_Email_Address_Vector=new Vector<String>();
    for (int i=0;i<1000;i++) Good_Email_Address_Vector.add(i+"");
    Out(Good_Email_Address_Vector.toString());
    for (int i=0;i<1500000;i++) To_Be_Removed_Email_Address_Vector.add(i*2+"");
    Out("=============================");

    Use_Threads_To_Remove_Duplicates(Good_Email_Address_Vector,To_Be_Removed_Email_Address_Vector);  // [ Approach 1 : Use 10 threads ] 
//    Good_Email_Address_Vector.removeAll(To_Be_Removed_Email_Address_Vector);                       // [ Approach 2 : just one thread ]
    Out(Good_Email_Address_Vector.toString());

    long end=System.currentTimeMillis();
    Out("Time taken for execution is " + (end - start));
  }
}

class Remove_Duplicate_From_Vector_Thread
{
  static Vector<String> To_Be_Removed_Email_Address_Vector;
  Vector<String> Target_Vector;
  Thread Remover_Thread;
  boolean Changed=false;

  public Remove_Duplicate_From_Vector_Thread(final Vector<String> Target_Vector,final int Delay)
  {
    this.Target_Vector=Target_Vector;

    Remover_Thread=new Thread(new Runnable()
    {
      public void run()
      {
        try
        {
          Thread.sleep(Delay);
          Changed=Target_Vector.removeAll(To_Be_Removed_Email_Address_Vector);
        }
        catch (InterruptedException e) { e.printStackTrace(); }
        finally { }
      }
    });
    Remover_Thread.start();
  }
}

In my program you can try “[ Approach 1 : Use 10 threads ]” or “[ Approach 2 : just one thread ]”; there isn’t much difference in speed. I expected the threaded version to be several times faster. Why isn’t it?

1 Answer

Editorial Team · Answer 1 · 2026-05-23T23:04:22+00:00

Vector Synchronization Creates Contention

You’ve split up the vector to be modified, which avoids some contention. But multiple threads still access the static Vector To_Be_Removed_Email_Address_Vector, so much contention remains: all Vector methods are synchronized, and each removeAll() call checks every element of its own vector with a contains() call on that shared Vector.

Use an unsynchronized data structure for the shared, read-only information so that there is no contention between threads. On my machine, running your test with ArrayList in place of Vector cut the execution time in half.
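As a rough sketch of that change (not the exact code I timed), the version below gives each thread a private ArrayList chunk and lets all threads read one shared ArrayList without locking. The class name UnsynchronizedSharedList, the helper removeInParallel, and the thread count of 4 are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class UnsynchronizedSharedList {

    // Hypothetical helper: removes every entry of toRemove from good,
    // giving each thread its own private chunk of good to filter.
    static boolean removeInParallel(List<String> good, List<String> toRemove, int threadCount) {
        int chunk = (good.size() + threadCount - 1) / threadCount;  // ceiling division
        List<List<String>> parts = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        boolean[] changed = new boolean[threadCount];

        for (int i = 0; i < threadCount; i++) {
            int from = Math.min(i * chunk, good.size());
            int to = Math.min(from + chunk, good.size());
            // Private, unsynchronized copy: only this thread touches it.
            List<String> part = new ArrayList<>(good.subList(from, to));
            parts.add(part);
            int index = i;
            // toRemove is only read by the workers, so no locking is needed.
            Thread t = new Thread(() -> changed[index] = part.removeAll(toRemove));
            threads.add(t);
            t.start();
        }

        try {
            // join() also makes each worker's writes visible to this thread.
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        // Collect the filtered chunks back into the original list, in order.
        boolean anyChanged = false;
        good.clear();
        for (int i = 0; i < threadCount; i++) {
            anyChanged |= changed[i];
            good.addAll(parts.get(i));
        }
        return anyChanged;
    }

    public static void main(String[] args) {
        List<String> good = new ArrayList<>();
        for (int i = 0; i < 1000; i++) good.add(i + "");
        List<String> toRemove = new ArrayList<>();
        for (int i = 0; i < 1000; i++) toRemove.add(i * 2 + "");

        boolean changed = removeInParallel(good, toRemove, 4);
        System.out.println(changed + " " + good.size());
    }
}
```

Each lookup in the shared list is still a linear scan, so this only removes the lock contention; it does not fix the underlying cost of searching a list.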

Even without contention, thread-safe structures are slower, so don’t use them when only a single thread has access to an object. Additionally, Vector is largely obsolete by Java 5. Avoid it unless you have to inter-operate with a legacy API you can’t alter.

Choose a Suitable Data Structure

A list is going to perform poorly for this task. Since email addresses are likely to be unique, a set should be a suitable replacement, and removeAll() will run much faster on large sets. Using HashSet in place of the original Vector cut execution time on my (8 core) machine from over 5 seconds to around 3 milliseconds. Roughly half of this improvement is due to using the right data structure for the job.
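Here is a minimal single-threaded sketch of the idea, assuming the to-be-removed addresses are copied into a HashSet once before filtering. The class name HashSetRemoval and the helper removeListed are hypothetical, and the sizes in main are smaller than in the question to keep the example light.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetRemoval {

    // Hypothetical helper: removes from good every entry found in toRemove.
    static boolean removeListed(List<String> good, Collection<String> toRemove) {
        // Build the lookup set once; each contains() is then a hash lookup
        // instead of a scan over the whole collection.
        Set<String> lookup = new HashSet<>(toRemove);
        // removeAll() asks lookup.contains(e) for each element e of good.
        return good.removeAll(lookup);
    }

    public static void main(String[] args) {
        List<String> good = new ArrayList<>();
        for (int i = 0; i < 1000; i++) good.add(i + "");
        List<String> toRemove = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) toRemove.add(i * 2 + "");

        long start = System.nanoTime();
        boolean changed = removeListed(good, toRemove);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        System.out.println("changed=" + changed + ", remaining=" + good.size()
                + ", micros=" + elapsedMicros);
    }
}
```

If the addresses to remove can be kept in a HashSet from the start, the copy inside removeListed goes away as well.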

Concurrent Structures Are a Bad Fit

Using a concurrent data structure is relatively slow and doesn’t simplify the code, so I don’t recommend it.

Using a more up-to-date concurrent data structure is much faster than contending for a Vector, but the concurrency overhead of these data structures is still much higher than single-threaded structures. For example, running the original code on my machine took more than five seconds, while a ConcurrentSkipListSet took half a second, and a ConcurrentHashMap took one eighth of a second. But remember, when each thread had its own HashSet to update, the total time was just 3 milliseconds.
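For comparison, a concurrent-set version might look roughly like the sketch below. It assumes a set backed by ConcurrentHashMap (via ConcurrentHashMap.newKeySet()) that all threads update directly, with each thread walking its own slice of the removal list. This is an illustration, not the exact code behind the timings above, and the names ConcurrentSetRemoval and removeConcurrently are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetRemoval {

    // Hypothetical helper: returns the entries of good that are not in toRemove.
    // All worker threads remove from one shared concurrent set.
    static Set<String> removeConcurrently(List<String> good, List<String> toRemove, int threadCount) {
        Set<String> shared = ConcurrentHashMap.newKeySet();
        shared.addAll(good);   // note: ordering and duplicates of good are lost

        int chunk = (toRemove.size() + threadCount - 1) / threadCount;  // ceiling division
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            int from = Math.min(i * chunk, toRemove.size());
            int to = Math.min(from + chunk, toRemove.size());
            // Read-only view of this thread's share of the removal list.
            List<String> slice = toRemove.subList(from, to);
            threads[i] = new Thread(() -> {
                for (String address : slice) shared.remove(address);
            });
            threads[i].start();
        }

        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared;
    }

    public static void main(String[] args) {
        List<String> good = new ArrayList<>();
        for (int i = 0; i < 1000; i++) good.add(i + "");
        List<String> toRemove = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) toRemove.add(i * 2 + "");

        Set<String> remaining = removeConcurrently(good, toRemove, 4);
        System.out.println("remaining=" + remaining.size());
    }
}
```

The partitioning code is about as long as in the per-thread versions, which is why the concurrent set buys no simplicity here.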

Even when all threads are updating a single concurrent data structure, the code needed to partition the workload is very similar to that used to create a separate Vector for each thread in the original code. From a readability and maintenance standpoint, all of these solutions have equivalent complexity.

If you had a situation where “bad” email addresses were being added to the set asynchronously, and you wanted readers of the “good” list to see those updates auto-magically, a concurrent set would be a good choice. But, with the current design of the API, where consumers of the “good” list explicitly call a blocking filter method to update the list, a concurrent data structure may be the wrong choice.
