Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6068935
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 23, 20262026-05-23T09:46:06+00:00 2026-05-23T09:46:06+00:00

Let’s say I have a MySQL table with four columns: ID DRIVER_ID CAR_ID NOTES

  • 0

Let’s say I have a MySQL table with four columns:

ID DRIVER_ID CAR_ID NOTES (NULL for most rows)

I have a bunch of duplicate rows where DRIVER_ID and CAR_ID are the same. For each pair of DRIVER_ID and CAR_ID, I want one row. If one of the rows in the set has non-NULL NOTES, I want that one, but otherwise it doesn’t matter.

So, If I have:

ID DRIVER_ID CAR_ID NOTES
1 1 1 NULL
2 1 1 NULL
3 1 2 NULL
4 1 2 NULL
5 2 3 NULL
6 2 3 NULL
7 2 3 NULL
8 2 3 hi
9 3 5 NULL

I want to keep the following IDs: 9, 8, and then one each of [3,4] and [1,2].

It’s a huge table, and the clunky methods I’ve tried are insanely slow, to the point where I’m sure I’m going about it all wrong. How can I efficiently:

  1. Select the list of IDs to delete?
  2. Delete them in the same query?

Some of the stuff I’ve tried so far:

select ID, COUNT(DRIVER_ID) rowcount from CARS_DRIVERS group by CAR_ID,DRIVER_ID HAVING rowcount > 1;

This will get me one ID per group, but it doesn’t necessarily leave the row with NOTES if there is one, though. It will also only get me one ID per duplicate group. There are some cases where there are 20+ duplicate combinations so I would need to iterate that over and over to whittle each group down to a single row.

select distinct t1.ID from CARS_DRIVERS t1 where exists (select * from CARS_DRIVERS t2 where t2.CAR_ID = t1.CAR_ID and t2.DRIVER_ID = t1.DRIVER_ID and t2.id > t1.id);

This is much slower, and still doesn’t really address the NOTES issue. It does have the advantage of getting the oldest row for each group, which, if I can’t isolate on the NOTES field easily, could be a proxy for that. If a row in a set has NOTES, I believe it’s always the oldest one (one with the lowest ID), but I’m not certain.

DRIVER_ID and CAR_ID are not the real column names, and there are other columns in the table. I was trying to distill down the info to get at the root of the problem, but I see from W4M’s comment that this makes it look like a homework assignment. The real deal is that I’m looking at a very unoptimized database (not my purview normally) and when trying to get rid of these dupes before adding a key, the operation is taking forever – as in, hours.

The table is big but certainly doesn’t justify that. I’m trying to pitch in with my limited SQL expertise and figure out a way to get this done. It doesn’t matter if it’s pretty, I can sit at the command line and brute-force a bunch of queries if necessary but I noticed that selecting IDs that are candidates for deletion only takes a few seconds, and although the table is huge, the total number of rows to delete is less than 10,000 so there must be a way to make this happen without some script that takes a whole weekend to finish.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-23T09:46:07+00:00Added an answer on May 23, 2026 at 9:46 am

    Here’s one solution. I tested this on MySQL 5.5.8.

    SELECT MAX(COALESCE(c2.id, c1.id)) AS id,
     c1.driver_id, c1.car_id,
     c2.notes AS notes
    FROM cars_drivers AS c1
    LEFT OUTER JOIN cars_drivers AS c2
     ON (c1.driver_id,c1.car_id) = (c2.driver_id,c2.car_id) AND c2.notes IS NOT NULL
    GROUP BY c1.driver_id, c1.car_id, c2.notes;
    

    I include c2.notes as a GROUP BY key because you might have more than one row with non-null notes per values of driver_id,car_id.

    Result using your example data:

    +------+-----------+--------+-------+
    | id   | driver_id | car_id | notes |
    +------+-----------+--------+-------+
    |    2 |         1 |      1 | NULL  |
    |    4 |         2 |      1 | NULL  |
    |    8 |         3 |      2 | hi    |
    |    9 |         5 |      3 | NULL  |
    +------+-----------+--------+-------+
    

    Regarding deleting. In your example data, it’s always the highest id value per driver_id & car_id that you want to keep. If you can depend on that, you can do a multi-table delete that deletes all rows for which a row with a higher id value and the same driver_id & car_id exists:

    DELETE c1 FROM cars_drivers AS c1 INNER JOIN cars_drivers AS c2
     ON (c1.driver_id,c1.car_id) = (c2.driver_id,c2.car_id) AND c1.id < c2.id;
    

    This naturally skips any cases where only one row exists with a given pair of driver_id & car_id values, because the conditions of the inner join require two rows with different id values.

    But if you can’t depend on the latest id per group being the one you want to keep, the solution is more complex. It’s probably more complex than it’s worth to solve in one statement, so do it in two statements.

    I tested this too, after adding a couple more rows for testing:

    INSERT INTO cars_drivers VALUES (10,2,3,NULL), (11,2,3,'bye');
    
    +----+--------+-----------+-------+
    | id | car_id | driver_id | notes |
    +----+--------+-----------+-------+
    |  1 |      1 |         1 | NULL  |
    |  2 |      1 |         1 | NULL  |
    |  3 |      1 |         2 | NULL  |
    |  4 |      1 |         2 | NULL  |
    |  5 |      2 |         3 | NULL  |
    |  6 |      2 |         3 | NULL  |
    |  7 |      2 |         3 | NULL  |
    |  8 |      2 |         3 | hi    |
    |  9 |      3 |         5 | NULL  |
    | 10 |      2 |         3 | NULL  |
    | 11 |      2 |         3 | bye   |
    +----+--------+-----------+-------+
    

    First delete rows with null notes, where a row with non-null notes exists.

    DELETE c1 FROM cars_drivers AS c1 INNER JOIN cars_drivers AS c2
     ON (c1.driver_id,c1.car_id) = (c2.driver_id,c2.car_id)
    WHERE c1.notes IS NULL AND c2.notes IS NOT NULL;
    
    +----+--------+-----------+-------+
    | id | car_id | driver_id | notes |
    +----+--------+-----------+-------+
    |  1 |      1 |         1 | NULL  |
    |  2 |      1 |         1 | NULL  |
    |  3 |      1 |         2 | NULL  |
    |  4 |      1 |         2 | NULL  |
    |  8 |      2 |         3 | hi    |
    |  9 |      3 |         5 | NULL  |
    | 11 |      2 |         3 | bye   |
    +----+--------+-----------+-------+
    

    Second, delete all but the highest-id row from each group of duplicates.

    DELETE c1 FROM cars_drivers AS c1 INNER JOIN cars_drivers AS c2
     ON (c1.driver_id,c1.car_id) = (c2.driver_id,c2.car_id) AND c1.id < c2.id;
    
    +----+--------+-----------+-------+
    | id | car_id | driver_id | notes |
    +----+--------+-----------+-------+
    |  2 |      1 |         1 | NULL  |
    |  4 |      1 |         2 | NULL  |
    |  9 |      3 |         5 | NULL  |
    | 11 |      2 |         3 | bye   |
    +----+--------+-----------+-------+
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Let's say I have this MySQL table: OK.. see the type field? Type 0
Let's say I have the following text: (example) <table> <tr> <td> <span>col1</span> </td> <td>col2</td>
Let's say I have table with column 'URL' whrere I store urls like this
Let's say I have a javascript array with a bunch of elements (anywhere from
Let's say I'm building a data access layer for an application. Typically I have
Let's say you have a class called Customer, which contains the following fields: UserName
Let's say we have a simple function defined in a pseudo language. List<Numbers> SortNumbers(List<Numbers>
Let's say I have a drive such as C:\ , and I want to
Let's say that we have an ARGB color: Color argb = Color.FromARGB(127, 69, 12,
Let's say I have two tables orgs and states orgs is (o_ID, state_abbr) and

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.