The Archive Base

Editorial Team
Asked: June 2, 2026

CREATE TABLE hostname_table
(
  id INT NOT NULL AUTO_INCREMENT,
  hostname CHAR(65) NOT NULL,
  interval_avg INT,
  last_update DATETIME NOT NULL,
  numb_updates INT,
  PRIMARY KEY (id)
);

I have this table, and I import 500-600k rows of data into it. I do not check for duplicates when writing to the database, because I want to know how many duplicates of each host there are, and I also want to know the intervals between updates of each hostname.

Example values in hostname_table:

id  hostname          interval_avg  last_update          numb_updates
1   www.host.com      60            2012-04-25 20:22:21  1
2   www.hostname.com  10            2012-04-25 20:22:21  5
3   www.name.com      NULL          2012-04-25 20:22:21  NULL
4   www.host.com      NULL          2012-04-25 20:22:26  NULL
5   www.host.com      NULL          2012-04-25 20:22:36  NULL

Example of what I want it to look like when I have cleaned it up:

id  hostname          interval_avg  last_update          numb_updates
1   www.host.com      25            2012-04-25 20:22:36  3
2   www.hostname.com  10            2012-04-25 20:22:21  5
3   www.name.com      NULL          2012-04-25 20:22:21  NULL
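
One reading of the cleaned-up example, offered as an assumption rather than the author's confirmed method: treat interval_avg as an average weighted by numb_updates, count each extra duplicate row as one more update, and add the time span covered by the duplicates. For www.host.com that gives numb_updates = 1 + 2 = 3 and interval_avg = (60 * 1 + 15) / 3 = 25, which matches the table. A minimal sketch against the original DATETIME schema (the aliases keep_id, new_last_update, new_numb_updates and new_interval_avg are illustrative names):

```sql
-- Preview the merged values for every hostname that has duplicates.
-- Assumption: interval_avg is a mean weighted by numb_updates, and
-- NULL in either column means "no intervals recorded yet".
SELECT
  MIN(id) AS keep_id,
  hostname,
  MAX(last_update) AS new_last_update,
  COALESCE(SUM(numb_updates), 0) + COUNT(*) - 1 AS new_numb_updates,
  (COALESCE(SUM(interval_avg * numb_updates), 0)
     + TIMESTAMPDIFF(SECOND, MIN(last_update), MAX(last_update)))
    / (COALESCE(SUM(numb_updates), 0) + COUNT(*) - 1) AS new_interval_avg
FROM hostname_table
GROUP BY hostname
HAVING COUNT(*) > 1;
```

Hostnames that occur only once are skipped by the HAVING clause, so www.name.com would keep its NULL values.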

With a database this large, I don't want to send too many queries to reach this goal, but I believe three queries are the minimum for an operation like this (if I am wrong, please correct me). Each hour there will be ~500k new rows, of which ~50% or more will be duplicates, so it's vital to get rid of those duplicates as efficiently as possible while still keeping a record of how many duplicates occurred and how often (hence the interval_avg and numb_updates columns).

This is a three step problem, and I was hoping the community here would give a helping hand.

So to summarize in pseudocode, I need help optimizing these queries:

  1. select all last_update and interval_avg values, get sum(numb_updates), get count(duplicates) for each hostname,
  2. update interval_avg in min(id), update numb_updates in min(id), update last_update in min(id) with the value from max(id),
  3. delete all duplicates except min(id)

SOLVED.
I have optimized one part by 94% and another part by ~97% over a couple of days of research. I truly hope this will help others searching for the same solution. MySQL and large databases can be a big problem if you choose the wrong solution.
(In my final solution I changed the last_update column from DATETIME to INT(10) and stored a timestamp instead of a formatted time, so that I could get the max(last_update) and min(last_update) values.)

(Thanks to GolezTrol for helping with parts of the problem)

1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 6:54 pm

    You cannot get each different value of interval_avg and numb_updates for a hostname if you want to aggregate by that hostname. Did you mean to SUM or maybe AVG them? Or do you just want to keep the value of the lowest id?

    In the query below I sum them.

    SELECT 
      MIN(id) as id, 
      hostname, 
      SUM(interval_avg) as total_interval_avg,
      SUM(numb_updates) as total_numb_updates,
      COUNT(*) as hostname_count
    FROM
      hostname_table
    GROUP BY 
      hostname
    

    After this, you will need to update each found id with the right values for interval_avg and numb_updates.
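
    A minimal sketch of that update step, assuming MySQL's multi-table UPDATE syntax and reusing the summed aggregates from the query above. The alias agg and the column newest_update are illustrative names, and MAX(last_update) is added only because the question asks for the newest timestamp to be kept:

    ```sql
    -- Write the per-hostname aggregates back into the row with the lowest id.
    -- The grouped subquery is materialized as a derived table before the update runs.
    UPDATE hostname_table AS t
    JOIN (
      SELECT
        MIN(id) AS id,
        SUM(interval_avg) AS total_interval_avg,
        SUM(numb_updates) AS total_numb_updates,
        MAX(last_update) AS newest_update
      FROM hostname_table
      GROUP BY hostname
    ) AS agg ON agg.id = t.id
    SET
      t.interval_avg = agg.total_interval_avg,
      t.numb_updates = agg.total_numb_updates,
      t.last_update  = agg.newest_update;
    ```

    If a different merge rule is wanted (an average instead of a sum, for instance), only the expressions inside the derived table should need to change.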

    After that, you will need to delete each id that is not found by this query.

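    -- MySQL rejects a subquery that reads the table being deleted from
    -- (error 1093), so the grouped query is wrapped in a derived table.
    DELETE FROM hostname_table
    WHERE
      id NOT IN
        (SELECT id FROM
          (SELECT
            MIN(id) AS id
          FROM
            hostname_table
          GROUP BY
            hostname) AS keep_ids)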
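
    As a possible alternative for large tables, the same delete can be sketched as a self-join, which MySQL accepts in its multi-table DELETE form. The index name idx_hostname is hypothetical, and whether the index pays for itself under hourly bulk imports is something to measure rather than assume:

    ```sql
    -- Hypothetical index so the join on hostname can use a lookup
    -- instead of scanning the table for every row.
    CREATE INDEX idx_hostname ON hostname_table (hostname);

    -- Delete every row for which an older row (smaller id)
    -- with the same hostname exists; the lowest id per hostname survives.
    DELETE dup
    FROM hostname_table AS dup
    JOIN hostname_table AS keeper
      ON keeper.hostname = dup.hostname
     AND keeper.id < dup.id;
    ```

    Like the NOT IN form, this keeps the row with the lowest id for each hostname, so it should only run after the aggregated values have been written to that row.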
    
