The Archive Base

Editorial Team
Asked: June 10, 2026 (2026-06-10T08:18:21+00:00)

The problem: I have multiple parallel processes that handle flat file records.


The problem:

I have multiple parallel processes that handle flat file records. Each file corresponds to a given interface in a telecommunications system (a message passing through the system is given a 32-digit globally unique identifier and there can be records for a given message on multiple interfaces). There is one process handling each file.

Let’s call the interfaces A, B and C. The message string can differ according to which interface it was written by. I am supposed to create a table that stores information about each message passing through the system. So, this table should contain (among other fields):
id, message_on_A, message_on_B, message_on_C. I’d like to avoid duplicate entries for the same id.

What I have tried is the following:

  1. Setting id as the PRIMARY KEY and using INSERT ... ON DUPLICATE KEY UPDATE statements to set the corresponding message field from each process.
  2. Breaking id down into multiple parts and using these parts as a compound primary key; the rest is the same as 1.
  3. Storing all records, then using a second query to extract all the information for each id (using GROUP BY id with MAX(message_on_A), MAX(message_on_B), MAX(message_on_C)). There is no primary key defined for this approach.
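For reference, the third approach can be sketched on a toy data set. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical table name `records`; it shows the shape of the GROUP BY query, not its performance on 3 million rows.

```python
import sqlite3

def merge_by_group(rows):
    """Collapse per-interface rows into one row per id.

    rows: iterable of (id, message_on_A, message_on_B, message_on_C),
    where each row carries the message of a single interface and NULL
    (None) in the other two columns.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical staging table with no primary key, as in approach 3.
    conn.execute(
        "CREATE TABLE records ("
        "id TEXT, message_on_A TEXT, message_on_B TEXT, message_on_C TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
    # MAX() ignores NULLs, so it picks the one non-NULL message per column.
    merged = conn.execute(
        "SELECT id, MAX(message_on_A), MAX(message_on_B), MAX(message_on_C) "
        "FROM records GROUP BY id ORDER BY id"
    ).fetchall()
    conn.close()
    return merged
```

An id that has no record on some interface simply gets NULL in that column.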

None of these approaches has been fast enough. I’m looking for a solution that can achieve a run time of about 30 seconds for 1 million ids (so 3 million records, considering 3 interfaces).

The first and second approaches did the job in about 400 seconds on MyISAM tables. I have also tried InnoDB, but it was much slower.

At the moment I’m considering giving approach 3 another shot, but I need to find a much faster query (the GROUP BY and MAX() query ran for over 20 minutes before I terminated it).

The question:
Can anybody suggest a better schema for this problem? And a better query?

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 8:18 am (2026-06-10T08:18:22+00:00)

    I am thinking of a modification of the third approach. Store the data in three separate tables, one per interface, with the GUID as the primary key in each table. This should make insertions happen as fast as possible. Handle duplicates at this level, within each per-interface table.
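A minimal sketch of that layout follows, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The helper names `create_tables` and `load_interface` are hypothetical, and SQLite's `INSERT OR IGNORE` is used here to drop duplicate ids on insert (MySQL spells the comparable statement `INSERT IGNORE`). Whether silently ignoring a duplicate is the right policy is an assumption; the answer only says to handle duplicates at this level.

```python
import sqlite3

# One table per interface, named as in the answer.
INTERFACES = ("A", "B", "C")

def create_tables(conn):
    """Create tables A, B and C, each keyed by the message id."""
    for name in INTERFACES:
        conn.execute(f"CREATE TABLE {name} (id TEXT PRIMARY KEY, message TEXT)")

def load_interface(conn, name, rows):
    """Bulk-insert (id, message) rows for one interface.

    Rows whose id already exists in the table are skipped, so the
    first message seen for an id is the one that is kept.
    """
    if name not in INTERFACES:
        raise ValueError(f"unknown interface: {name}")
    with conn:  # one transaction for the whole batch
        conn.executemany(
            f"INSERT OR IGNORE INTO {name} (id, message) VALUES (?, ?)", rows
        )
```

Because each process writes only to its own table, the three loaders never update the same rows.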

    Instead of GROUP BY, try the following:

    -- One output row per id in A; B and C are looked up by primary key.
    SELECT A.id,
           A.message AS A_message,
           (SELECT B.message FROM B WHERE B.id = A.id LIMIT 1) AS B_message,
           (SELECT C.message FROM C WHERE C.id = A.id LIMIT 1) AS C_message
    FROM A;

    If this works, then your only remaining problem is messages that are missing the A component: the query is driven by table A, so ids that appear only in B or C are not returned. I think there is a way to fix that as well. The question is whether this achieves your performance goals.
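One possible way to cover ids that have no record on A, offered as an assumption rather than as the fix the answer has in mind, is to build the list of ids from all three tables and LEFT JOIN each table onto it. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; `merge_all` and `MERGE_SQL` are hypothetical names, and the query has not been timed against the 30-second target.

```python
import sqlite3

# Collect every id seen on any interface, then attach each message if present.
MERGE_SQL = """
SELECT ids.id,
       A.message AS A_message,
       B.message AS B_message,
       C.message AS C_message
FROM (SELECT id FROM A
      UNION
      SELECT id FROM B
      UNION
      SELECT id FROM C) AS ids
LEFT JOIN A ON A.id = ids.id
LEFT JOIN B ON B.id = ids.id
LEFT JOIN C ON C.id = ids.id
ORDER BY ids.id
"""

def merge_all(a_rows, b_rows, c_rows):
    """Load (id, message) rows into tables A, B, C and return merged rows."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("A", a_rows), ("B", b_rows), ("C", c_rows)):
        conn.execute(f"CREATE TABLE {name} (id TEXT PRIMARY KEY, message TEXT)")
        conn.executemany(f"INSERT INTO {name} (id, message) VALUES (?, ?)", rows)
    merged = conn.execute(MERGE_SQL).fetchall()
    conn.close()
    return merged
```

UNION (without ALL) removes repeated ids, so each message id yields exactly one output row, with NULL for any interface that never saw it.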
