The Archive Base

Editorial Team
Asked: May 25, 2026

I have large data in two files each with about two million (different) entries.

I have large data in two files, each with about two million (different) entries. Each file is organised by event number, and each event contains a number of subevents. Each subevent has some characteristics. For example, the general structure of the files is as follows:

Index  Event     SubEvent      Characteristic1          Characteristic2 .... 
  1      1            1                 322                      234
  2      1            2                 453                      324
  3      1            3                 ...                      ...
  .      .            .                 ...                      ...
  .      .            .                 ...                      ...
 100     1           100                ...                      ...
 101     2            1                 ...                      ...
 102     2            2                 ...                      ...
  .      .            .                 ...                      ...
  .      .            .                 ...                      ...   
  .      .            .                 ...                      ... 
 207     2           107                ...                      ...
 208     3            1                 ...                      ...
 209     3            2                 ...                      ...

and so on; the index runs up to about two million.

I have two files, let's call them file1 and file2, with the above structure. For each subevent of an event, I have to make some computations using the characteristics from both files. Here's the outline of what I have come up with.

LOOP over each INDEX in file1
    LOOP over each INDEX in file2
        if (Event value of file1 is same as event value of file2)
            /* do some computations with characteristics and store them somewhere */

My current implementation is:

for (int i = 0; i < nEntries_1; i++) {
    file1->GetEntry(i);
    for (int j = 0; j < nEntries_2; j++) {
        file2->GetEntry(j);
        if (event1 != event2) break;
        else {
            /* Doing the computation with characteristics */
        }
    }
}

However, I think this is wrong. Suppose we are at index 209 in the outer loop over file1. The code then needs to combine the characteristics of subevent 2 in event 3 of file1 with all the subevents of event 3 in file2. Instead, the code above breaks out of the inner loop immediately, because the event number of the first entry in file2 does not match.

What could be a possible solution? If I just use brute force with no if-break, it takes far too long.


1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 12:10 am

    In your inner loop you have to use continue, which skips only the current iteration, rather than break, which aborts the entire inner loop.
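A minimal sketch of the difference, assuming the event numbers of each file have been read into a plain std::vector<int>. The function name countMatches and both vectors are illustrative stand-ins, not part of the code in the question:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two files: each vector holds the event
// number of every entry, in file order.
// Counts the (i, j) pairs whose event numbers match, using `continue`
// to skip a non-matching entry instead of `break`.
std::size_t countMatches(const std::vector<int>& events1,
                         const std::vector<int>& events2) {
    std::size_t matches = 0;
    for (std::size_t i = 0; i < events1.size(); ++i) {
        for (std::size_t j = 0; j < events2.size(); ++j) {
            if (events1[i] != events2[j]) continue;  // skip this j only
            ++matches;  // the computation on the characteristics would go here
        }
    }
    return matches;
}
```

This only fixes the correctness problem: the inner loop still visits every entry of the second file for every entry of the first.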

    Design-wise, your algorithm is extremely inefficient, as you can convince yourself by doing a basic complexity analysis. Indexing your data suitably would almost certainly be necessary.
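As a rough worked estimate of that complexity analysis, assume both files hold $N_1 = N_2 = 2 \times 10^6$ entries and that each event has about 100 subevents in each file (a guess based on the sample table, giving roughly $2 \times 10^4$ events). Here $n_1(e)$ and $n_2(e)$ denote the number of subevents of event $e$ in file1 and file2:

```latex
% Brute force: every entry of file1 is compared with every entry of file2
N_1 N_2 = (2 \times 10^{6})^{2} = 4 \times 10^{12} \ \text{comparisons}

% With an index on the event number: only entries of the same event are paired
\sum_{e} n_1(e)\, n_2(e) \approx (2 \times 10^{4}) \times 100 \times 100
                          = 2 \times 10^{8} \ \text{pairs}
```

Under these assumptions the indexed approach touches about $2 \times 10^4$ times fewer pairs than the nested loop.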

    This is exactly what databases are for. I recommend you rig up a small database (e.g. MySQL), make two tables and run a JOIN query on the data, which should be a lot more efficient than your manual loop.

    Alternatively, if you would like to try it yourself, you could build your own micro-database in C++ with a structure like std::multimap and then use equal_range() to do targeted matching.
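A minimal sketch of the std::multimap approach, assuming the entries of both files have already been read into memory. The Entry struct, the function names, and the sum-of-products computation are hypothetical placeholders for whatever the real characteristics and computation are:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical record type: one subevent with a single characteristic.
struct Entry {
    int event;
    int subEvent;
    double characteristic;
};

// Index the entries of the second file by event number.
std::multimap<int, Entry> buildIndex(const std::vector<Entry>& file2) {
    std::multimap<int, Entry> index;
    for (const Entry& e : file2) {
        index.emplace(e.event, e);
    }
    return index;
}

// For every entry of the first file, visit only the entries of the second
// file that belong to the same event.
// Placeholder computation: sum of products of the characteristics.
double sumOfProducts(const std::vector<Entry>& file1,
                     const std::multimap<int, Entry>& index) {
    double total = 0.0;
    for (const Entry& a : file1) {
        // equal_range returns the [first, second) range of entries whose
        // key equals a.event; the range is empty if there is no such key.
        auto range = index.equal_range(a.event);
        for (auto it = range.first; it != range.second; ++it) {
            total += a.characteristic * it->second.characteristic;
        }
    }
    return total;
}
```

The index is built once, and each equal_range lookup is logarithmic in the size of the map, so the work is dominated by the number of matching pairs rather than by the product of the two file sizes.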
