In my small problem, I have n users and m equipments (m and n

Asked: May 22, 2026 (2026-05-22T02:25:23+00:00)

In my small problem, I have n users and m pieces of equipment (m and n ~ 50,000). A user can use only one piece of equipment at a time.

I have a list of records in the format [u, e, t], sorted by t (time) in ascending order. Each record means user u is using equipment e at time t. There are around 500 million records. Assume that two consecutive records with the same u and e mean that u is using e continuously. For example:

1, 2, 1  
3, 4, 1    
1, 2, 3
1, 2, 4 
1, 2, 5 
2, 6, 6
3, 2, 6  
3, 2, 8

would mean user 1 uses equipment 2 from 1 to 5.

What I want to do is infer the shift times from this list, in the format [u, e, st, et], meaning user u uses equipment e from start time st to end time et.

Result for the sample data would be:

1, 2, 0, 5  
3, 4, 0, 6  
3, 2, 6, 8

(This assumes time starts at 0 and ends at max(t), and that when a pair (u, e) is first seen, u has been using e since time 0. The same applies to the last records.)

Given the big list (500 million records) but small enough m and n, how can I do this most efficiently?

Edit: possible data inconsistencies:
1: If there’s only one record (so no end time), as with [2, 6, 6] in the sample data:
- If that’s the only time user 2 and equipment 6 appear in the data set, ignore the data point.
- If after that record user 2 uses another piece of equipment, say 7 at time 10, then 2 uses 6 from 6 to 10.
- If after that record equipment 6 is used by another user, say 10 at time 11, then 2 uses 6 from 6 to 11.

1 Answer

Editorial Team · Answer 1 · 2026-05-22T02:25:24+00:00

Define two structures (I know this is Java, but let’s assume a generic algorithm):

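struct user_record {
    int machine_idx;  // machine this user is currently on, or -1 if none
    int start_time;   // time the current shift started
};

struct machine_record {
    int user_idx;     // user currently on this machine, or -1 if none
    int start_time;   // time the current shift started
};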

Given that a user cannot be using more than one piece of equipment at the same time, you could create an array/vector of user_records, one for each user (you said this is ~ 50k, so this should be tractable), and an array/vector of machine_records, one for each machine. Initialise all elements’ idx members to -1 (to indicate not currently active).

Then every time you encounter an input record, check the state of the corresponding idx fields in the user_record and machine_record arrays. There are three possibilities:

1. Both are -1. This is a start point, so set those elements to “point” at each other, and record start_time in each one.
2. Both are not -1, and consistent. This is an end-point, so simply create an output record, and reset those elements’ idx fields back to -1.
3. At least one is not -1, but they are inconsistent. You will need to create up to two output records (one for each stale pairing), overwrite the elements with the new values, and also set the corresponding old machine/user indices to -1.
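
The three cases above can be sketched in Java roughly as follows. This is a sketch under stated assumptions, not the exact method described here. The class name ShiftInference and its methods accept and finish are hypothetical, user and machine ids are assumed to be usable directly as array indices (0 to count - 1), and the two record arrays are flattened into parallel int arrays. It also deviates from case 2 as written: a repeated (u, e) record is treated as a continuation that extends the open shift, following the question's statement that consecutive records mean continuous use. A shift is closed when a conflicting record displaces it (ending at that record's time, as in the question's edit) or at the end of input (ending at its last record, with single-record shifts dropped). Start times are taken from the first record of a shift rather than from time 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one pass over records sorted by time, O(1) work per record.
class ShiftInference {
    private final int[] userMachine; // machine each user currently holds, -1 if none
    private final int[] machineUser; // user currently holding each machine, -1 if none
    private final int[] userStart;   // start time of each user's open shift
    private final int[] userLast;    // time of the latest record in the open shift
    private final List<int[]> shifts = new ArrayList<>(); // {u, e, st, et}

    ShiftInference(int userCount, int machineCount) {
        userMachine = new int[userCount];
        machineUser = new int[machineCount];
        userStart = new int[userCount];
        userLast = new int[userCount];
        Arrays.fill(userMachine, -1);
        Arrays.fill(machineUser, -1);
    }

    // Feed one input record [u, e, t]; records must arrive in ascending t.
    void accept(int u, int e, int t) {
        if (userMachine[u] == e) {
            // Consistent pairing: assumed to be a continuation, so extend the shift.
            userLast[u] = t;
            return;
        }
        // Inconsistent or empty state: close whatever the user and machine held.
        if (userMachine[u] != -1) {
            close(u, t);
        }
        if (machineUser[e] != -1) {
            close(machineUser[e], t);
        }
        // Start the new shift by pointing user and machine at each other.
        userMachine[u] = e;
        machineUser[e] = u;
        userStart[u] = t;
        userLast[u] = t;
    }

    // Emit the open shift of user u and mark both sides as free.
    private void close(int u, int endTime) {
        int e = userMachine[u];
        shifts.add(new int[] {u, e, userStart[u], endTime});
        userMachine[u] = -1;
        machineUser[e] = -1;
    }

    // Call once after the last record. Shifts with a single record are dropped.
    List<int[]> finish() {
        for (int u = 0; u < userMachine.length; u++) {
            if (userMachine[u] != -1 && userLast[u] > userStart[u]) {
                close(u, userLast[u]);
            }
        }
        return shifts;
    }
}
```

Under these assumptions, feeding the question's eight sample records into a ShiftInference(4, 7) and calling finish() gives [3, 4, 1, 6], [1, 2, 1, 6], [3, 2, 6, 8]. That differs from the asker's expected output in two ways: start times are not pulled back to 0, and user 1's shift on equipment 2 ends at 6, when user 3 takes the equipment over. Shifts closed inside finish() are emitted in user order, so only the shifts closed during the pass come out ordered by end time. The state is four int arrays of about 50,000 entries each, so memory is negligible next to the 500 million input records, which can be streamed.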

This is O(N) time (where N is the number of input records).

Note: The output will be sorted by end-times.
