I usually work with big dataframes that are pretty well sorted (or can be

Editorial Team

Asked: May 26, 2026 (2026-05-26T12:33:07+00:00)

I usually work with big dataframes that are pretty well sorted (or can be easily sorted).

Given two data frames, both sorted by ‘user’:

some.data: <user> <data_1> <data_2>
user:      <user> <user_attr_1> <user_attr_2>

When I run m = merge(some.data, user), I get:

m = <user> <data_1> <data_2> <user_attr_1> <user_attr_2>

This is the result I want.

But merge doesn’t take advantage of the data frames already being sorted on the common column, which makes the merge pretty CPU- and memory-heavy. Since both inputs are sorted, the merge could in principle be done in a single O(n) pass.
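To make the O(n) idea concrete, here is a minimal base R sketch of a merge join over sorted inputs. It is an illustration only, not how merge works internally. The function name sorted_merge is hypothetical, and the sketch assumes both data frames are sorted ascending by user and that the user table has no duplicate keys.

```r
# Merge-join sketch (hypothetical helper, not part of base R).
# Assumes: both inputs sorted ascending by `key`; `users[[key]]` has no duplicates.
sorted_merge <- function(some_data, users, key = "user") {
  x <- some_data[[key]]
  y <- users[[key]]
  # For each row of `some_data`, the matching row index in `users` (NA if none).
  match_idx <- rep(NA_integer_, length(x))
  j <- 1L
  for (i in seq_along(x)) {
    # Advance the pointer into `users` until its key is >= the current key.
    while (j <= length(y) && y[j] < x[i]) j <- j + 1L
    if (j > length(y)) break  # `users` is exhausted, no further matches possible
    if (y[j] == x[i]) match_idx[i] <- j
  }
  keep <- !is.na(match_idx)
  # Inner join: keep matched rows and append the non-key columns of `users`.
  left <- some_data[keep, , drop = FALSE]
  right <- users[match_idx[keep], setdiff(names(users), key), drop = FALSE]
  rownames(left) <- NULL
  rownames(right) <- NULL
  cbind(left, right)
}

# Made-up example data following the column layout above.
some_data <- data.frame(user = c(1, 1, 2, 4), data_1 = c(10, 11, 12, 13))
users <- data.frame(user = c(1, 2, 3), user_attr_1 = c("a", "b", "c"))
m <- sorted_merge(some_data, users)
```

Each pointer only moves forward, so the loop touches every row at most once. An interpreted R loop is slow in absolute terms, so this shows the algorithm rather than a practical replacement for merge.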

I am wondering if there is a way in R to conduct an efficient merge on sorted datasets?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T12:33:07+00:00

I don’t have any experience with it, but as far as I know, this is one of the issues that the data.table package was designed to address.

For most practical purposes, data.table = data.frame + index. As a consequence, when used correctly, it improves the performance of many operations on large data.

Turning your data.frame into a data.table (i.e. adding the index) may itself take some time, although I expect this step to be well optimized. Once the index is in place, functions like merge can use it for better performance.
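A minimal sketch of what this might look like, assuming the data.table package is installed. The example values are made up; only the table and column names follow the question. Here setkey plays the role of "adding the index": it sorts the table by the key column and marks it as keyed.

```r
library(data.table)

# Made-up example data using the column layout from the question.
some.data <- data.table(
  user   = c(1L, 1L, 2L, 3L),
  data_1 = c(10, 11, 12, 13),
  data_2 = c("a", "b", "c", "d")
)
user <- data.table(
  user        = 1:3,
  user_attr_1 = c("x", "y", "z"),
  user_attr_2 = c(TRUE, FALSE, TRUE)
)

# setkey() sorts each table by `user` in place and records it as the key.
setkey(some.data, user)
setkey(user, user)

# merge() dispatches to the data.table method and joins on the shared key.
m <- merge(some.data, user)
```

If the data are already sorted, the setkey step should mostly amount to verifying the order, but it is worth timing on your own data before relying on that.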
