I’ve hit a wall trying to merge a large file and a smaller one.

Question

Asked: May 27, 2026

I’ve hit a wall trying to merge a large file and a smaller one. I have read many other posts about memory management in R, and haven’t been able to find a non-extreme way of resolving it (the usual suggestions are to go 64-bit, upload to a cluster, etc.). I’ve tried a bit with the bigmemory package, but haven’t found a solution. I thought I’d try here before I throw my hands up in frustration.

The code I’m running is like the below:

#rm(list=ls())
localtempdir<- "F:/Temp/"
memory.limit(size=4095)
[1] 4095
memory.size(max=TRUE)
[1] 487.56
gc()
         used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 170485  4.6     350000   9.4   350000   9.4
Vcells 102975  0.8   52633376 401.6 62529185 477.1

client_daily<-read.csv(paste(localtempdir,"client_daily.csv",sep=""),header=TRUE)
object.size(client_daily)
>130MB

sbp_demos<-read.csv(paste(localtempdir,"sbp_demos",sep=""))
object.size(sbp_demos)
>0.16MB
client_daily<-merge(client_daily,sbp_demos,by.x="OBID",by.y="OBID",all.x=TRUE)
Error: cannot allocate vector of size 5.0 MB

I guess what I’m asking is: are there any clever ways around this that don’t involve buying new hardware?

I need to be able to merge to create a bigger object.
I’ll then need to run regressions etc. on that bigger object.

Should I give up? Should bigmemory be able to help solve this?

Any guidance greatly appreciated.

Details: R version 2.13.1 (2011-07-08), Platform: i386-pc-mingw32/i386 (32-bit), Intel Core 2 Duo @ 2.33GHz, 3.48GB RAM

1 Answer

Editorial Team · Answer 1 · 2026-05-27T15:34:39+00:00

As Chase already mentioned, you can try data.table or perhaps sqldf.

For either one, you will likely get more juice out of it if you set the indexes appropriately.

With data.table you would:

library(data.table)

dt1 <- data.table(sbp_demos, key='OBID')
dt2 <- data.table(client_daily, key='OBID')

## Do an INNER JOIN-like operation, where non-matching rows are removed
mi <- dt1[dt2, nomatch=0]

## Do a RIGHT JOIN(?)-like operation ... all rows in dt2 will be returned.
## If there is no matching row in dt1, the values in the dt1 columns for
## the merged row will be NA
mr <- dt1[dt2]
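
To make the keyed join above concrete, here is a minimal sketch on toy data. The values and the `spend` and `region` columns are made up for illustration; only `OBID` and the table names come from the question. The last step, an update join with `:=`, is a variation that is not part of the answer above. It assumes a reasonably recent data.table (one that supports the `on=` argument) and adds the column to client_daily by reference rather than building a third table, which may help when memory is tight.

```r
library(data.table)

# Toy stand-ins for the question's tables (hypothetical values and columns)
client_daily <- data.table(OBID = c(1L, 2L, 2L, 3L),
                           spend = c(10, 20, 30, 40))
sbp_demos    <- data.table(OBID = c(1L, 2L),
                           region = c("north", "south"))

# Key both tables on the join column
setkey(client_daily, OBID)
setkey(sbp_demos, OBID)

# Keep every row of client_daily; region is NA where sbp_demos has no match
# (the same rows merge(..., all.x=TRUE) would keep)
merged <- sbp_demos[client_daily]

# Keep only the rows that match in both tables
inner <- sbp_demos[client_daily, nomatch = 0]

# Variation (assumption, not from the answer): add region to client_daily
# in place instead of creating a new merged table
client_daily[sbp_demos, on = "OBID", region := i.region]
```

On the toy data, OBID 3 has no row in sbp_demos, so it survives in `merged` with an NA region and is dropped from `inner`.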

If you go the sqldf route, look at example 4i on its website … again, make sure you use indexes correctly.
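
For orientation, a minimal sketch of the same all.x=TRUE style merge written as a LEFT JOIN in sqldf, again on toy data with made-up values and columns. It deliberately leaves out the index step; example 4i mentioned above is the place to look for that, and it is not reproduced here.

```r
library(sqldf)

# Toy stand-ins for the question's tables (hypothetical values and columns)
client_daily <- data.frame(OBID = c(1L, 2L, 2L, 3L),
                           spend = c(10, 20, 30, 40))
sbp_demos    <- data.frame(OBID = c(1L, 2L),
                           region = c("north", "south"),
                           stringsAsFactors = FALSE)

# LEFT JOIN keeps every row of client_daily; region is NA where
# sbp_demos has no matching OBID
merged <- sqldf("
  select cd.*, d.region
    from client_daily cd
    left join sbp_demos d on cd.OBID = d.OBID
")
```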
