Possible Duplicate: Some issues trying to read a file with cbc.read.table function in R

Question

Asked: June 4, 2026

Possible Duplicate:
Some issues trying to read a file with cbc.read.table function in R + using filter while reading files

a) I’m trying to read a relatively big .txt file with the function cbc.read.table from the colbycol package in R. From what I’ve read, this package makes the job easier when we have large files (more than a GB to be read in R) and we don’t need all of the columns/variables for our analysis. I also read that cbc.read.table supports the same parameters as read.table. However, if I pass the parameter nrows (to get a preview of my file in R), I get the following error:

# My code: I'm reading only columns 5, 6, 7 and 8 out of 27
i.can <- cbc.read.table("xxx.txt", header = TRUE, sep = "\t", just.read = 5:8, nrows = 20)
# Error message
Error in read.table(file, nrows = 50, sep = sep, header = header, ...) : 
formal argument "nrows" matched by multiple actual arguments

So, my question is: how can I solve this problem?

b) After that, I tried to read all instances with the following code:

i.can.b <- cbc.read.table("xxx.txt", header = TRUE, sep = "\t", just.read = 4:8)  # runs without problems
my.df <- as.data.frame(i.can.b)  # this line raises the error
Error in readSingleKey(con, map, key) : unable to obtain value for key 'Company'
# Company is a string column in my data set

So, my question is again: How can I solve this?

c) Do you know a way in which I can filter (by conditions on instances) while reading files?

1 Answer

Editorial Team · Answer 1 · 2026-06-04T06:06:57+00:00

In reply to a):

cbc.read.table() reads the data in 50-row chunks:

tmp.data <- read.table(file, nrows = 50, sep = sep, header = header, 
        ...)

Because the function already sets nrows = 50 and also forwards the nrows you specify through ..., read.table() receives two nrows arguments, which causes the error. To me, this looks like a bug. To get around it, you can modify cbc.read.table() either to handle a user-supplied nrows or to accept something like a max.rows argument (and perhaps send the change to the maintainer as a patch). Alternatively, you can use the sample.pct argument, which sets the proportion of rows to read. For example, if the file contains 100 rows and you only want 50, use sample.pct = 0.5.
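To make the mechanism concrete, here is a minimal sketch that reproduces the same error with a toy wrapper and then gets a 20-row preview of columns 5 to 8 using plain read.table(). The wrapper read_chunk() and the toy file are hypothetical stand-ins, not the colbycol source, and the preview uses base R rather than cbc.read.table().

```r
# Toy stand-in for cbc.read.table() (hypothetical, not the package source):
# it hard-codes nrows = 50 and also forwards `...` to read.table().
read_chunk <- function(file, sep = "\t", header = TRUE, ...) {
  read.table(file, nrows = 50, sep = sep, header = header, ...)
}

# Small tab-separated file with 27 columns (V1..V27) and 100 rows.
toy <- as.data.frame(matrix(seq_len(100 * 27), nrow = 100, ncol = 27))
tmp <- tempfile(fileext = ".txt")
write.table(toy, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# A user-supplied nrows collides with the hard-coded one.
err <- tryCatch(
  read_chunk(tmp, nrows = 20),
  error = function(e) conditionMessage(e)
)
print(err)

# Base-R preview: "NULL" in colClasses skips a column, NA lets read.table guess.
col_classes <- rep("NULL", 27)
col_classes[5:8] <- NA
preview <- read.table(tmp, header = TRUE, sep = "\t", nrows = 20,
                      colClasses = col_classes)
str(preview)
```

For a quick look at a few columns, the colClasses approach avoids the wrapper altogether. For the full read you would still go back to cbc.read.table() without nrows.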

In reply to b):

Not sure what that error means. It is hard to diagnose without a reproducible example. Do you get the same error if you read in a smaller file?
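One way to run that check is to cut a small file from the top of the large one and rerun the same call on it. The sketch below uses only base R. The file created here is a hypothetical stand-in, so point big_file at your own path instead.

```r
# Hypothetical stand-in for the large file; use your own path for big_file.
big_file <- tempfile(fileext = ".txt")
toy <- data.frame(Company = paste0("C", 1:500), Value = 1:500)
write.table(toy, big_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Copy the header line plus the first 100 data rows; readLines(n = ...)
# stops after n lines instead of loading the whole file.
small_file <- tempfile(fileext = ".txt")
writeLines(readLines(big_file, n = 101), small_file)

# Sanity check with base R before rerunning the cbc.read.table() call
# from the question against small_file.
small <- read.table(small_file, header = TRUE, sep = "\t")
dim(small)
```

If the error disappears on the small file, that suggests the problem depends on the file size or on specific rows rather than on the call itself.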

In reply to c):

I generally prefer to store very large character data in a relational database, such as MySQL. In your case it might be easier to use the RSQLite package, which embeds an SQLite engine within R. You can then use SQL SELECT queries to retrieve conditional subsets of the data. Other packages for larger-than-memory data are listed under "Large memory and out-of-memory data" in the CRAN task view: http://cran.r-project.org/web/views/HighPerformanceComputing.html
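As a rough sketch of the SQLite route, assuming the DBI and RSQLite packages are installed: the table name records, the Revenue column and the toy rows are made up for illustration. Only Company comes from the question.

```r
library(DBI)

# Toy data standing in for the large file (hypothetical values).
toy <- data.frame(
  Company = c("Acme", "Globex", "Acme", "Initech"),
  Revenue = c(10, 25, 40, 5),
  stringsAsFactors = FALSE
)

# In-memory database for the demo; pass a file path to keep it on disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "records", toy)

# Only rows matching the WHERE clause are returned to R.
acme <- dbGetQuery(
  con,
  "SELECT Company, Revenue FROM records WHERE Company = 'Acme' AND Revenue > 20"
)
dbDisconnect(con)

print(acme)
```

For a file too large to load at once, one option is to read it in pieces and call dbWriteTable() with append = TRUE for each piece, so that only the filtered result ever sits in memory.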
