I use R for most of my statistical analysis. However, cleaning/processing data, especially when

Asked: May 26, 2026 (2026-05-26T10:06:04+00:00)

I use R for most of my statistical analysis. However, cleaning and processing data in R, especially with files of 1 GB or more, is quite cumbersome, so I use common UNIX tools for that. My question is: is it possible to run those tools interactively in the middle of an R session? An example: say file1 is the output dataset from an R process, with 100 rows. For my next R process I need a specific subset of columns 1 and 2 of it, file2, which can easily be extracted with cut and awk. So the workflow is something like:

Some R process => file1
cut --fields=1,2 <file1 | awk something something >file2
Next R process using file2

Apologies in advance if this is a foolish question.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T10:06:05+00:00

Try this (adding other read.table arguments if needed):

# 1
DF <- read.table(pipe("cut --fields=1,2 < data.txt | awk something_else"))

or in pure R:

# 2
DF <- read.table("data.txt")[1:2]

or, to avoid reading the unwanted fields at all, assuming there are 4 fields:

# 3
DF <- read.table("data.txt", colClasses = c(NA, NA, "NULL", "NULL"))

The last line could be modified for the case where we know we want the first two fields but don’t know how many other fields there are:

# 3a
n <- count.fields("data.txt")[1]
DF <- read.table("data.txt", header = TRUE, colClasses = c(NA, NA, rep("NULL", n - 2)))
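
As a rough, self-contained illustration of # 3a, the sketch below writes a small made-up file to a temporary path and reads only its first two columns. The file contents and the column names a to e are assumptions for the demo, and it relies on every line having the same number of fields, since only the first line is counted:

```r
# Self-contained sketch: the temp file and its contents are invented for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b c d e",
             "1 2 3 4 5",
             "6 7 8 9 10"), tmp)

# Number of whitespace-separated fields on the first line
# (assumes all other lines have the same count).
n <- count.fields(tmp)[1]

# NA     = let read.table work out the column type;
# "NULL" = skip that column entirely while reading.
DF <- read.table(tmp, header = TRUE,
                 colClasses = c(NA, NA, rep("NULL", n - 2)))

print(names(DF))
print(dim(DF))
```

Only columns a and b end up in DF; the skipped columns are never stored in the result.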

The sqldf package can also be used. In this example we assume a CSV file, data.csv, and that the desired fields are called a and b. If it's not a CSV file, pass the appropriate arguments to read.csv.sql to specify a different separator, etc.:

# 4
library(sqldf)
DF <- read.csv.sql("data.csv", sql = "select a, b from file")
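
To tie this back to the workflow in the question, here is a minimal sketch of running cut and awk from inside the same R session with base R's system() and pipe(). It assumes a Unix-like system with cut and awk on the PATH. The temporary file names, the tab-separated layout, and the awk filter `$2 > 20` (standing in for "something something") are all made up for the demo:

```r
# Hypothetical file names; tempfile() keeps the sketch self-contained.
file1 <- tempfile(fileext = ".txt")
file2 <- tempfile(fileext = ".txt")

# Step 1: some R process writes file1 (tab-separated, no header).
df1 <- data.frame(id = 1:5, x = c(10, 20, 30, 40, 50), y = letters[1:5])
write.table(df1, file1, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Step 2: run the shell pipeline without leaving the session.
# shQuote() protects the paths; the awk program is an illustrative filter.
cmd <- sprintf("cut -f1,2 < %s | awk '$2 > 20' > %s",
               shQuote(file1), shQuote(file2))
status <- system(cmd)  # exit status of the shell command, 0 on success

# Step 3: the next R process reads file2.
df2 <- read.table(file2, sep = "\t", col.names = c("id", "x"))

# Alternative: skip file2 and read the pipeline's output directly.
cmd_pipe <- sprintf("cut -f1,2 < %s | awk '$2 > 20'", shQuote(file1))
df3 <- read.table(pipe(cmd_pipe), sep = "\t", col.names = c("id", "x"))

print(df2)
```

The system() route matches the three-step workflow exactly and leaves file2 on disk, while the pipe() route avoids the intermediate file when it is not needed afterwards.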
