
The Archive Base

Editorial Team
Asked: May 26, 2026

I use R for most of my statistical analysis. However, cleaning and processing data in R, especially with files of 1 GB or more, is quite cumbersome, so I use common UNIX tools for that. My question is: is it possible to run those tools interactively in the middle of an R session? An example: let's say file1 is the output dataset from an R process, with 100 rows. For my next R process I need file2, a specific subset of columns 1 and 2 of file1, which can be easily extracted with cut and awk. So the workflow is something like:

Some R process => file1
cut --fields=1,2 <file1 | awk something something >file2
Next R process using file2

Apologies in advance if this is a foolish question.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 10:06 am

    Try this (adding other read.table arguments if needed):

    # 1
    DF <- read.table(pipe("cut -f1,2 < data.txt | awk something_else"))
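The three-step workflow from the question can also be run literally from inside the session with system(), which hands a command line to the shell. The following is a minimal sketch, assuming a Unix-like system with cut and awk on the PATH; the temporary file names, the sample data, and the awk condition `$1 > 2` are hypothetical stand-ins for the question's "awk something something".

```r
# Step 1: some R process writes file1 (tab-separated, no header).
file1 <- tempfile(fileext = ".txt")
file2 <- tempfile(fileext = ".txt")
df1 <- data.frame(id = 1:5, x = c(10, 20, 30, 40, 50), y = letters[1:5])
write.table(df1, file1, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Step 2: run the shell pipeline from within R.
# cut splits on tabs by default; the awk condition is only a placeholder filter.
cmd <- sprintf("cut -f1,2 < %s | awk '$1 > 2' > %s",
               shQuote(file1), shQuote(file2))
status <- system(cmd)  # exit status of the pipeline, 0 on success

# Step 3: the next R process reads file2.
df2 <- read.table(file2, sep = "\t", col.names = c("id", "x"))
```

Compared with pipe(), this keeps file2 on disk, which may be useful if a later step outside R needs it as well.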
    

    or in pure R:

    # 2
    DF <- read.table("data.txt")[1:2]
    

    or, to avoid reading the unwanted fields at all (assuming there are 4 fields):

    # 3
    DF <- read.table("data.txt", colClasses = c(NA, NA, "NULL", "NULL"))
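To see what the colClasses vector does, here is a small self-contained sketch on a made-up four-field file (the temporary file and its contents are assumptions for illustration only). NA lets read.table guess the column type, while the string "NULL" tells it to skip that column.

```r
# Write a small whitespace-separated file with 4 fields per line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 a 2.5 x",
             "2 b 3.5 y",
             "3 c 4.5 z"), tmp)

# Keep fields 1 and 2, drop fields 3 and 4 while reading.
DF <- read.table(tmp, colClasses = c(NA, NA, "NULL", "NULL"))
str(DF)
```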
    

    The last line could be modified for the case where we know we want the first two fields but don’t know how many other fields there are:

    # 3a
    n <- count.fields("data.txt")[1]
    DF <- read.table("data.txt", header = TRUE, colClasses = c(NA, NA, rep("NULL", n-2)))
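The same idea can be wrapped in a small function. This is only a sketch: read_first_cols is a hypothetical helper name, not part of R, and it assumes every line has the same number of fields as the first one.

```r
# Hypothetical helper: read only the first `keep` columns of a delimited file.
read_first_cols <- function(path, keep = 2, sep = "", ...) {
  n <- count.fields(path, sep = sep)[1]  # number of fields on the first line
  if (is.na(n) || n < keep) {
    stop("file has fewer than ", keep, " fields")
  }
  classes <- c(rep(NA, keep), rep("NULL", n - keep))
  read.table(path, sep = sep, colClasses = classes, ...)
}

# Usage on a made-up file with a header line and five fields.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b c d e",
             "1 2 3 4 5",
             "6 7 8 9 10"), tmp)
DF <- read_first_cols(tmp, header = TRUE)
```

Passing sep to both count.fields and read.table keeps the field count and the actual parse consistent for files that are not whitespace-separated.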
    

    The sqldf package can also be used. This example assumes a CSV file, data.csv, in which the desired fields are named a and b. If it is not a CSV file, pass the appropriate arguments to read.csv.sql to specify the separator, etc.:

    # 4
    library(sqldf)
    DF <- read.csv.sql("data.csv", sql = "select a, b from file")
    
© 2021 The Archive Base. All Rights Reserved