The Archive Base

Editorial Team
Asked: May 15, 2026

I have a file called genes.txt which I’d like to become a data.frame.

I have a file called genes.txt that I’d like to read into a data.frame. It has a lot of lines, and each line has three tab-delimited fields:

mike$ wc -l genes.txt
   42476 genes.txt

I’d like to read this file into a data.frame in R. I use the command read.table, like this:

genes = read.table(
    genes_file, 
    sep="\t", 
    na.strings="-", 
    fill=TRUE,
    col.names=c("GeneSymbol","synonyms","description")
)

This seems to work fine (genes_file points at genes.txt). However, the number of rows in my data.frame is significantly smaller than the number of lines in my text file:

> nrow(genes)
[1] 27896

and things I can find in the text file:

mike$ grep "SELL" genes.txt 
SELL    CD62L|LAM1|LECAM1|LEU8|LNHR|LSEL|LYAM1|PLNHR|TQ1    selectin L

don’t seem to be in the data.frame:

> grep("SELL",genes$GeneSymbol)
integer(0)

It turns out that

genes = read.delim(
    genes_file,
    header=FALSE,
    na.strings="-",
    fill=TRUE,
    col.names=c("GeneSymbol","synonyms","description")
)

works just fine. Why does read.delim work when read.table does not?

If it’s of use, you can recreate genes.txt by running the following commands from a command line:

curl -O ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
gzip -cd gene_info.gz | awk -Ft '$1==9606{print $3 "\t" $5 "\t" $9}' > genes.txt

Be warned, though, that gene_info.gz is about 101 MB.

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 7:11 am

    With read.table, one of the default quote characters is the single quote. My guess is that your description field contains unmatched single quotes, so everything between one single quote and the next, line breaks included, is being pooled together into one entry.

    With read.delim, the default quote character is only the double quote, so this isn’t a problem.
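The effect can be reproduced on a small scale. The sketch below is only an illustration: the four-line file and its gene names are made up, and the variable names (toy_file, with_table, with_delim) are hypothetical. It assumes the real genes.txt has apostrophes in some descriptions, as guessed above.

```r
# Toy stand-in for genes.txt: four tab-delimited lines, two of which
# contain an apostrophe in the description field (made-up data).
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "GENE1\tA|B\tfirst gene",
  "GENE2\tC\t5' nucleotidase",
  "GENE3\tD\tplain description",
  "GENE4\tE\t3' exonuclease"
), toy_file)

cols <- c("GeneSymbol", "synonyms", "description")

# read.table defaults to quote = "\"'", so the apostrophe on line 2 opens a
# quoted string that stays open until the next apostrophe, on line 4.
with_table <- read.table(toy_file, sep = "\t", fill = TRUE, col.names = cols)

# read.delim defaults to quote = "\"", so apostrophes are ordinary characters.
with_delim <- read.delim(toy_file, header = FALSE, col.names = cols)

nrow(with_table)  # fewer rows than there are lines in the file
nrow(with_delim)  # one row per line
```

The lines swallowed by the open quote end up inside the description of an earlier row, which would explain why a symbol such as SELL cannot be found in the GeneSymbol column.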

    Specify the quote character explicitly and you should be all set:

    > genes<-read.table("genes.txt",sep="\t",quote="\"",na.strings="-",fill=TRUE, col.names=c("GeneSymbol","synonyms","description"))
    > nrow(genes)
    [1] 42476
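If the descriptions might also contain stray double quotes, one possible variation is to switch quoting off entirely with quote="". read.table also differs from read.delim in its comment.char default ("#" versus ""), so the sketch below disables that as well. The toy file and its contents are made up for illustration; with the real data you would pass genes_file instead, and the row-count check at the end is just one way to confirm nothing was merged.

```r
# Made-up stand-in for genes.txt; "-" marks a missing value as in the question.
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "GENE1\tA|B\tfirst gene",
  "GENE2\tC\t5' nucleotidase",
  "GENE3\tD\t-",
  "GENE4\tE\t3' exonuclease"
), toy_file)

genes <- read.table(
  toy_file,
  sep = "\t",
  quote = "",          # treat ' and " as ordinary characters
  comment.char = "",   # treat # as an ordinary character too
  na.strings = "-",
  fill = TRUE,
  stringsAsFactors = FALSE,
  col.names = c("GeneSymbol", "synonyms", "description")
)

# Sanity check: one data.frame row per line of the input file.
n_lines <- length(readLines(toy_file))
stopifnot(nrow(genes) == n_lines)
```

Comparing nrow() against the line count reported by wc -l (or readLines) is a cheap check to run after any read.table call on free-text fields.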
    