The Archive Base

Editorial Team
Asked: June 10, 2026


I have a text file whose columns are separated by variable amounts of whitespace. Is it possible to load this file directly as a pandas DataFrame without preprocessing it? The delimiter section of the pandas documentation says that I can use a 's*' construct, but I couldn't get this to work.

## sample data
head sample.txt

#                                                                            --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name           accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
ABC_membrane         PF00664.18   275 AAF67494.2_AF170880  -            615     8e-29  100.7  11.4   1   1     3e-32     1e-28  100.4   7.9     3   273    42   313    40   315 0.95 ABC transporter transmembrane region
ABC_tran             PF00005.22   118 AAF67494.2_AF170880  -            615   2.6e-20   72.8   0.0   1   1   1.9e-23   6.4e-20   71.5   0.0     1   118   402   527   402   527 0.93 ABC transporter
SMC_N                PF02463.14   220 AAF67494.2_AF170880  -            615   3.8e-08   32.7   0.2   1   2    0.0036        12    4.9   0.0    27    40   391   404   383   408 0.86 RecF/RecN/SMC N terminal domain
SMC_N                PF02463.14   220 AAF67494.2_AF170880  -            615   3.8e-08   32.7   0.2   2   2   1.8e-09   6.1e-06   25.4   0.0   116   210   461   568   428   575 0.85 RecF/RecN/SMC N terminal domain
AAA_16               PF13191.1    166 AAF67494.2_AF170880  -            615   3.1e-06   27.5   0.3   1   1     2e-09     7e-06   26.4   0.2    20   158   386   544   376   556 0.72 AAA ATPase domain
YceG                 PF02618.11   297 AAF67495.1_AF170880  -            284   3.4e-64  216.6   0.0   1   1   2.9e-68     4e-64  216.3   0.0    68   296    53   274    29   275 0.85 YceG-like family
Pyr_redox_3          PF13738.1    203 AAF67496.2_AF170880  -            352   2.9e-28   99.1   0.0   1   2   2.8e-30   4.8e-27   95.2   0.0     1   201     4   198     4   200 0.85 Pyridine nucleotide-disulphide oxidoreductase

#load data
from pandas import *
data = read_table('sample.txt', skiprows=3, header=None, sep=" ")

ValueError: Expecting 83 columns, got 91 in row 4

#load data part 2
data = read_table('sample.txt', skiprows=3, header=None, sep="'s*' ")
#this mushes some of the columns into the first column and drops the rest.
    X.1
1    ABC_tran PF00005.22 118 AAF67494.2_
2    SMC_N PF02463.14 220 AAF67494.2_
3    SMC_N PF02463.14 220 AAF67494.2_
4    AAA_16 PF13191.1 166 AAF67494.2_
5    YceG PF02618.11 297 AAF67495.1_
6    Pyr_redox_3 PF13738.1 203 AAF67496.2_
7    Pyr_redox_3 PF13738.1 203 AAF67496.2_
8    FMO-like PF00743.14 532 AAF67496.2_
9    FMO-like PF00743.14 532 AAF67496.2_

While I can preprocess the files to change the whitespace to commas or tabs, it would be nice to load them directly.

(FYI this is the *.hmmdomtblout output from the hmmscan program)

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 6:12 am

    I think there's just a missing \ in the docs (perhaps because it was interpreted as an escape character at some point?). The separator is a regexp, after all:

    In [68]: data = read_table('sample.txt', skiprows=3, header=None, sep=r"\s*")
    
    In [69]: data
    Out[69]: 
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 7 entries, 0 to 6
    Data columns:
    X.1     7  non-null values
    X.2     7  non-null values
    X.3     7  non-null values
    X.4     7  non-null values
    X.5     7  non-null values
    X.6     7  non-null values
    [...]
    X.23    7  non-null values
    X.24    7  non-null values
    X.25    5  non-null values
    X.26    3  non-null values
    dtypes: float64(8), int64(10), object(8)
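
For readers on a current pandas release, the same idea can be sketched as follows. This is a minimal illustration, not the answerer's session: it swaps `\s*` for `\s+` (one or more whitespace characters), because `\s*` can also match the empty string, and it reads a small inline sample through `io.StringIO` instead of `sample.txt`. The `SAMPLE` text is made up for the example, and it deliberately leaves out the free-text description column.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.txt: two '#' header lines followed by
# rows whose columns are separated by varying amounts of whitespace.
SAMPLE = """\
# target name        accession   tlen
#------------------- ---------- -----
ABC_tran             PF00005.22   118
YceG                 PF02618.11   297
"""

# sep=r"\s+" treats any run of whitespace as a single delimiter.
# skiprows=2 skips the two header lines of this small sample
# (the real file in the question has three).
data = pd.read_table(io.StringIO(SAMPLE), sep=r"\s+", skiprows=2, header=None)

print(data)
```

The raw-string prefix (`r"..."`) keeps Python from interpreting the backslash before the pattern reaches pandas.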
    

    Because of the delimiter problem noted by @MRAB, it has some trouble with the last few columns:

    In [73]: data.ix[:,20:]
    Out[73]: 
       X.21  X.22           X.23                   X.24            X.25    X.26
    0   315  0.95            ABC            transporter   transmembrane  region
    1   527  0.93            ABC            transporter            None    None
    2   408  0.86  RecF/RecN/SMC                      N        terminal  domain
    3   575  0.85  RecF/RecN/SMC                      N        terminal  domain
    4   556  0.72            AAA                 ATPase          domain    None
    5   275  0.85      YceG-like                 family            None    None
    6   200  0.85       Pyridine  nucleotide-disulphide  oxidoreductase    None
    

    but that can be patched up at the end.
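
One possible way to do that patching, sketched under stated assumptions: the description always starts at a known column position (`N_FIXED`, a name invented for this example; it would be 22 for the table in the question), and no row has more fields than the first one, since otherwise pandas may raise a parsing error. The inline `SAMPLE` is a shortened, made-up version of the data.

```python
import io

import pandas as pd

# Hypothetical, shortened rows: three fixed columns followed by a
# free-text description that itself contains spaces.
SAMPLE = """\
ABC_membrane 315 0.95 ABC transporter transmembrane region
ABC_tran     527 0.93 ABC transporter
YceG         275 0.85 YceG-like family
"""

N_FIXED = 3  # assumption: number of columns before the description

data = pd.read_table(io.StringIO(SAMPLE), sep=r"\s+", header=None)

# Columns from N_FIXED onwards hold the pieces of the split description.
desc_cols = data.columns[N_FIXED:]

# Re-join the pieces of each row, skipping the missing values that pad
# shorter descriptions.
description = data[desc_cols].apply(
    lambda row: " ".join(row.dropna().astype(str)), axis=1
)

# Replace the fragment columns with a single 'description' column.
tidy = data.drop(columns=desc_cols).assign(description=description)

print(tidy)
```

Joining with a single space loses any original run of multiple spaces inside a description, which is usually acceptable for free text.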
