The Archive Base

Editorial Team
Asked: May 25, 2026


I have files with the following format:

ATOM   8962  CA  VAL W   8       8.647  81.467  25.656  1.00115.78           C  
ATOM   8963  C   VAL W   8      10.053  80.963  25.506  1.00114.60           C  
ATOM   8964  O   VAL W   8      10.636  80.422  26.442  1.00114.53           O  
ATOM   8965  CB  VAL W   8       7.643  80.389  25.325  1.00115.67           C  
ATOM   8966  CG1 VAL W   8       6.476  80.508  26.249  1.00115.54           C  
ATOM   8967  CG2 VAL W   8       7.174  80.526  23.886  1.00115.26           C  
ATOM   4440  O   TYR S  89       4.530 166.005 -14.543  1.00 95.76           O  
ATOM   4441  CB  TYR S  89       2.847 168.812 -13.864  1.00 96.31           C  
ATOM   4442  CG  TYR S  89       3.887 169.413 -14.756  1.00 98.43           C  
ATOM   4443  CD1 TYR S  89       3.515 170.073 -15.932  1.00100.05           C  
ATOM   4444  CD2 TYR S  89       5.251 169.308 -14.451  1.00100.50           C  
ATOM   4445  CE1 TYR S  89       4.464 170.642 -16.779  1.00100.70           C  
ATOM   4446  CE2 TYR S  89       6.219 169.868 -15.298  1.00101.40           C  
ATOM   4447  CZ  TYR S  89       5.811 170.535 -16.464  1.00100.46           C  
ATOM   4448  OH  TYR S  89       6.736 171.094 -17.321  1.00100.20           O  
ATOM   4449  N   LEU S  90       3.944 166.393 -12.414  1.00 94.95           N  
ATOM   4450  CA  LEU S  90       5.079 165.622 -11.914  1.00 94.44           C  
ATOM   5151  N   LEU W   8     -66.068 209.785 -11.037  1.00117.44           N  
ATOM   5152  CA  LEU W   8     -64.800 210.035 -10.384  1.00116.52           C  
ATOM   5153  C   LEU W   8     -64.177 208.641 -10.198  1.00116.71           C  
ATOM   5154  O   LEU W   8     -64.513 207.944  -9.241  1.00116.99           O  
ATOM   5155  CB  LEU W   8     -65.086 210.682  -9.033  1.00115.76           C  
ATOM   5156  CG  LEU W   8     -64.274 211.829  -8.478  1.00113.89           C  
ATOM   5157  CD1 LEU W   8     -64.528 211.857  -7.006  1.00111.94           C  
ATOM   5158  CD2 LEU W   8     -62.828 211.612  -8.739  1.00112.96           C  

In principle, the chain ID in column 5 (W in the first lines above) should only repeat within a single consecutive chunk. However, in files with many chains, there are not enough letters in the alphabet to give every chain its own ID, so duplicates may occur.

I would like to check whether this is the case. In other words, I would like to know whether a given chain ID (A-Z, always in the 5th column) appears in non-consecutive chunks. I do not mind that it changes from W to S; I want to know whether two separate chunks share the same chain ID, that is, whether W or S reappears at some point. Strictly speaking, this is only a problem if the chunks also share the first and 6th columns, but I do not want to complicate things too much.

I do not want to print the lines, only the name of the file in which the issue occurs and the chain ID (W in this case). I already know how to fix the problem, but I need to identify the problematic files so I can focus on those and avoid repairing files that are already fine.

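SOLUTION (thanks to all for your help, and especially to sehe):

for pdb in *.pdb; do
    # Keep ATOM records, take the chain ID (character column 22), collapse
    # consecutive runs, then list the IDs that start more than one run.
    hit=$(awk '$1 == "ATOM"' "$pdb" | cut -c22 | uniq | sort | uniq -dc)
    # $hit is left unquoted so that echo flattens the padding added by uniq -c.
    [ -n "$hit" ] && echo "$pdb" = $hit
done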
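
The same check can probably be done in a single awk pass per file, without sort. The sketch below is only an illustration: the function name repeated_chains is made up, and it assumes the chain ID always sits in fixed character column 22, as in the sample above. It prints the file name and the chain ID the moment a chain ID starts its second run of ATOM records.

```shell
# Hypothetical helper (not part of the original solution): print
# "file = chain" for each chain ID that starts more than one run of
# consecutive ATOM records in the given files.
repeated_chains() {
    awk '
        # Reset the per-file state at the first line of every file.
        FNR == 1 { prev = ""; split("", runs) }
        $1 == "ATOM" {
            chain = substr($0, 22, 1)      # assumed: chain ID in column 22
            if (chain != prev) {           # a new run starts on this line
                runs[chain]++
                if (runs[chain] == 2) print FILENAME " = " chain
                prev = chain
            }
        }
    ' "$@"
}
```

It would be called as repeated_chains *.pdb. Like the pipeline above, it ignores every record that is not ATOM, so such lines in between do not end a run.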
1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 5:55 am

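    For this particular sample (saved here in a file named t):

    cut -c22-23 t | uniq | sort | uniq -dc

    will output

    2 W

    (the 22nd column, which holds the chain ID, contains 2 separate runs of the letter ‘W’). The first uniq collapses each consecutive run into a single line, so after sorting, uniq -dc lists only the IDs that started more than one run, together with the number of runs.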
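
The question mentions that a repeated chain ID is only a real problem when the two chunks also share the 6th column (the residue number). A stricter check along those lines might look like the sketch below. It is an assumption-laden illustration, not the accepted solution: the function name colliding_chains is made up, and it assumes the standard fixed-width layout in which character columns 22 to 27 hold the chain ID, the residue number and the insertion code.

```shell
# Hypothetical helper: print "file = chain" when the same chain ID and
# residue number show up in two non-consecutive runs of ATOM records.
colliding_chains() {
    awk '
        # Reset the per-file state at the first line of every file.
        FNR == 1 { prev = ""; split("", runs); split("", seen) }
        $1 == "ATOM" {
            chain = substr($0, 22, 1)   # assumed: chain ID in column 22
            key = substr($0, 22, 6)     # chain ID + residue number + insertion code
            if (key != prev) {          # a new residue run starts on this line
                runs[key]++
                if (runs[key] == 2 && !(chain in seen)) {
                    seen[chain] = 1     # report each chain once per file
                    print FILENAME " = " chain
                }
                prev = key
            }
        }
    ' "$@"
}
```

On the sample in the question this should still flag W, because residue 8 of chain W appears in two separate chunks. A file in which W reappears with residue numbers it has not used before would not be flagged.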
