The Archive Base

Editorial Team
Asked: May 15, 2026

I need to extract certain data from a file, but this file is formatted


I need to extract certain data from a file, but this file is formatted to be read by humans and is therefore irregular. First off, there is a large amount of text before any of the data actually begins:

   DL_POLY Version 2.20

                        Running on   10 nodes



*************** DLPOLY: LiNbO3 >***************




SIMULATION CONTROL PARAMETERS

simulation temperature 1.4500E+03

simulation pressure (katm) 0.0000E+00

selected number of timesteps 8000

equilibration period 500

data printing interval 80

statistics file interval 80

simulation timestep 5.0000E-04

Nose-Hoover (Melchionna) isotropic N-P-T
thermostat relaxation time 1.0000E-01
barostat relaxation time 5.0000E-01

trajectory file option on
trajectory file start 1
trajectory file interval 80
trajectory file info key 2
…

Then after a while there is the actual data but it is in this funny form:


step eng_tot temp_tot eng_cfg eng_vdw eng_cou eng_bnd > eng_ang eng_dih eng_tet
time(ps) eng_pv temp_rot vir_cfg vir_vdw vir_cou vir_bnd >vir_ang vir_con vir_tet
cpu (s) volume temp_shl eng_shl vir_shl alpha beta >gamma vir_pmf press


1 -1.1289E+05 1.4750E+03 -1.1386E+05 1.7276E+04 -1.3114E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
0.0 -1.1545E+05 0.0000E+00 9.6539E+03 -1.2118E+05 1.3083E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
0.8 5.3733E+04 1.2367E+02 0.0000E+00 0.0000E+00 5.6396E+01 5.6396E+01 >5.6396E+01 0.0000E+00 -7.5549E+01

rolling -1.1289E+05 1.4750E+03 -1.1386E+05 1.7276E+04 -1.3114E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
averages -1.1545E+05 0.0000E+00 9.6539E+03 -1.2118E+05 1.3083E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
5.3733E+04 1.2367E+02 0.0000E+00 0.0000E+00 5.6396E+01 5.6396E+01 >5.6396E+01 0.0000E+00 -7.5549E+01


80 -1.1290E+05 1.5021E+03 -1.1392E+05 2.1894E+04 -1.3726E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
0.0 -1.1256E+05 0.0000E+00 8.6671E+02 -1.3974E+05 1.3707E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
10.6 5.3149E+04 1.1377E+03 1.4419E+03 3.5382E+03 5.6396E+01 5.6396E+01 >5.6396E+01 0.0000E+00 1.1119E+01

rolling -1.1290E+05 1.6145E+03 -1.1398E+05 2.0750E+04 -1.3588E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
averages -1.1333E+05 0.0000E+00 3.3694E+03 -1.3512E+05 1.3565E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
5.3481E+04 1.0997E+03 1.1430E+03 2.8391E+03 5.6396E+01 5.6396E+01 >5.6396E+01 0.0000E+00 -1.2096E+01


160 -1.1287E+05 1.2629E+03 -1.1376E+05 2.1450E+04 -1.3633E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
0.1 -1.1249E+05 0.0000E+00 3.8761E+02 -1.3824E+05 1.3612E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
20.5 5.3375E+04 4.9015E+02 1.1243E+03 2.5052E+03 5.6396E+01 5.6396E+01 >5.6396E+01 0.0000E+00 1.2676E+01

rolling -1.1288E+05 1.4677E+03 -1.1389E+05 2.1589E+04 -1.3663E+05 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
averages -1.1235E+05 0.0000E+00 2.1147E+02 -1.3884E+05 1.3643E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
5.3152E+04 7.4818E+02 1.1440E+03 2.6211E+03 5.6396E+01 5.6396E+01 >5.6396E+01 0.0000E+00 1.7174E+01


On the 9th data interval there is a slight anomaly:


switching off temperature scaling at step 500


 560 -1.1287E+05  1.4709E+03 -1.1390E+05  2.1600E+04 -1.3678E+05  0.0000E+00  >0.0000E+00  0.0000E+00  0.0000E+00
 0.3 -1.1292E+05  0.0000E+00  1.9253E+03 -1.3743E+05  1.3656E+05  0.0000E+00  >0.0000E+00  0.0000E+00  0.0000E+00
68.4  5.4300E+04  1.5043E+02  1.2775E+03  2.7947E+03  5.6396E+01  5.6396E+01  >5.6396E+01  0.0000E+00  2.0576E-01

rolling -1.1286E+05 1.4784E+03 -1.1390E+05 2.1546E+04 -1.3673E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
averages -1.1298E+05 0.0000E+00 2.1361E+03 -1.3717E+05 1.3651E+05 0.0000E+00 >0.0000E+00 0.0000E+00 0.0000E+00
5.4303E+04 2.2261E+02 1.2785E+03 2.8027E+03 5.6396E+01 5.6396E+01 >5.6396E+01 0.0000E+00 -1.7421E+00



As you can see, there is a pair of '----' lines which may interfere with proper parsing of the data.

Let's say I want to get just the 'eng_tot' data from this file. How would I go about doing that in Python? The number is always in the same place in the file: it is the second quantity in the first row after the second set of '----' lines.

By the way, the header part with all the definitions in it repeats every 8 steps, except the first step, in which there are 9 lines. I'd like to just ignore the first step. For now, let's say I want to start with line 295 inclusive. Just so you know, I'm quite new to Python and programming in general, so all the help you can provide is appreciated.

Here’s the code I tried, but Eng_Total is still an empty set:

import re
import inspect

def lineno():
    """Returns the current line number"""
    linenum = inspect.currentframe().f_back.f_lineno
infile =  open('FilePath/OUTPUT.01').read()
Eng_Total = []
for line in infile:
#    if 'eng_tot' in line.split(): 
     if re.match("\s+-+\s+", line):
    lineno(line)
        line = linenum+1
        sanitized_line = line[8:]
        eng_total = line.split()[0]
        Eng_Total.append(eng_total)
print Eng_Total
1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 10:48 am

    I’d probably do this:

    • iterate over the lines in the output
    • search for one containing eng_tot:
      • if 'eng_tot' in line.split(): process_blocks
    • gobble up lines until one consists entirely of dashes (with optional spaces on either side)
      • if re.match(r"\s*-+\s*$", line): process_metrics_block
    • process the first line of metrics:
      • cut the first column off the line (it makes it harder to parse, because it might not be there)
        • sanitized_line = line[8:]
        • eng_total = sanitized_line.split()[0] (the first column is now eng_total)
    • skip lines until you reach another line of dashes, then start again
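    The steps above can be sketched as a single pass over the lines with two flags. This is only an illustration under assumptions: the sample text is abbreviated, the dashed separator lines are reconstructed from the question's description rather than copied from a real file, and the names `DASHES` and `eng_tot_values` are made up for this sketch.

```python
import re

# Hypothetical, shortened excerpt. The dashed lines are an assumption based
# on the question's description; the numbers come from the question's sample.
SAMPLE = """\
 ------------------------------
    step   eng_tot  temp_tot
 time(ps)   eng_pv  temp_rot
 cpu (s)    volume  temp_shl
 ------------------------------
       1 -1.1289E+05 1.4750E+03
     0.0 -1.1545E+05 0.0000E+00
     0.8 5.3733E+04 1.2367E+02

 rolling -1.1289E+05 1.4750E+03
averages -1.1545E+05 0.0000E+00
         5.3733E+04 1.2367E+02
 ------------------------------
      80 -1.1290E+05 1.5021E+03
     0.0 -1.1256E+05 0.0000E+00
    10.6 5.3149E+04 1.1377E+03
 ------------------------------
"""

# A separator is a line made only of dashes, with optional surrounding spaces.
DASHES = re.compile(r"\s*-+\s*$")

def eng_tot_values(lines):
    """Yield the 2nd column of the first non-blank row after each separator."""
    seen_header = False       # True once the 'eng_tot' header has been read
    expect_first_row = False  # True right after a dashed separator
    for line in lines:
        if DASHES.match(line):
            expect_first_row = True
            continue
        if not line.strip():
            continue  # blank lines never count as the first row
        if expect_first_row:
            expect_first_row = False
            fields = line.split()
            if "eng_tot" in fields:
                seen_header = True  # a header block, not data
            elif seen_header and len(fields) >= 2:
                yield float(fields[1])

values = list(eng_tot_values(SAMPLE.splitlines()))
print(values)
```

    Because it only looks at one line at a time, the same function should also accept an open file object instead of a list of lines.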

    After seeing your edits:

    • You need to import the re (regular expression) module at the top of the file: import re
    • The process_blocks and process_metrics_block names were pseudocode. Those don't exist unless you define them. 🙂 You don't need those functions exactly; you can avoid them using basic looping (while) and conditional (if) statements.
    • You'll have to make sure you understand what you're doing, not just copy from Stack Overflow! 🙂

    It looks like you’re trying to do something like this. It seems to work, but I’m sure with some effort, you can come up with something nicer:

    import re

    def find_header(lines):
        """Return the index of the first line containing 'eng_tot', or None."""
        for i, line in enumerate(lines):
            if 'eng_tot' in line.split():
                return i
        return None

    def find_next_separator(lines, start):
        """Return the index of the first all-dashes line after `start`, or None."""
        for i, line in enumerate(lines[start + 1:], start + 1):
            if re.match(r"\s*-+\s*$", line):
                return i
        return None

    if __name__ == '__main__':
        totals = []
        with open('so.txt') as f:
            lines = f.readlines()

        header = find_header(lines)
        start = find_next_separator(lines, header + 1)

        while True:
            end = find_next_separator(lines, start + 1)
            if end is None:
                break

            # Pull out block, after line of dashes.
            metrics_block = lines[start + 1:end]

            # Pull out 2nd column from 1st line of metrics.
            eng_total = metrics_block[0].split()[1]
            totals.append(eng_total)

            start = end

        print(totals)
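    The values collected this way are strings, and if the header block repeats between dashed lines (as the question suggests), the literal 'eng_tot' may end up among them. A minimal sketch of a clean-up step follows; `to_floats` is a hypothetical helper name and the `raw` list is made-up input that mimics such a result.

```python
def to_floats(tokens):
    """Convert tokens to floats, skipping anything that is not a number."""
    result = []
    for token in tokens:
        try:
            result.append(float(token))
        except ValueError:
            # For example the literal 'eng_tot' from a repeated header block.
            continue
    return result

# Made-up input resembling what the extraction loop might collect.
raw = ["-1.1289E+05", "eng_tot", "-1.1290E+05", "-1.1287E+05"]
energies = to_floats(raw)
print(energies)
```

    Silently dropping non-numeric tokens is a design choice that suits exploratory work; for production use it may be safer to log or raise on unexpected values.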
    

    You can use a generator to be a little more Pythonic (this reuses find_header and find_next_separator from the first listing):

    def metric_block_iter(lines):
        """Yield (start, end) index pairs of consecutive separator lines."""
        start = find_next_separator(lines, find_header(lines) + 1)
        while True:
            end = find_next_separator(lines, start + 1)
            if end is None:
                break
            yield (start, end)
            start = end


    if __name__ == '__main__':
        totals = []
        with open('so.txt') as f:
            lines = f.readlines()

        for start, end in metric_block_iter(lines):
            # Pull out block, after line of dashes.
            metrics_block = lines[start + 1:end]

            # Pull out 2nd column from 1st line of metrics.
            eng_total = metrics_block[0].split()[1]
            totals.append(eng_total)

        print(totals)
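    The question also mentions starting at a known line (line 295 in the asker's file) to skip the first step. One way to do that, sketched here with `itertools.islice`, is to drop the leading lines before any parsing happens. The helper name `lines_from` and the in-memory stand-in file are assumptions for illustration only.

```python
import io
from itertools import islice

def lines_from(handle, first_line):
    """Return an iterator over lines starting at 1-based first_line, inclusive."""
    return islice(handle, first_line - 1, None)

# Stand-in for an opened output file; a real run would pass the file object.
fake_file = io.StringIO("preamble 1\npreamble 2\n data A\n data B\n")
kept = [line.rstrip("\n") for line in lines_from(fake_file, 3)]
print(kept)
```

    With the real file, the same call with 295 as the second argument would hand only the later lines to whichever parsing loop is used.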
    
© 2021 The Archive Base. All Rights Reserved