I am doing text processing and using ‘readline()’ function as follows: ifd = open(…)

Asked: June 14, 2026 (2026-06-14T14:54:23+00:00)

I am doing text processing and using ‘readline()’ function as follows:

ifd = open(...)
for line in ifd:
    while condition:
        # do something...
        line = ifd.readline()
        condition = ...

# When the condition becomes false, I need to rewind the pointer so that the 'for' loop reads the same line again.

Calling ifd.seek() followed by readline() gives me just a ‘\n’ character. How can I rewind the pointer so that the whole line is read again?

>>> ifd.seek(-1,1)
>>> line = ifd.readline()
>>> line
'\n'
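Why this happens: seek(-1, 1) moves back a single byte from the current position. Right after readline() the position sits just past that line's '\n', so backing up one byte lands on the newline itself and the next readline() returns only '\n'. To re-read a whole line, record tell() before reading it and seek() back to that offset. The sketch below assumes Python 3 and uses an in-memory io.BytesIO buffer as a stand-in for the input file (binary mode, because Python 3 text files reject nonzero relative seeks).

```python
import io

# In-memory stand-in for the input file (binary, so relative seeks are allowed).
buf = io.BytesIO(b"alpha\nbeta\n")

# Symptom: seek(-1, 1) backs up one byte, which is the '\n' that was just read.
buf.readline()                 # consumes b"alpha\n"
buf.seek(-1, io.SEEK_CUR)      # position now sits on that trailing newline
symptom = buf.readline()       # only b"\n" comes back

# Fix: remember the offset *before* reading the line, then seek back to it.
buf.seek(0)
pos = buf.tell()               # start of the line we are about to read
first = buf.readline()         # b"alpha\n"
buf.seek(pos)                  # rewind to the saved offset
again = buf.readline()         # b"alpha\n" again, the whole line

print(symptom, first, again)
```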

Here is my code

labtestnames = sorted(tmp)
#Now read each line in the inFile and write into outFile
ifd = open(inFile, "r")
ofd = open(outFile, "w")
#read the header
header = ifd.readline() #Do nothing with this line. Skip
#Write header into the output file
nl = "mrn\tspecimen_id\tlab_number\tlogin_dt\tfluid"
offset = len(nl.split("\t"))
nl = nl + "\t" + "\t".join(labtestnames)
ofd.write(nl+"\n")
lenFields = len(nl.split("\t"))


print "Reading the input file and converting into modified file for further processing (correlation analysis etc..)"

prevTup = (0,0,0)
rowComplete = 0
k=0
for line in ifd:
    k=k+1
    if (k==200): break

    items = line.rstrip("\n").split("\t")
    if((items[0] =='')):
        continue
    newline= list('' for i in range(lenFields))
    newline[0],newline[1],newline[3],newline[2],newline[4] = items[0], items[1], items[3], items[2], items[4]
    ltests = []
    ltvals = []
    while(cmp(prevTup, (items[0], items[1], items[3])) == 0): # If the same mrn, lab_number and specimen_id then fill the same row. else create a new row.
        ltests.append(items[6])
        ltvals.append(items[7])
        pos = ifd.tell()
        line = ifd.readline()
        prevTup = (items[0], items[1], items[3])
        items = line.rstrip("\n").split("\t")
        rowComplete = 1

    if (rowComplete == 1): #If the row is completed, prepare newline and write into outfile
        indices = [labtestnames.index(x) for x in ltests]
        j=0
        ifd.seek(pos)
        for i in indices:
            newline[i+offset] = ltvals[j]
            j=j+1

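    if (rowComplete == 0): # Row not complete yet: peek at the next line to see whether it continues the same mrn/lab_number/specimen_id group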
        currTup = (items[0], items[1], items[3])
        ltests = items[6]
        ltvals = items[7]
        pos = ifd.tell()
        line = ifd.readline()
        items = line.rstrip("\n").split("\t")
        newTup = (items[0], items[1], items[3])
        if(cmp(currTup, newTup) == 0):
            prevTup = currTup
            ifd.seek(pos)
            continue
        else:
            indices = labtestnames.index(ltests)
            newline[indices+offset] = ltvals

    ofd.write("\t".join(newline) + "\n") # newline is a list of fields, so join it into one tab-separated line
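Part of the trouble is mixing `for line in ifd` with readline() and tell(): file iteration reads ahead in chunks, so tell() inside such a loop does not reliably report where the current line starts (Python 3 text files refuse outright and raise OSError). One way to sidestep rewinding entirely, swapped in here as an alternative to seeking, is to wrap the line iterator so a line can be "pushed back" for the outer loop to see again. A minimal Python 3 sketch; PushbackIterator is a hypothetical helper, not a standard-library class, and the input lines are made up.

```python
import io

class PushbackIterator:
    """Hypothetical helper: an iterator that lets you "un-read" items."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pushed = []          # items handed back by push_back()

    def __iter__(self):
        return self                # nested for-loops share this one stream

    def __next__(self):
        if self._pushed:
            return self._pushed.pop()
        return next(self._it)      # StopIteration ends the loop as usual

    def push_back(self, item):
        self._pushed.append(item)

# Hypothetical tab-separated input: the first field is the grouping key.
lines = PushbackIterator(io.StringIO("a\t1\na\t2\nb\t3\n"))
groups = []
for line in lines:
    key = line.split("\t")[0]
    group = [line.rstrip("\n")]
    for nxt in lines:              # keep reading while the key is unchanged
        if nxt.split("\t")[0] != key:
            lines.push_back(nxt)   # the outer loop will see this line next
            break
        group.append(nxt.rstrip("\n"))
    groups.append(group)

print(groups)
```

Because __iter__ returns self, the inner loop and the outer loop consume the same stream, and push_back() plays the role the question wanted seek() to play.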

1 Answer

Editorial Team · Answer 1 · 2026-06-14T14:54:24+00:00

The problem can be handled more simply using itertools.groupby. groupby can cluster all the contiguous lines that deal with the same mrn, specimen_id, and lab_num.

The code that does this is

for key, group in IT.groupby(reader, key = mykey):

where reader iterates over the lines of the input file, and mykey is defined by

def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

Each row from reader is passed to mykey, and consecutive rows that share the same key are clustered together in the same group.
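groupby does not sort: it only merges rows that are adjacent. A small illustration with hypothetical in-memory rows (not the asker's data): the last record has the same key as the first two but is not next to them, so it ends up in a group of its own. If the input file is not already ordered by mrn, specimen_id and lab_num, sort the rows by mykey first.

```python
import itertools as IT

def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

# Hypothetical rows; only the key columns and the test name matter here.
rows = [
    {'mrn': '1', 'specimen_id': 'S1', 'lab_num': 'L1', 'labtest': 'Lipase'},
    {'mrn': '1', 'specimen_id': 'S1', 'lab_num': 'L1', 'labtest': 'Calcium'},
    {'mrn': '2', 'specimen_id': 'S2', 'lab_num': 'L2', 'labtest': 'Phosphate'},
    {'mrn': '1', 'specimen_id': 'S1', 'lab_num': 'L1', 'labtest': 'Bilirubin'},
]

# Each group must be consumed before groupby advances, so collect it right away.
summary = [(key, [r['labtest'] for r in group])
           for key, group in IT.groupby(rows, key=mykey)]

for key, labtests in summary:
    print(key, labtests)
```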

While we’re at it, we might as well use the csv module to read each line into a dict (which I call row). This frees us from having to deal with low-level string manipulation like line.rstrip("\n").split("\t") and instead of referring to columns by index numbers (e.g. row[3]) we can write code that speaks in higher-level terms such as row['lab_num'].
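As a quick illustration of what DictReader produces: the header line supplies the dict keys and each following line becomes one dict. The column names below follow the ones the answer's code uses (mrn, specimen_id, date, lab_num, labtest, result_val); the actual input file is not shown in the question, so this one-row sample is hypothetical, with values borrowed from the first output row further down. Python 3 is assumed.

```python
import csv
import io

# Hypothetical tab-separated sample: one header line plus one data line.
data = io.StringIO(
    "mrn\tspecimen_id\tdate\tlab_num\tlabtest\tresult_val\n"
    "4419529\t1614487\t26.2675\t5802791G\tBilirubin\t0.1\n"
)
reader = csv.DictReader(data, delimiter="\t")
row = next(reader)   # columns are addressed by name instead of by index
print(row["lab_num"], row["labtest"], row["result_val"])
```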

import itertools as IT
import csv

inFile = 'curious.dat'
outFile = 'curious.out'

def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

fieldnames = 'mrn specimen_id date    lab_num Bilirubin   Lipase  Calcium Magnesium   Phosphate'.split()

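with open(inFile, 'rb') as ifd:   # Python 2 csv convention; on Python 3 use open(inFile, newline='')
    reader = csv.DictReader(ifd, delimiter = '\t')
    with open(outFile, 'wb') as ofd:   # on Python 3: open(outFile, 'w', newline='')
        writer = csv.DictWriter(
            ofd, fieldnames, delimiter = '\t', lineterminator = '\n', )
        writer.writeheader()
        for key, group in IT.groupby(reader, key = mykey):
            new = {}
            row = next(group)
            # copy the identifying columns from the first row of the group
            for col in ('mrn', 'specimen_id', 'date', 'lab_num'):
                new[col] = row[col]
            # every row in the group, including the first, contributes one lab result
            new[row['labtest']] = row['result_val']
            for row in group:
                new[row['labtest']] = row['result_val']
            writer.writerow(new)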

yields

mrn specimen_id date    lab_num Bilirubin   Lipase  Calcium Magnesium   Phosphate
4419529 1614487 26.2675 5802791G    0.1             
3319529 1614487 26.2675 5802791G    0.3 153 8.1 2.1 4
5713871 682571  56.0779 9732266E                    4.1
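The code above targets Python 2, where csv files are opened in binary mode ('rb'/'wb'). On Python 3 the csv module expects text-mode files opened with newline=''. Below is a minimal Python 3 sketch of the same groupby approach, wrapped in a hypothetical pivot() function and run on an in-memory sample. The long-format input rows are assumed (the question does not show the input file); their values are taken from the output above.

```python
import csv
import io
import itertools as IT

def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

def pivot(infile, outfile, fieldnames):
    """Hypothetical helper: collapse consecutive rows sharing mykey() into one wide row."""
    reader = csv.DictReader(infile, delimiter='\t')
    writer = csv.DictWriter(outfile, fieldnames, delimiter='\t', lineterminator='\n')
    writer.writeheader()
    for _, group in IT.groupby(reader, key=mykey):
        new = {}
        for row in group:
            # identifying columns are identical within a group; keep the first copy
            for col in ('mrn', 'specimen_id', 'date', 'lab_num'):
                new.setdefault(col, row[col])
            new[row['labtest']] = row['result_val']
        writer.writerow(new)   # tests missing from the group are written as ''

fieldnames = ['mrn', 'specimen_id', 'date', 'lab_num',
              'Bilirubin', 'Lipase', 'Calcium', 'Magnesium', 'Phosphate']

# Hypothetical long-format input (one lab test per line).
sample = (
    "mrn\tspecimen_id\tdate\tlab_num\tlabtest\tresult_val\n"
    "3319529\t1614487\t26.2675\t5802791G\tBilirubin\t0.3\n"
    "3319529\t1614487\t26.2675\t5802791G\tLipase\t153\n"
    "5713871\t682571\t56.0779\t9732266E\tPhosphate\t4.1\n"
)
out = io.StringIO()
pivot(io.StringIO(sample), out, fieldnames)
print(out.getvalue())

# With real files on Python 3:
# with open(inFile, newline='') as ifd, open(outFile, 'w', newline='') as ofd:
#     pivot(ifd, ofd, fieldnames)
```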
