Editorial Team
Asked: May 20, 2026

I need to read lines from a text file in which the ‘end of line’ marker is not always \n or \x or a combination of these: it may be any sequence of characters, such as ‘xyz’ or ‘|’. The ‘end of line’ marker is always the same, and known, for each type of file.

The text file may be large, and I have to keep performance and memory usage in mind, so what is the best solution?
Today I use a combination of string.read(1000) and split(myendofline) or partition(myendofline), but I would like to know whether a more elegant and standard solution exists.

  1. Editorial Team
    Added an answer on May 20, 2026 at 2:42 pm

    Here’s a generator function that acts as an iterator over a file, cutting it into lines according to an exotic newline that is identical throughout the file.

    It reads the file in chunks of lenchunk characters and yields the lines found in each chunk, chunk after chunk.

    Since the newline is 3 characters long in my example (‘:;:’), a chunk may end in the middle of a newline: the generator function takes care of this possibility and still yields the correct lines.

    If the newline were only one character, the function could be simplified. I wrote only the function for the most delicate case.

    This function makes it possible to read a file one line at a time, without reading the entire file into memory.

    from random import randrange, choice


    # This part creates an example file whose newline is :;:
    # newline='' disables any translation of \n, so the text is stored as is
    alphabet = 'abcdefghijklmnopqrstuvwxyz '
    ch = ':;:'.join(''.join(choice(alphabet) for nc in range(randrange(0, 40)))
                    for i in range(50))
    with open('fofo.txt', 'w', newline='') as g:
        g.write(ch)


    # This generator function is an iterator over a file.
    # If nl receives an argument whose bool is True,
    # the newlines :;: are kept at the end of the yielded lines.

    def liner(filename, eol, lenchunk, nl=0):
        # nl = 0 or 1 acts like keepends = False or True in splitlines()
        L = len(eol)
        NL = len(eol) if nl else 0
        with open(filename, 'r', newline='') as f:
            chunk = f.read(lenchunk)
            tail = ''
            while chunk:
                last = chunk.rfind(eol)
                if last == -1:
                    kept = chunk
                    newtail = ''
                else:
                    kept = chunk[0:last+L]    # up to and including the last newline
                    newtail = chunk[last+L:]  # what follows the last newline
                chunk = tail + kept
                x = 0
                y = chunk.find(eol, x)
                while y != -1:
                    yield chunk[x:y+NL]  # NL decides whether the newline is kept
                    x = y + L
                    y = chunk.find(eol, x)
                # chunk[x:] is text not yet ended by a newline (a line longer
                # than lenchunk, or a newline cut in two): it must not be lost
                tail = chunk[x:] + newtail
                chunk = f.read(lenchunk)
            yield tail


    # 100 is an arbitrary chunk size, small enough for this small example file
    for line in liner('fofo.txt', ':;:', 100):
        print(line)
    
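    The handling of a newline cut between two chunks can be checked without any file on disk. What follows is only a minimal sketch, not the function above: iter_records is a hypothetical, compact variant of the same chunk-and-tail idea that reads from any text stream with a read() method, and the test string is invented for the purpose. It compares the output with str.split() for every chunk size from 1 upward, so that every possible cut position is exercised.

```python
import io


def iter_records(fh, eol, lenchunk):
    """Yield the pieces of fh separated by eol, reading lenchunk characters at a time.

    Hypothetical helper for this check only: fh is any text stream with read().
    """
    tail = ''
    chunk = fh.read(lenchunk)
    while chunk:
        # Prepending the unconsumed tail rebuilds a separator that was cut in two.
        buf = tail + chunk
        start = 0
        pos = buf.find(eol, start)
        while pos != -1:
            yield buf[start:pos]
            start = pos + len(eol)
            pos = buf.find(eol, start)
        tail = buf[start:]  # incomplete record, possibly ending with part of eol
        chunk = fh.read(lenchunk)
    yield tail


# Invented sample: it contains an empty record and ends with a separator.
text = 'alpha:;:beta:;::;:gamma delta:;:'
expected = text.split(':;:')

for lenchunk in range(1, len(text) + 2):
    got = list(iter_records(io.StringIO(text), ':;:', lenchunk))
    assert got == expected, (lenchunk, got)
print('all chunk sizes agree with str.split')
```

    A chunk size of 1 is the harshest case, since every separator is then cut twice; a size larger than the text checks the single-chunk path.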

    Here’s the same liner() function, with print statements here and there so that the algorithm can be followed.

    from random import randrange, choice


    # This part creates an example file whose newline is :;:
    alphabet = 'abcdefghijklmnopqrstuvwxyz '
    ch = ':;:'.join(''.join(choice(alphabet) for nc in range(randrange(0, 40)))
                    for i in range(50))
    with open('fofo.txt', 'w', newline='') as g:
        g.write(ch)


    # This generator function is an iterator over a file.
    # If nl receives an argument whose bool is True,
    # the newlines :;: are kept at the end of the yielded lines.

    def liner(filename, eol, lenchunk, nl=0):
        L = len(eol)
        NL = len(eol) if nl else 0
        with open(filename, 'r', newline='') as f:
            # the whole file is read once here, only to display its end after the run
            ch = f.read()
            the_end = '\n\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'+\
                      '\nend of the file=='+ch[-50:]+\
                      '\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n'
            f.seek(0, 0)
            chunk = f.read(lenchunk)
            tail = ''
            while chunk:
                # this test is specific to the newline :;: of the example
                if (chunk[-1]==':' and chunk[-3:]!=':;:') or chunk[-2:]==':;':
                    wr = [' ##########---------- cut newline cut ----------##########'+\
                         '\nchunk== '+chunk+\
                         '\n---------------------------------------------------']
                else:
                    wr = ['chunk== '+chunk+\
                         '\n---------------------------------------------------']
                last = chunk.rfind(eol)
                if last == -1:
                    kept = chunk
                    newtail = ''
                else:
                    kept = chunk[0:last+L]    # up to and including the last newline
                    newtail = chunk[last+L:]  # what follows the last newline
                wr.append('\nkept== '+kept+\
                          '\n---------------------------------------------------'+\
                          '\nnewtail== '+newtail)
                chunk = tail + kept
                wr.append('\n---------------------------------------------------'+\
                          '\ntail + kept== '+chunk+\
                          '\n---------------------------------------------------')
                print(''.join(wr))
                x = 0
                y = chunk.find(eol, x)
                while y != -1:
                    yield chunk[x:y+NL]  # NL decides whether the newline is kept
                    x = y + L
                    y = chunk.find(eol, x)
                # text not yet ended by a newline is carried over, never dropped
                tail = chunk[x:] + newtail
                print('\n\n===================================================')
                chunk = f.read(lenchunk)
            yield tail
            print(the_end)


    # 100 is an arbitrary small chunk size, so that some chunks end inside a newline;
    # nl=1 keeps the newline at the end of each displayed line
    for line in liner('fofo.txt', ':;:', 100, 1):
        print('line== '+line)
    
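    For the simpler case mentioned earlier, where the newline is a single character, here is one possible sketch. It is an assumption about what "simplified" could look like, not code from this answer: liner_single is a hypothetical name, it takes an already opened text stream, and it never returns the separator. Because a one-character separator cannot be cut between two chunks, only the freshly read chunk is scanned, and the pending tail is simply glued in front of its first piece.

```python
import io


def liner_single(fh, eol, lenchunk):
    """Yield the records of fh separated by the single character eol."""
    if len(eol) != 1:
        raise ValueError('eol must be exactly one character')
    tail = ''
    chunk = fh.read(lenchunk)
    while chunk:
        # A one-character separator cannot be cut between two chunks,
        # so the tail never needs to be searched again.
        pieces = chunk.split(eol)
        pieces[0] = tail + pieces[0]
        tail = pieces.pop()  # the last piece is incomplete until the next eol
        for piece in pieces:
            yield piece
        chunk = fh.read(lenchunk)
    yield tail


# Invented sample data, read 4 characters at a time.
for line in liner_single(io.StringIO('one|two||three'), '|', 4):
    print(repr(line))
```

    With a multi-character separator this shortcut would be wrong, since the end of the tail could hold the first part of a separator; that is the case the longer functions above deal with.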

    EDIT

    I compared the execution times of my code and of chmullig’s code.

    With a ‘fofo.txt’ file of about 10 MB, created with

    from random import randrange, choice

    alphabet = 'abcdefghijklmnopqrstuvwxyz '
    ch = ':;:'.join(''.join(choice(alphabet) for nc in range(randrange(0, 60)))
                    for i in range(324000))
    with open('fofo.txt', 'w', newline='') as g:
        g.write(ch)
    

    and measuring the times like this:

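    from time import perf_counter

    te = perf_counter()
    for line in liner('fofo.txt', ':;:', 65536):
        pass
    print(perf_counter() - te)


    # SpecialDelimiters comes from chmullig's code, which is not shown here
    fh = open('fofo.txt', 'r', newline='')
    zenBreaker = SpecialDelimiters(fh, ':;:', 65536)

    te = perf_counter()
    for line in zenBreaker:
        pass
    print(perf_counter() - te)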
    

    I obtained the following minimum times over several runs:

    my code: 0.7067 seconds

    chmullig’s code: 0.8373 seconds

    EDIT 2

    I changed my generator function: liner2() takes a file handle instead of the file’s name, so the opening of the file can be left out of the timed section, as it is when measuring chmullig’s code.

    from time import perf_counter

    def liner2(fh, eol, lenchunk, nl=0):
        L = len(eol)
        NL = len(eol) if nl else 0
        chunk = fh.read(lenchunk)
        tail = ''
        while chunk:
            last = chunk.rfind(eol)
            if last == -1:
                kept = chunk
                newtail = ''
            else:
                kept = chunk[0:last+L]    # up to and including the last newline
                newtail = chunk[last+L:]  # what follows the last newline
            chunk = tail + kept
            x = 0
            y = chunk.find(eol, x)
            while y != -1:
                yield chunk[x:y+NL]  # NL decides whether the newline is kept
                x = y + L
                y = chunk.find(eol, x)
            # text not yet ended by a newline is carried over, never dropped
            tail = chunk[x:] + newtail
            chunk = fh.read(lenchunk)
        yield tail

    fh = open('fofo.txt', 'r', newline='')
    te = perf_counter()
    for line in liner2(fh, ':;:', 65536):
        pass
    print(perf_counter() - te)
    fh.close()
    

    The results, after numerous runs to find the minimum times, are

    with liner(): 0.7067 seconds

    with liner2(): 0.7064 seconds

    chmullig’s code: 0.8373 seconds

    In fact, opening the file accounts for a negligible part of the total time.

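    Taking the minimum over several runs, as done for the figures above, can be sketched with the standard timeit module. This is only an illustration of the measuring technique under stated assumptions: best_time is a hypothetical helper, and with_split and with_find are invented stand-ins working on an in-memory string, not the functions that were compared in this answer.

```python
import timeit


def best_time(func, repeats=5):
    """Return the smallest wall-clock time of `repeats` single calls to func."""
    return min(timeit.repeat(func, number=1, repeat=repeats))


# Invented in-memory data: 2000 records separated by :;:
data = ':;:'.join('x' * n for n in range(2000))


def with_split():
    # stand-in 1: count the records with str.split
    return len(data.split(':;:'))


def with_find():
    # stand-in 2: count the records with repeated str.find calls
    count, pos = 1, data.find(':;:')
    while pos != -1:
        count += 1
        pos = data.find(':;:', pos + 3)
    return count


# Both stand-ins must agree before their times are worth comparing.
assert with_split() == with_find()
print('split: %.6f s' % best_time(with_split))
print('find : %.6f s' % best_time(with_find))
```

    The minimum is usually the most stable figure, because the slower runs mostly reflect other activity on the machine rather than the code being measured.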