The Archive Base

Editorial Team
Asked: May 11, 2026

I need to process a large file, around 400K lines and 200 MB. Sometimes I have to process it from the bottom up. How can I use an iterator (yield return) here? Basically, I don't want to load everything into memory. I know it is more efficient to use an iterator in .NET.

1 Answer

  1. Added an answer on May 11, 2026 at 4:12 am

    Reading text files backwards is really tricky unless you’re using a fixed-size encoding (e.g. ASCII). When you’ve got variable-size encoding (such as UTF-8) you will keep having to check whether you’re in the middle of a character or not when you fetch data.
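    To illustrate the easy, fixed-size case, here is a minimal sketch that reads lines backwards from a seekable stream, assuming pure ASCII data and LF or CRLF line endings. The class name `AsciiReverseLines`, the method `ReadLinesBackwards` and its `bufferSize` parameter are hypothetical names chosen for this example; nothing here is a framework API, and it is not meant to handle UTF-8 or UTF-16.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    public static class AsciiReverseLines
    {
        // Sketch only: assumes one byte per character (ASCII) and LF or CRLF
        // line endings. Yields lines lazily, last line first.
        public static IEnumerable<string> ReadLinesBackwards(Stream stream, int bufferSize = 4096)
        {
            byte[] buffer = new byte[bufferSize];
            string pending = "";   // start of a line whose beginning lies in an earlier block
            bool first = true;     // used to skip the empty "line" after a trailing newline
            long position = stream.Length;

            while (position > 0)
            {
                int count = (int)Math.Min(bufferSize, position);
                position -= count;
                stream.Position = position;

                // Stream.Read may return fewer bytes than requested, so loop.
                int read = 0;
                while (read < count)
                {
                    int n = stream.Read(buffer, read, count - read);
                    if (n == 0) throw new EndOfStreamException();
                    read += n;
                }

                // With a single-byte encoding any block boundary is a character boundary.
                string text = Encoding.ASCII.GetString(buffer, 0, count) + pending;
                int newline;
                while ((newline = text.LastIndexOf('\n')) >= 0)
                {
                    // TrimEnd removes the CR of a CRLF pair (and any repeated CRs).
                    string line = text.Substring(newline + 1).TrimEnd('\r');
                    text = text.Substring(0, newline);
                    if (!first || line.Length != 0) yield return line;
                    first = false;
                }
                pending = text;
            }

            if (!first || pending.Length != 0) yield return pending.TrimEnd('\r');
        }

        static void Main()
        {
            byte[] data = Encoding.ASCII.GetBytes("first\r\nsecond\nthird\n");
            using (var stream = new MemoryStream(data))
            {
                // A tiny buffer forces lines to span several blocks.
                foreach (string line in ReadLinesBackwards(stream, bufferSize: 8))
                {
                    Console.WriteLine(line); // third, second, first
                }
            }
        }
    }
    ```

    Only one block plus the current partial line is held in memory at a time. The string concatenation is kept simple for readability and would be slow for very long lines.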

    There’s nothing built into the framework, and I suspect you’d have to do separate hard coding for each variable-width encoding.
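    As an example of the per-encoding work involved, the sketch below shows the kind of check UTF-8 needs: continuation bytes have the bit pattern 10xxxxxx, so any other byte starts a character. The names `Utf8Boundary`, `IsCharacterStart` and `AlignToCharacterStart` are hypothetical helpers made up for this illustration.

    ```csharp
    using System;
    using System.Text;

    public static class Utf8Boundary
    {
        // True when the byte can begin a UTF-8 character: either an ASCII byte
        // (top bit clear) or a lead byte. Continuation bytes look like 10xxxxxx.
        public static bool IsCharacterStart(byte b)
        {
            return (b & 0xC0) != 0x80;
        }

        // Moves an offset forward until it points at the start of a character
        // (or at the end of the data if no start is found).
        public static int AlignToCharacterStart(byte[] data, int offset)
        {
            while (offset < data.Length && !IsCharacterStart(data[offset]))
            {
                offset++;
            }
            return offset;
        }

        static void Main()
        {
            // "a€b" encodes as 61 E2 82 AC 62: the euro sign takes three bytes.
            byte[] bytes = Encoding.UTF8.GetBytes("a\u20ACb");

            // Offset 2 lands in the middle of the euro sign.
            Console.WriteLine(IsCharacterStart(bytes[2]));      // False
            Console.WriteLine(AlignToCharacterStart(bytes, 2)); // 4
        }
    }
    ```

    A reader that seeks to an arbitrary byte offset would decode only from the aligned position and carry the skipped bytes over to the block that precedes it in the file.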

    EDIT: This has been somewhat tested – but that’s not to say it doesn’t still have some subtle bugs around. It uses StreamUtil from MiscUtil, but I’ve included just the necessary (new) method from there at the bottom. Oh, and it needs refactoring – there’s one pretty hefty method, as you’ll see:

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    namespace MiscUtil.IO
    {
        /// <summary>
        /// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
        /// (or a filename for convenience) and yields lines from the end of the stream backwards.
        /// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
        /// returned by the function must be seekable.
        /// </summary>
        public sealed class ReverseLineReader : IEnumerable<string>
        {
            /// <summary>
            /// Buffer size to use by default. Classes with internal access can specify
            /// a different buffer size - this is useful for testing.
            /// </summary>
            private const int DefaultBufferSize = 4096;

            /// <summary>
            /// Means of creating a Stream to read from.
            /// </summary>
            private readonly Func<Stream> streamSource;

            /// <summary>
            /// Encoding to use when converting bytes to text
            /// </summary>
            private readonly Encoding encoding;

            /// <summary>
            /// Size of buffer (in bytes) to read each time we read from the
            /// stream. This must be at least as big as the maximum number of
            /// bytes for a single character.
            /// </summary>
            private readonly int bufferSize;

            /// <summary>
            /// Function which, when given a position within a file and a byte, states whether
            /// or not the byte represents the start of a character.
            /// </summary>
            private Func<long,byte,bool> characterStartDetector;

            /// <summary>
            /// Creates a ReverseLineReader from a stream source. The delegate is only
            /// called when the enumerator is fetched. UTF-8 is used to decode
            /// the stream into text.
            /// </summary>
            /// <param name="streamSource">Data source</param>
            public ReverseLineReader(Func<Stream> streamSource)
                : this(streamSource, Encoding.UTF8)
            {
            }

            /// <summary>
            /// Creates a ReverseLineReader from a filename. The file is only opened
            /// (or even checked for existence) when the enumerator is fetched.
            /// UTF8 is used to decode the file into text.
            /// </summary>
            /// <param name="filename">File to read from</param>
            public ReverseLineReader(string filename)
                : this(filename, Encoding.UTF8)
            {
            }

            /// <summary>
            /// Creates a ReverseLineReader from a filename. The file is only opened
            /// (or even checked for existence) when the enumerator is fetched.
            /// </summary>
            /// <param name="filename">File to read from</param>
            /// <param name="encoding">Encoding to use to decode the file into text</param>
            public ReverseLineReader(string filename, Encoding encoding)
                : this(() => File.OpenRead(filename), encoding)
            {
            }

            /// <summary>
            /// Creates a ReverseLineReader from a stream source. The delegate is only
            /// called when the enumerator is fetched.
            /// </summary>
            /// <param name="streamSource">Data source</param>
            /// <param name="encoding">Encoding to use to decode the stream into text</param>
            public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
                : this(streamSource, encoding, DefaultBufferSize)
            {
            }

            internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
            {
                this.streamSource = streamSource;
                this.encoding = encoding;
                this.bufferSize = bufferSize;
                if (encoding.IsSingleByte)
                {
                    // For a single byte encoding, every byte is the start (and end) of a character
                    characterStartDetector = (pos, data) => true;
                }
                else if (encoding is UnicodeEncoding)
                {
                    // For UTF-16, even-numbered positions are the start of a character.
                    // TODO: This assumes no surrogate pairs. More work required
                    // to handle that.
                    characterStartDetector = (pos, data) => (pos & 1) == 0;
                }
                else if (encoding is UTF8Encoding)
                {
                    // For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
                    // See http://www.cl.cam.ac.uk/~mgk25/unicode.html
                    characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
                }
                else
                {
                    throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
                }
            }

            /// <summary>
            /// Returns the enumerator reading strings backwards. If this method discovers that
            /// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
            /// </summary>
            public IEnumerator<string> GetEnumerator()
            {
                Stream stream = streamSource();
                if (!stream.CanSeek)
                {
                    stream.Dispose();
                    throw new NotSupportedException("Unable to seek within stream");
                }
                if (!stream.CanRead)
                {
                    stream.Dispose();
                    throw new NotSupportedException("Unable to read within stream");
                }
                return GetEnumeratorImpl(stream);
            }

            private IEnumerator<string> GetEnumeratorImpl(Stream stream)
            {
                try
                {
                    long position = stream.Length;

                    if (encoding is UnicodeEncoding && (position & 1) != 0)
                    {
                        throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
                    }

                    // Allow up to two bytes for data from the start of the previous
                    // read which didn't quite make it as full characters
                    byte[] buffer = new byte[bufferSize + 2];
                    char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
                    int leftOverData = 0;
                    String previousEnd = null;
                    // TextReader doesn't return an empty string if there's line break at the end
                    // of the data. Therefore we don't return an empty string if it's our *first*
                    // return.
                    bool firstYield = true;

                    // A line-feed at the start of the previous buffer means we need to swallow
                    // the carriage-return at the end of this buffer - hence this needs declaring
                    // way up here!
                    bool swallowCarriageReturn = false;

                    while (position > 0)
                    {
                        int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);

                        position -= bytesToRead;
                        stream.Position = position;
                        StreamUtil.ReadExactly(stream, buffer, bytesToRead);
                        // If we haven't read a full buffer, but we had bytes left
                        // over from before, copy them to the end of the buffer
                        if (leftOverData > 0 && bytesToRead != bufferSize)
                        {
                            // Buffer.BlockCopy doesn't document its behaviour with respect
                            // to overlapping data: we *might* just have read 7 bytes instead of
                            // 8, and have two bytes to copy...
                            Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
                        }
                        // We've now *effectively* read this much data.
                        bytesToRead += leftOverData;

                        int firstCharPosition = 0;
                        while (!characterStartDetector(position + firstCharPosition, buffer[firstCharPosition]))
                        {
                            firstCharPosition++;
                            // Bad UTF-8 sequences could trigger this. For UTF-8 we should always
                            // see a valid character start in every 3 bytes, and if this is the start of the file
                            // so we've done a short read, we should have the character start
                            // somewhere in the usable buffer.
                            if (firstCharPosition == 3 || firstCharPosition == bytesToRead)
                            {
                                throw new InvalidDataException("Invalid UTF-8 data");
                            }
                        }
                        leftOverData = firstCharPosition;

                        int charsRead = encoding.GetChars(buffer, firstCharPosition, bytesToRead - firstCharPosition, charBuffer, 0);
                        int endExclusive = charsRead;

                        for (int i = charsRead - 1; i >= 0; i--)
                        {
                            char lookingAt = charBuffer[i];
                            if (swallowCarriageReturn)
                            {
                                swallowCarriageReturn = false;
                                if (lookingAt == '\r')
                                {
                                    endExclusive--;
                                    continue;
                                }
                            }
                            // Anything non-line-breaking, just keep looking backwards
                            if (lookingAt != '\n' && lookingAt != '\r')
                            {
                                continue;
                            }
                            // End of CRLF? Swallow the preceding CR
                            if (lookingAt == '\n')
                            {
                                swallowCarriageReturn = true;
                            }
                            int start = i + 1;
                            string bufferContents = new string(charBuffer, start, endExclusive - start);
                            endExclusive = i;
                            string stringToYield = previousEnd == null ? bufferContents : bufferContents + previousEnd;
                            if (!firstYield || stringToYield.Length != 0)
                            {
                                yield return stringToYield;
                            }
                            firstYield = false;
                            previousEnd = null;
                        }

                        previousEnd = endExclusive == 0 ? null : (new string(charBuffer, 0, endExclusive) + previousEnd);

                        // If we didn't decode the start of the array, put it at the end for next time
                        if (leftOverData != 0)
                        {
                            Buffer.BlockCopy(buffer, 0, buffer, bufferSize, leftOverData);
                        }
                    }
                    if (leftOverData != 0)
                    {
                        // At the start of the final buffer, we had the end of another character.
                        throw new InvalidDataException("Invalid UTF-8 data at start of stream");
                    }
                    if (firstYield && string.IsNullOrEmpty(previousEnd))
                    {
                        yield break;
                    }
                    yield return previousEnd ?? "";
                }
                finally
                {
                    stream.Dispose();
                }
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }
        }
    }

    // StreamUtil.cs:
    public static class StreamUtil
    {
        public static void ReadExactly(Stream input, byte[] buffer, int bytesToRead)
        {
            int index = 0;
            while (index < bytesToRead)
            {
                int read = input.Read(buffer, index, bytesToRead - index);
                if (read == 0)
                {
                    // Use the singular "byte" only when exactly one byte is missing.
                    throw new EndOfStreamException
                        (String.Format("End of stream reached with {0} byte{1} left to read.",
                                       bytesToRead - index,
                                       bytesToRead - index == 1 ? "" : "s"));
                }
                index += read;
            }
        }
    }

    Feedback very welcome. This was fun 🙂

© 2021 The Archive Base. All Rights Reserved