The Archive Base


Editorial Team
Asked: May 16, 2026 (2026-05-16T14:13:14+00:00)

I have a 100 GB text file, which is a BCP dump from a database. When I try to import it with BULK INSERT, I get a cryptic error on line number 219506324. Before solving this issue I would like to see this line, but alas my favorite method of

import linecache
print linecache.getline(filename, linenumber)

is throwing a MemoryError. Interestingly the manual says that “This function will never throw an exception.” On this large file it throws one as I try to read line number 1, and I have about 6GB free RAM…

I would like to know what is the most elegant method to get to that unreachable line. Available tools are Python 2, Python 3 and C# 4 (Visual Studio 2010). Yes, I understand that I can always do something like

var line = 0;
using (var stream = new StreamReader(File.OpenRead(@"s:\source\transactions.dat")))
{
     while (++line < 219506324) stream.ReadLine(); //waste some cycles
     Console.WriteLine(stream.ReadLine());
}

Which would work, but I doubt it’s the most elegant way.

EDIT: I’m waiting to close this thread, because the hard drive containing the file is being used right now by another process. I’m going to test both suggested methods and report timings. Thank you all for your suggestions and comments.

The results are in. I implemented Gabe’s and Alex’s methods to see which one was faster. If I’m doing anything wrong, do tell. I’m going for the 10 millionth line in my 100 GB file, first using the method Gabe suggested and then the method Alex suggested, which I loosely translated into C#. The only thing I’m adding myself is first reading a 300 MB file into memory to clear the HDD cache.

const string file = @"x:\....dat"; // 100 GB file
const string otherFile = @"x:\....dat"; // 300 MB file
const int linenumber = 10000000;

ClearHDDCache(otherFile);
GabeMethod(file, linenumber);  //Gabe's method

ClearHDDCache(otherFile);
AlexMethod(file, linenumber);  //Alex's method

// Results
// Gabe's method: 8290 (ms)
// Alex's method: 13455 (ms)

The implementation of Gabe’s method is as follows:

var gabe = new Stopwatch();
gabe.Start();
var data = File.ReadLines(file).ElementAt(linenumber - 1);
gabe.Stop();
Console.WriteLine("Gabe's method: {0} (ms)",  gabe.ElapsedMilliseconds);

While Alex’s method is slightly trickier:

var alex = new Stopwatch();
alex.Start();
const int buffersize = 100 * 1024; //bytes
var buffer = new byte[buffersize];
var counter = 0;   // line feeds seen in all previous chunks
var bytesread = 0; // number of valid bytes in the current chunk
using (var filestream = File.OpenRead(file))
{
    // Read returns 0 at end of file, so the loop cannot spin forever.
    while ((bytesread = filestream.Read(buffer, 0, buffersize)) > 0)
    {
        //At this point we could probably launch an async read into the next chunk...
        // Count only the bytes actually read; 10 is the ASCII line feed ('\n').
        var linesread = buffer.Take(bytesread).Count(b => b == 10);
        if (counter + linesread >= linenumber) break;
        counter += linesread;
    }
}
//The downside of this method is that we have to assume that the whole line fits into the buffer, or do something cleverer.
var data = new ASCIIEncoding().GetString(buffer, 0, bytesread).Split('\n').ElementAt(linenumber - counter - 1);
alex.Stop();
Console.WriteLine("Alex's method: {0} (ms)", alex.ElapsedMilliseconds);
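
The buffer-boundary caveat in the comment above can be handled by carrying the partial line over from one chunk to the next. Here is a minimal Python 3 sketch of that idea, assuming lines end with a line feed (byte 10) and the stream is opened in binary mode. The name `line_by_chunks` and its parameters are made up for illustration and are not part of Alex's suggestion.

```python
def line_by_chunks(stream, linenumber, chunk_size=100 * 1024):
    """Return line `linenumber` (1-based) as bytes without its line feed,
    or None if the stream has fewer lines. Hypothetical helper."""
    if linenumber < 1:
        raise ValueError("linenumber is 1-based")
    newlines_seen = 0  # line feeds consumed before the current position
    pieces = []        # parts of the target line collected across chunks
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file
        pos = 0
        if newlines_seen < linenumber - 1:
            count = chunk.count(b"\n")
            if newlines_seen + count < linenumber - 1:
                # The target line starts in a later chunk: skip this one.
                newlines_seen += count
                continue
            # The target line starts in this chunk: walk to its first byte.
            while newlines_seen < linenumber - 1:
                pos = chunk.index(b"\n", pos) + 1
                newlines_seen += 1
        end = chunk.find(b"\n", pos)
        if end != -1:
            pieces.append(chunk[pos:end])
            return b"".join(pieces)
        # The line continues into the next chunk: keep what we have so far.
        pieces.append(chunk[pos:])
    tail = b"".join(pieces)
    # A final line without a trailing line feed still counts as a line.
    return tail if newlines_seen == linenumber - 1 and tail else None
```

With Windows line endings the returned bytes keep a trailing `\r`, which the caller may want to strip.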

So unless Alex cares to comment I’ll mark Gabe’s solution as accepted.

1 Answer

    Editorial Team
    Added an answer on May 16, 2026 at 2:13 pm (2026-05-16T14:13:15+00:00)

    Here’s my elegant version in C#:

    Console.Write(File.ReadLines(@"s:\source\transactions.dat").ElementAt(219506323));
    

    or more general:

    Console.Write(File.ReadLines(filename).ElementAt(linenumber - 1));
    

    Of course, you may want to show some context before and after the given line:

    Console.Write(string.Join("\n",
                  File.ReadLines(filename).Skip(linenumber - 5).Take(10)));
    

    or more fluently:

    File
    .ReadLines(filename)
    .Skip(linenumber - 5)
    .Take(10)
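    .ToObservable()                // needs Reactive Extensions (System.Reactive.Linq)
    .Subscribe(Console.WriteLine); // subscribing is what actually prints the lines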
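
Since Python is also available to the asker, the same context window can be sketched with `itertools.islice`. This is only an illustration: `lines_with_context`, `before` and `after` are hypothetical names, the defaults mirror the `Skip(linenumber - 5).Take(10)` window, and the file is read in binary mode on the assumption that lines end with a line feed. Unlike the C# version, the window is clamped at the first line of the file.

```python
from itertools import islice


def lines_with_context(path, linenumber, before=4, after=5):
    """Return [(number, raw_line), ...] for the lines around `linenumber` (1-based)."""
    if linenumber < 1:
        raise ValueError("linenumber is 1-based")
    start = max(linenumber - before, 1)  # first line number to return
    stop = linenumber + after            # last line number to return
    with open(path, "rb") as handle:
        # The file object yields one line at a time, so memory use stays small.
        window = islice(handle, start - 1, stop)
        return list(enumerate(window, start))
```

Pairing each line with its number makes it easier to spot the offending row among its neighbours.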
    

    BTW, the linecache module does not do anything clever with large files. It just reads the whole thing in, keeping it all in memory. The only exceptions it catches are I/O-related (can’t access file, file not found, etc.). Here’s the important part of the code:

        fp = open(fullname, 'rU')
        lines = fp.readlines()
        fp.close()
    

    In other words, it’s trying to fit the whole 100GB file into 6GB of RAM! What the manual should say is maybe “This function will never throw an exception if it can’t access the file.”
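
A Python equivalent that avoids the `readlines()` problem is to iterate over the file object, which yields one line at a time. The following is a minimal Python 3 sketch, not a drop-in replacement for `linecache`: `read_line` is a hypothetical name, and binary mode is assumed so that lines are split on line feeds only and no decoding errors can occur.

```python
from itertools import islice


def read_line(path, linenumber):
    """Return line `linenumber` (1-based) as raw bytes, or None if the file is shorter."""
    if linenumber < 1:
        raise ValueError("linenumber is 1-based")
    with open(path, "rb") as handle:
        # islice skips the first linenumber - 1 lines without storing them.
        return next(islice(handle, linenumber - 1, linenumber), None)
```

This still reads the file sequentially from the start up to the requested line, because variable-length lines cannot be located by seeking without an index.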
