I have a CSV file with over 1 million rows of data. I am

Question

Asked: June 9, 2026 (2026-06-09T13:47:54+00:00)

I have a CSV file with over 1 million rows of data. I am planning to read the rows in parallel to improve efficiency. Can I do something like the following, or is there a more efficient method?

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace ParallelData
{
    public partial class ParallelData : Form
    {
        public ParallelData()
        {
            InitializeComponent();
        }

        // Fields are separated by commas and/or spaces.
        private static readonly char[] Separators = { ',', ' ' };

        private static void ProcessFile()
        {
            // File.ReadLines streams the file lazily, one line at a time.
            var lines = File.ReadLines("BigData.csv");
            var numbers = ProcessRawNumbers(lines);

            var rowTotal = new List<double>();
            var totalElements = 0;

            foreach (var values in numbers)
            {
                var sumOfRow = values.Sum();
                rowTotal.Add(sumOfRow);
                totalElements += values.Count;
            }
            MessageBox.Show(totalElements.ToString());
        }

        private static List<List<double>> ProcessRawNumbers(IEnumerable<string> lines)
        {
            var numbers = new List<List<double>>();
            Parallel.ForEach(lines, line =>
            {
                // Note: the lock also covers ProcessLine, so lines are parsed
                // one at a time, and rows are added in no guaranteed order.
                lock (numbers)
                {
                    numbers.Add(ProcessLine(line));
                }
            });
            return numbers;
        }

        // Parses one line into doubles; tokens that are not numbers are skipped.
        private static List<double> ProcessLine(string line)
        {
            var list = new List<double>();
            foreach (var s in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                double i;
                if (Double.TryParse(s, out i))
                {
                    list.Add(i);
                }
            }
            return list;
        }

        private void button2_Click(object sender, EventArgs e)
        {
            ProcessFile();
        }
    }
}

1 Answer

Editorial Team · Answer 1 · 2026-06-09T13:47:56+00:00

I’m not sure this is a good idea. Depending on your hardware, the bottleneck will probably be the disk read speed rather than the CPU.
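One way to check where the time goes on a given machine is to time the read and the parse separately. The following is only a rough sketch: the class name `BottleneckProbe` and the method `Measure` are made up for illustration, and the separators are copied from the question. Keep in mind that the operating system may cache the file after the first run, which can make later reads look much faster than a cold read from disk.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper (not from the question): times reading and parsing separately.
public static class BottleneckProbe
{
    private static readonly char[] Separators = { ',', ' ' };

    public static (long ReadMs, long ParseMs, long Values) Measure(string path)
    {
        // Phase 1: pull every line into memory (mostly I/O).
        var watch = Stopwatch.StartNew();
        var lines = File.ReadAllLines(path);
        var readMs = watch.ElapsedMilliseconds;

        // Phase 2: split and parse on a single thread (CPU only).
        watch.Restart();
        long values = 0;
        foreach (var line in lines)
        {
            foreach (var token in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                if (double.TryParse(token, out _))
                {
                    values++;
                }
            }
        }
        var parseMs = watch.ElapsedMilliseconds;

        return (readMs, parseMs, values);
    }
}
```

If `ReadMs` dominates, parallel parsing is unlikely to help much; if `ParseMs` dominates, parallelising the parse step may be worth trying.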

Another point: if your storage hardware is a magnetic hard disk, then the read speed depends strongly on how the file is physically laid out on the disk. If the file is not fragmented (i.e. all of its chunks are stored sequentially on the disk), you’ll get better performance by reading it sequentially, line by line.

One solution would be to read the whole file at once using File.ReadAllLines (if you have enough memory; for 1 million rows it should be OK), store all the lines in a string array, and then do the processing (i.e. parsing with string.Split, etc.) in your Parallel.ForEach, provided the row order is not important.
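A minimal sketch of that approach might look like the code below. The class name `CsvParallelSketch` and its methods are made up for illustration. It also swaps in `Parallel.For` over array indices instead of `Parallel.ForEach`: each iteration then writes only to its own slot, so no lock is needed and the row order happens to be preserved as well. Parsing uses the same separators and the same culture-sensitive `double.TryParse` call as the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch: read sequentially, then parse in parallel.
public static class CsvParallelSketch
{
    private static readonly char[] Separators = { ',', ' ' };

    // Parses one line into doubles; tokens that are not numbers are skipped.
    public static double[] ParseLine(string line)
    {
        var values = new List<double>();
        foreach (var token in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (double.TryParse(token, out var value))
            {
                values.Add(value);
            }
        }
        return values.ToArray();
    }

    // Parses all lines in parallel. rows[i] is written by exactly one
    // iteration, so no synchronisation is required.
    public static double[][] ParseAll(string[] lines)
    {
        var rows = new double[lines.Length][];
        Parallel.For(0, lines.Length, i => rows[i] = ParseLine(lines[i]));
        return rows;
    }

    // Reads the file in one sequential pass, then returns the total
    // number of parsed values across all rows.
    public static int CountElements(string path)
    {
        var lines = File.ReadAllLines(path);
        var rows = ParseAll(lines);
        return rows.Sum(row => row.Length);
    }
}
```

In the question's form, `ProcessFile` could call something like `CsvParallelSketch.CountElements("BigData.csv")` and show the result. Whether this beats a plain sequential loop depends on how expensive the per-line parsing is compared with the disk read, so it is worth measuring both.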
