The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: May 30, 2026 (2026-05-30T08:04:22+00:00)

Overview of My Situation:

My task is to read strings from a file and reformat them into a more useful format. After reformatting the input, I have to write it to an output file.

Here is an example of what has to be done.
Example of a file line:

ANO=2010;CPF=17834368168;YEARS=2010;2009;2008;2007;2006 <?xml version='1.0' encoding='ISO-8859-1'?><QUERY><RESTITUICAO><CPF>17834368168</CPF><ANO>2010</ANO><SITUACAODECLARACAO>Sua declaração não consta na base de dados da Receita Federal</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><RESTITUICAO><CPF>17834368168</CPF><ANO>2009</ANO><SITUACAODECLARACAO>Sua declaração não consta na base de dados da Receita Federal</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><RESTITUICAO><CPF>17834368168</CPF><ANO>2008</ANO><SITUACAODECLARACAO>Sua declaração não consta na base de dados da Receita Federal</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><RESTITUICAO><CPF>17834368168</CPF><ANO>2007</ANO><SITUACAODECLARACAO>Sua declaração consta como Pedido de Regularização(PR), na base de dados da Secretaria da Receita Federal do Brasil</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><RESTITUICAO><CPF>17834368168</CPF><ANO>2006</ANO><SITUACAODECLARACAO>Sua declaração não consta na base de dados da Receita Federal</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><STATUS><RESULT>TRUE</RESULT><MESSAGE></MESSAGE></STATUS></QUERY>

Each line of this input file carries two important pieces of information: the CPF, which is the document number I will use, and the XML (which represents the result of a database query for that document).

What I Must Achieve:

In this old format, each document has an XML containing the query results for all the years (2006 to 2010). After reformatting, each input line is converted to 5 output lines:

CPF=17834368168;YEARS=2010; <?xml version='1.0' encoding='ISO-8859-1'?><QUERY><RESTITUICAO><CPF>17834368168</CPF><ANO>2010</ANO><SITUACAODECLARACAO>Sua declaração não consta na base de dados da Receita Federal</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><STATUS><RESULT>TRUE</RESULT><MESSAGE></MESSAGE></STATUS></QUERY>
CPF=17834368168;YEARS=2009; <?xml version='1.0' encoding='ISO-8859-1'?><QUERY><RESTITUICAO><CPF>17834368168</CPF><ANO>2009</ANO><SITUACAODECLARACAO>Sua declaração não consta na base de dados da Receita Federal</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><STATUS><RESULT>TRUE</RESULT><MESSAGE></MESSAGE></STATUS></QUERY>
CPF=17834368168;YEARS=2008; <?xml version='1.0' encoding='ISO-8859-1'?><QUERY><RESTITUICAO><CPF>17834368168</CPF><ANO>2008</ANO><SITUACAODECLARACAO>Sua declaração não consta na base de dados da Receita Federal</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><STATUS><RESULT>TRUE</RESULT><MESSAGE></MESSAGE></STATUS></QUERY>
CPF=17834368168;YEARS=2007; <?xml version='1.0' encoding='ISO-8859-1'?><QUERY><RESTITUICAO><CPF>17834368168</CPF><ANO>2007</ANO><SITUACAODECLARACAO>Sua declaração consta como Pedido de Regularização(PR), na base de dados da Secretaria da Receita Federal do Brasil</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><STATUS><RESULT>TRUE</RESULT><MESSAGE></MESSAGE></STATUS></QUERY>
CPF=17834368168;YEARS=2006; <?xml version='1.0' encoding='ISO-8859-1'?><QUERY><RESTITUICAO><CPF>17834368168</CPF><ANO>2006</ANO><SITUACAODECLARACAO>Sua declaração não consta na base de dados da Receita Federal</SITUACAODECLARACAO><DATACONSULTA>05/01/2012</DATACONSULTA></RESTITUICAO><STATUS><RESULT>TRUE</RESULT><MESSAGE></MESSAGE></STATUS></QUERY>

Each output line contains one year's information for that document. So basically, the output files have 5 times as many lines as the input files.

Performance Issue:

Each file has 400,000 lines, and I have 133 files to process.

At the moment, here is the flow of my app :

  1. Open a file
  2. Read a line
  3. Parse it to the new format
  4. Write the line to the output file
  5. Go to 2 until there are no lines left
  6. Go to 1 until there are no files left

Each input file is about 700MB, and reading the files and writing the converted versions to new files is taking forever. A 400KB file takes ~30 seconds to process.

Extra Information:

My machine runs on an Intel i5 processor, with 8GB RAM.

To avoid memory leaks I am not instantiating lots of objects, and I'm opening the input file inside a using statement.

What can I do to make it run faster?


1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 8:04 am

    I don’t know what your code looks like, but here’s an example which on my box (admittedly with an SSD and an i7, but…) processes a 400K file in about 50ms.

    I haven’t even thought about optimizing it – I’ve written it in the cleanest way I could. (Note that it’s all lazily evaluated; File.ReadLines and File.WriteAllLines take care of opening and closing the files.)

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;
    using System.Linq;
    
    class Test
    {
        public static void Main()
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            var lines = from line in File.ReadLines("input.txt")
                        let cpf = ParseCpf(line)
                        let xml = ParseXml(line)
                        from year in ParseYears(line)
                        select cpf + year + xml;
    
            File.WriteAllLines("output.txt", lines);
            stopwatch.Stop();
            Console.WriteLine("Completed in {0}ms", stopwatch.ElapsedMilliseconds);
        }
    
        // Returns the CPF, in the form "CPF=xxxxxx;"
        static string ParseCpf(string line)
        {
            int start = line.IndexOf("CPF=");
            int end = line.IndexOf(";", start);
            // TODO: Validation
            return line.Substring(start, end + 1 - start);
        }
    
        // Returns a sequence of year values, each in the form "YEARS=2010;"
        static IEnumerable<string> ParseYears(string line)
        {
            // The year list runs from just after "YEARS=" up to the first space.
            int start = line.IndexOf("YEARS=") + 6;
            int end = line.IndexOf(" ", start);
            // TODO: Validation
            string years = line.Substring(start, end - start);
            foreach (string year in years.Split(';'))
            {
                yield return "YEARS=" + year + ";";
            }
        }
    
        // Returns all the XML from the leading space onwards
        static string ParseXml(string line)
        {
            int start = line.IndexOf(" <?xml");
            // TODO: Validation
            return line.Substring(start);
        }
    }
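
One thing to note: the code above copies the whole XML onto every output line, whereas the expected output in the question keeps only the one <RESTITUICAO> block for each year. Below is a minimal sketch of one way to do that with plain string scanning. It is a sketch under assumptions, not a definitive implementation: PerYearSplitter, Split and Between are hypothetical names; it assumes every line contains at least one <RESTITUICAO> block, that each block has an <ANO> element, and that the files really are ISO-8859-1 as the XML prolog declares. It takes the year from <ANO> inside each block rather than from the YEARS= list, and like the code above it does no validation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

static class PerYearSplitter
{
    const string BlockOpen = "<RESTITUICAO>";
    const string BlockClose = "</RESTITUICAO>";

    // Turns one input line into one output line per <RESTITUICAO> block.
    public static IEnumerable<string> Split(string line)
    {
        // "CPF=xxxxxx;" taken from the key/value prefix of the line.
        int cpfStart = line.IndexOf("CPF=", StringComparison.Ordinal);
        int cpfEnd = line.IndexOf(';', cpfStart);
        string cpf = line.Substring(cpfStart, cpfEnd + 1 - cpfStart);

        // The XML starts at the space before the prolog.
        int xmlStart = line.IndexOf(" <?xml", StringComparison.Ordinal);
        int firstBlock = line.IndexOf(BlockOpen, xmlStart, StringComparison.Ordinal);
        int tailStart = line.LastIndexOf(BlockClose, StringComparison.Ordinal)
                        + BlockClose.Length;

        // Everything before the first block and after the last block
        // is shared by all output lines.
        string head = line.Substring(xmlStart, firstBlock - xmlStart);
        string tail = line.Substring(tailStart);

        int pos = firstBlock;
        while (pos >= 0)
        {
            int close = line.IndexOf(BlockClose, pos, StringComparison.Ordinal)
                        + BlockClose.Length;
            string block = line.Substring(pos, close - pos);
            string year = Between(block, "<ANO>", "</ANO>");
            yield return cpf + "YEARS=" + year + ";" + head + block + tail;

            // Returns -1 once there are no more blocks, which ends the loop.
            pos = line.IndexOf(BlockOpen, close, StringComparison.Ordinal);
        }
    }

    // Returns the text between the first 'open' marker and the next 'close' marker.
    static string Between(string text, string open, string close)
    {
        int start = text.IndexOf(open, StringComparison.Ordinal) + open.Length;
        int end = text.IndexOf(close, start, StringComparison.Ordinal);
        return text.Substring(start, end - start);
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: PerYearSplitter <input> <output>");
            return;
        }

        // Assumption: the data is ISO-8859-1, matching the XML prolog.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Still lazily evaluated: lines are read, split and written one at a time.
        var lines = File.ReadLines(args[0], latin1).SelectMany(Split);
        File.WriteAllLines(args[1], lines, latin1);
    }
}
```

The ordinal comparisons are deliberate: the default culture-sensitive string searches are usually slower and add nothing for ASCII markers like these. If the real data can contain lines without any <RESTITUICAO> block, the TODO-style validation from the code above is needed here as well.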
    
© 2021 The Archive Base. All Rights Reserved