The Archive Base


Editorial Team
Asked: June 18, 2026

i have a task where i need to parse an extremely big file and


I have a task where I need to parse an extremely big file and write the results into a MySQL database. “Extremely big” means about 1.4 GB of sort-of-CSV data, totalling approximately 10 million lines of text.

The question is not “HOW” to do it, but how to do it FAST. My first approach was to just do it in PHP without any speed optimization and let it run for a few days until it's done. Unfortunately, it has now been running for 48 hours straight and has processed only 2% of the file, so that's not an option.

The file format is as follows:

A:1,2

where the number of comma-separated numbers following the “:” can be anywhere from 0 to 1000. The example line has to go into a table as follows:

| A | 1 |
| A | 2 |

So right now, I do it like this:

$fh = fopen("file.txt", "r");

$line = ""; // buffer for the data
$i = 0; // line counter
$start = time(); // benchmark

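// compare against false explicitly so a final line containing just "0" is not treated as end of file
while(($line = fgets($fh)) !== false)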
{
    $i++;       
    echo "line " . $i . ": ";

    //echo $i . ": " . $line . "<br>\n";

    $line = explode(":", $line);

    if(count($line) != 2 || !is_numeric(trim($line[0])))
    {
        echo "error: source id [" .  trim($line[0]) . "]<br>\n";
        continue;
    }

    $targets = explode(",", $line[1]);

    echo "node " .  $line[0] . " has " . count($targets) . " links<br>\n";

    // insert links in link table
    foreach($targets as $target)
    {
        if(!is_numeric(trim($target)))
        {
            echo "line " . $i . " has malformed target [" . trim($target) . "]<br>\n";
            continue;
        }

        // one INSERT statement (and one round trip to the server) per target
        $sql = "INSERT INTO link (source_id, target_id) VALUES ('" .  trim($line[0]) . "', '" .  trim($target) . "')";
        mysql_query($sql) or die("insert failed for SQL: ". mysql_error());
    }
}

echo "<br>\n--<br>\n<br>\nseconds wasted: " . (time() - $start);

This is obviously not optimized for speed in ANY way. Any hints for a fresh start? Should I switch to another language?


1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 9:57 am

    The first optimization would be to insert inside a transaction: commit after every 100 or 1000 lines and then begin a new transaction. Obviously you'd have to use a storage engine that supports transactions.
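A minimal sketch of the batched-transaction idea, assuming a PDO connection and the `link (source_id, target_id)` table from the question. The function names `parseLinkLine` and `importWithTransactions` are made up for this example, and the batch size of 1000 lines is just the figure mentioned above, not a tuned value.

```php
<?php
declare(strict_types=1);

/**
 * Parse one "source:t1,t2,..." line into [source, [targets]].
 * Returns null when the line is malformed; non-numeric targets are dropped.
 * (Hypothetical helper, mirroring the checks in the question's script.)
 */
function parseLinkLine(string $line): ?array
{
    $parts = explode(':', trim($line));
    if (count($parts) !== 2) {
        return null;
    }
    $source = trim($parts[0]);
    if (!is_numeric($source)) {
        return null;
    }
    $targets = array_filter(array_map('trim', explode(',', $parts[1])), 'is_numeric');
    return [$source, array_values($targets)];
}

/**
 * Insert all links from $lines, committing once every $batchSize input lines
 * instead of once per row. Returns the number of rows inserted.
 */
function importWithTransactions(PDO $pdo, iterable $lines, int $batchSize = 1000): int
{
    // Prepare once, execute many times.
    $stmt = $pdo->prepare('INSERT INTO link (source_id, target_id) VALUES (?, ?)');
    $rows = 0;
    $pending = 0;
    $pdo->beginTransaction();
    foreach ($lines as $line) {
        $parsed = parseLinkLine($line);
        if ($parsed === null) {
            continue; // skip malformed lines
        }
        [$source, $targets] = $parsed;
        foreach ($targets as $target) {
            $stmt->execute([$source, $target]);
            $rows++;
        }
        if (++$pending >= $batchSize) {
            $pdo->commit();           // flush this batch
            $pdo->beginTransaction(); // and start the next one
            $pending = 0;
        }
    }
    $pdo->commit(); // commit whatever is left in the last batch
    return $rows;
}
```

It could be called with something like `importWithTransactions($pdo, new SplFileObject('file.txt'))`, since `SplFileObject` iterates over a file line by line without loading it all into memory. The connection details for `$pdo` are left out on purpose.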

    Then observe the CPU usage with the top command. If you have multiple cores, and the mysql process is mostly idle while the PHP process does most of the work, rewrite the script to accept a parameter that skips n lines from the beginning and imports only 10000 lines or so. Then start multiple instances of the script, each with a different starting point.
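The skip-and-limit part of that could look roughly like this. `readChunk` and the script name `import_chunk.php` are hypothetical, and the parsing and inserting of each line is left out so the sketch stays focused on selecting a slice of the file.

```php
<?php
declare(strict_types=1);

/**
 * Yield at most $limit lines of $path, after skipping the first $skip lines.
 * (Hypothetical helper; the name is not from the original answer.)
 */
function readChunk(string $path, int $skip, int $limit): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    try {
        // Skipped lines are still read from disk, but not parsed or inserted.
        for ($i = 0; $i < $skip; $i++) {
            if (fgets($fh) === false) {
                return; // file has fewer than $skip lines
            }
        }
        for ($i = 0; $i < $limit; $i++) {
            $line = fgets($fh);
            if ($line === false) {
                return; // reached end of file
            }
            yield $line;
        }
    } finally {
        fclose($fh);
    }
}

// Possible wiring for a command-line script, e.g.
//   php import_chunk.php file.txt 20000 10000
// [$script, $path, $skip, $limit] = $argv;
// foreach (readChunk($path, (int) $skip, (int) $limit) as $line) {
//     /* parse $line and insert its links */
// }
```

Each instance would then be started with a different skip value (0, 10000, 20000, ...). Note that an instance with a large skip value still has to read through all the lines before its slice, so this only pays off if parsing and inserting dominate the run time, which is what the check with top is meant to show.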

    A third solution would be to convert the file into a CSV with PHP (no INSERT at all, just writing to a file) and then use LOAD DATA INFILE, as m4t1t0 suggested.
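As a rough sketch of that conversion step: the function below flattens each `source:t1,t2,...` line into one `source,target` row per target. The name `convertToCsv` and the file name `links.csv` are placeholders; the table and column names are the ones from the question. The LOAD DATA statement at the end is only held in a string here. Whether the LOCAL variant is allowed depends on the server and client configuration, so treat that part as an assumption to check against your setup.

```php
<?php
declare(strict_types=1);

/**
 * Flatten "source:t1,t2,..." lines from $inPath into "source,target" rows
 * in $outPath. Malformed lines and non-numeric targets are skipped.
 * Returns the number of rows written. (Hypothetical helper.)
 */
function convertToCsv(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('cannot open input or output file');
    }
    $rows = 0;
    while (($line = fgets($in)) !== false) {
        $parts = explode(':', trim($line));
        if (count($parts) !== 2) {
            continue; // not exactly one ":" separator
        }
        $source = trim($parts[0]);
        if (!is_numeric($source)) {
            continue; // malformed source id
        }
        foreach (explode(',', $parts[1]) as $target) {
            $target = trim($target);
            if (is_numeric($target)) {
                fwrite($out, $source . ',' . $target . "\n");
                $rows++;
            }
        }
    }
    fclose($in);
    fclose($out);
    return $rows;
}

// Statement to run once the CSV exists (the path is a placeholder).
// The nowdoc keeps '\n' literal so that MySQL, not PHP, interprets it.
$loadSql = <<<'SQL'
LOAD DATA LOCAL INFILE 'links.csv'
INTO TABLE link
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(source_id, target_id)
SQL;
```

The point of this design is that PHP only does sequential file I/O, and the database gets the rows in a single bulk load instead of one INSERT per row.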
