Editorial Team
Asked: May 25, 2026 (2026-05-25T23:59:36+00:00)

I have n CSV files that I need to compare against each other and then modify.
The problem is that each CSV file has around 800,000 lines.

To read the CSV files I use fgetcsv, and that works well. I get some memory spikes, but in the end it is fast enough. But when I try to compare the arrays against each other, it takes ages.

Another problem is that I have to use a foreach to read the CSV data with fgetcsv, because the number of files varies. I end up with one huge array and can’t compare it with array_diff. So I need to compare it with nested foreach loops, and that takes ages.

A code snippet for better understanding:

$data = [];
foreach ( $files as $value ) {
    // Each call appends one whole file's rows to $data.
    $data[] = $csv->read( $value['path'] );
}

My CSV class uses fgetcsv to add each row to the array:

fgetcsv( $this->_fh, $this->_lengthToRead, $this->_delimiter, $this->_enclosure )

All the data from all the CSV files is stored in the $data array. Using only one array is probably the first big mistake, but I have no clue how to stay flexible with the files without using a foreach. I tried dynamic variable names, but I got stuck there as well 🙂

Now I have this big array. Normally, to compare the values against each other and find out whether the data from file one exists in file two and so on, I would use array_diff or array_intersect. But in this case I have only this one big array, and as I said, running a foreach over it takes ages.

Also, after only 3 files I have an array with 3 * 800,000 entries. I guess that after 10 files at the latest my memory will explode.

So is there a better way to use PHP to compare n very large CSV files?

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 11:59 pm

    Use SQL

    • Create a table with the same columns as your CSV files.
    • Insert the data from the first CSV file.
    • Add indexes to speed up queries.
    • Compare with other CSV files by reading a line and issuing a SELECT.
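
The four steps above can be sketched roughly as follows. This is only a sketch under several assumptions that are not in the original answer: PDO with an in-memory or file-based SQLite database (chosen so the example is self-contained), a hypothetical table `csv_lines`, and CSV rows with exactly two columns, here called `a` and `b`. The function names are made up for illustration.

```php
<?php
// Assumption: every CSV row has two columns, stored here as a and b.
function createLineTable(PDO $db): void
{
    $db->exec('CREATE TABLE csv_lines (a TEXT NOT NULL, b TEXT NOT NULL)');
    // The index lets the per-line SELECT below avoid a full table scan.
    $db->exec('CREATE INDEX idx_csv_lines_ab ON csv_lines (a, b)');
}

// Stream the first file into the table; only one row is held in memory at a time.
function importCsv(PDO $db, string $path): int
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $insert = $db->prepare('INSERT INTO csv_lines (a, b) VALUES (?, ?)');
    $count = 0;
    $db->beginTransaction(); // one transaction for the whole file
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if (count($row) < 2) {
            continue; // skip blank or short lines
        }
        $insert->execute([$row[0], $row[1]]);
        $count++;
    }
    $db->commit();
    fclose($fh);
    return $count;
}

// Read another file line by line and collect the rows not found in the table.
function rowsMissingFromTable(PDO $db, string $path): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $select = $db->prepare('SELECT 1 FROM csv_lines WHERE a = ? AND b = ? LIMIT 1');
    $missing = [];
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if (count($row) < 2) {
            continue;
        }
        $select->execute([$row[0], $row[1]]);
        if ($select->fetchColumn() === false) {
            $missing[] = [$row[0], $row[1]];
        }
        $select->closeCursor();
    }
    fclose($fh);
    return $missing;
}
```

A call sequence would be `createLineTable($db)`, then `importCsv($db, $firstPath)`, then `rowsMissingFromTable($db, $otherPath)` for each further file. Neither file is ever loaded into a PHP array as a whole.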

    You did not describe how you compare the n files, and there are several ways to do so. If you just want to find the lines that are in A1 but not in A2,…,An, then you only have to add a boolean column diff to your table. If you want to know in which files a line is repeated, you’ll need a text column, or a new table if a line can be in several files.
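
For the "new table" variant, one possible layout is a table with one row per (line, file) pair. The sketch below is illustrative only: the table name `line_files`, the columns `a`, `b` and `file_id`, the sample rows, and the use of SQLite through PDO are all assumptions made to keep the example self-contained.

```php
<?php
// Hypothetical schema: one row per (line, file) pair. Columns a and b stand
// in for the real CSV columns; file_id numbers the n input files.
function createMembershipTable(PDO $db): void
{
    $db->exec('CREATE TABLE line_files (a TEXT, b TEXT, file_id INTEGER)');
    $db->exec('CREATE INDEX idx_line_files_ab ON line_files (a, b)');
}

// Lines that occur in at least two different files, with the number of files.
function linesInSeveralFiles(PDO $db): array
{
    $sql = 'SELECT a, b, COUNT(DISTINCT file_id) AS file_count
            FROM line_files
            GROUP BY a, b
            HAVING COUNT(DISTINCT file_id) > 1
            ORDER BY a, b';
    return $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createMembershipTable($db);

// Made-up sample rows standing in for lines read from three CSV files.
$sample = [
    ['apple', '1', 1],
    ['pear',  '2', 1],
    ['apple', '1', 2],
    ['plum',  '3', 3],
    ['apple', '1', 3],
];
$insert = $db->prepare('INSERT INTO line_files (a, b, file_id) VALUES (?, ?, ?)');
foreach ($sample as $row) {
    $insert->execute($row);
}

// Only the "apple" line appears in more than one file.
$shared = linesInSeveralFiles($db);
```

The single boolean `diff` column is enough when the only question is "in A1 but nowhere else"; the separate table costs more rows but can answer which files share a line.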

    Edit: a few words on performance if you’re using MySQL (I do not know much about other RDBMSs).

    Inserting lines one by one would be too slow. You probably can’t use LOAD DATA unless you can put the CSV files directly onto the DB server’s filesystem. So I guess the best solution is to read a few hundred lines from the CSV and then send a multi-row insert query: INSERT INTO mytable VALUES (..1..), (..2..).
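
A batched multi-row insert along these lines could look like the sketch below. The helper names `buildMultiInsert`, `flushBatch` and `insertCsvInBatches`, the two columns `a` and `b`, and the batch size of 200 are assumptions for illustration; `mytable` is the table name used above. Placeholders are used instead of inlined values so that PDO handles quoting.

```php
<?php
// Build one INSERT statement with a (?, ?, ...) group per row,
// plus the flat parameter list that goes with it.
function buildMultiInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert');
    }
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $group))
    );
    $params = [];
    foreach ($rows as $row) {
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    return [$sql, $params];
}

// Send one batch to the database and report how many rows it held.
function flushBatch(PDO $db, array $batch): int
{
    [$sql, $params] = buildMultiInsert('mytable', ['a', 'b'], $batch);
    $db->prepare($sql)->execute($params);
    return count($batch);
}

// Read the CSV in chunks of $batchSize lines and insert each chunk at once.
function insertCsvInBatches(PDO $db, string $path, int $batchSize = 200): int
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $batch = [];
    $total = 0;
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if (count($row) < 2) {
            continue; // skip blank or short lines
        }
        $batch[] = [$row[0], $row[1]];
        if (count($batch) === $batchSize) {
            $total += flushBatch($db, $batch);
            $batch = [];
        }
    }
    if ($batch !== []) {
        $total += flushBatch($db, $batch); // remaining partial batch
    }
    fclose($fh);
    return $total;
}
```

The batch size is a trade-off: larger batches mean fewer round trips, but the database may cap the statement size or the number of bound parameters, so the value needs checking against the server actually in use.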

    Issuing a SELECT for each line you read from the other files would be too slow as well, so you’d better load them into another table. Then issue a multiple-table update to mark the rows that are identical in tables t1 and t2: UPDATE t1 JOIN t2 ON (t1.a = t2.a AND t1.b = t2.b) SET t1.diff=1
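
The marking step can be tried out on a small scale as sketched below. Note the swap: `UPDATE ... JOIN` is MySQL syntax, so this sketch uses a correlated `EXISTS` subquery instead, which marks the same rows and also runs on SQLite, used here only so the example is self-contained. The sample rows are made up.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// t1 holds the first file, t2 another file; diff starts at 0 for every row.
$db->exec('CREATE TABLE t1 (a TEXT, b TEXT, diff INTEGER NOT NULL DEFAULT 0)');
$db->exec('CREATE TABLE t2 (a TEXT, b TEXT)');
$db->exec('CREATE INDEX idx_t2_ab ON t2 (a, b)');

// Made-up sample data standing in for imported CSV lines.
$db->exec("INSERT INTO t1 (a, b) VALUES ('x', '1'), ('y', '2'), ('z', '3')");
$db->exec("INSERT INTO t2 (a, b) VALUES ('y', '2'), ('z', '9')");

// Mark every t1 row that has an identical row in t2, in a single statement.
$marked = $db->exec(
    'UPDATE t1 SET diff = 1
     WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.a = t1.a AND t2.b = t1.b)'
);

// Rows of t1 that were not found in t2.
$unmatched = $db->query('SELECT a, b FROM t1 WHERE diff = 0 ORDER BY a')
                ->fetchAll(PDO::FETCH_NUM);
```

Here only the row ('y', '2') exists in both tables, so it is the only one marked; ('z', '3') stays unmarked because its second column differs in t2.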

    Maybe you could try SQLite. There are no concurrency problems here, and it could be faster than MySQL’s client/server model. And you don’t need to set up much to use SQLite.
