The Archive Base


Editorial Team
Asked: June 17, 2026

I am trying to compare 2 csv files in php by importing them into


I am trying to compare two CSV files in PHP by importing them into multidimensional arrays and using the array_diff function to find the differences.

My approach is:

1) Fetch every record of the expected CSV and dump it into arr1

2) Fetch every record of the actual CSV and dump it into arr2

3) Sort arr1 using array_multisort

4) Sort arr2 using array_multisort

5) Compare each record using the array_diff function (e.g. arr1[0][1] vs arr2[0][1])

My objective is to compare the files with a PHP script in the least possible time. The approach above was the fastest I found. (I initially tried loading the CSV contents into MySQL and comparing them with database queries, but for some unknown reason the queries ran so slowly that my Apache server crashed after a timeout.)

I have CSV files of up to 300 MB, though a typical file has 70k records and 20 columns and is about 10 MB.

Here is the code for the steps described above:

     $header='';

    $file_handle = fopen($fileExp, "r");
    $k=0;

    while ($data=fgetcsv($file_handle,0,$_POST['dl1'])) {

        if(count($data)==1 && $data[0]=='')
            continue;
        else
        {
            $urarr1[$k]='';
            for($i=0;$i<count($data);$i++)
            {



                if(in_array($i,$exclude_cols,true))
                    $rarr1[$k][$i]='NTBT';
                else
                    $rarr1[$k][$i]=trim($data[$i]);

            }   


            $k++;
        }




    }

    fclose($file_handle);



    echo '<br>Exp Record count: '.count($rarr1);
    $header.='<br>Exp Record count: '.count($rarr1);

    $hrow=$rarr1[0];   //fetch header row and then unset it
    unset($rarr1[0]);

    array_multisort($rarr1);   //need to sort on all 20 columns asc

    $rarr1=array_values($rarr1); //re-number the array



       //writing the sorted o/p to file...debugging purposes
    $fp = fopen($_POST['op'].'/file1.csv', 'w');

    foreach ($rarr1 as $fields) {
        fputcsv($fp, $fields);
    }

    fclose($fp);


     //Repeat for actual .csv

    $file_handle = fopen($fileAct, "r");
    $k=0;

    while ($data=fgetcsv($file_handle,0,$_POST['dl2'])) {

        if(count($data)==1 && $data[0]=='')
            continue;
        else
        {
            for($i=0;$i<count($data);$i++)
            {


                if(in_array($i,$exclude_cols,true))
                    $rarr2[$k][$i]='NTBT';
                else
                    $rarr2[$k][$i]=trim($data[$i]);
            }   

            $k++;

        }

    }

    fclose($file_handle);

    unset($file_handle);


    echo '<br>Act Record count: '.count($rarr2);
    $header.='<br>Act Record count: '.count($rarr2);

    unset($rarr2[0]);

    array_multisort($rarr2);

    $rarr2=array_values($rarr2);

    $fp = fopen($_POST['op'].'/file2.csv', 'w');

    foreach ($rarr2 as $fields) {
        fputcsv($fp, $fields);
    }

    fclose($fp);


       ///Comparison logic

    $header.= '<br>';

    $header.= '<table>';
    $header.= '<th>RECORD_ID</th>';
    for($i=0;$i<count($hrow);$i++)
    {
        $header.= '<th>'.$hrow[$i].'_EXP</th>';
        $header.= '<th>'.$hrow[$i].'_ACT</th>';
    }

    $r=array();
    for($i=0;$i<count($rarr1);$i++)
    {

        if(array_diff($rarr1[$i],$rarr2[$i]) || array_diff($rarr2[$i],$rarr1[$i]))
        {

            $r[$i]=array_unique(array_merge(array_keys(array_diff($rarr1[$i],$rarr2[$i])),array_keys(array_diff($rarr2[$i],$rarr1[$i]))));


            foreach($r[$i] as $key=>$v)
            {
                if(in_array($v,$calc_cols))
                {
                    if(abs($rarr1[$i][$v]-$rarr2[$i][$v])<0.2)
                    {
                        unset($r[$i][$key]);
                    }   
                }
                elseif(is_numeric($rarr1[$i][$v]) && is_numeric($rarr2[$i][$v]) && !in_array($v,$calc_cols) && ($rarr1[$i][$v]-$rarr2[$i][$v])==0)
                {
                    unset($r[$i][$key]);
                }   
            }



            if(empty($r[$i]))
                unset($r[$i]);

            if(isset($r[$i]))
            {
                $header.= '<tr>';

                $header.= '<td>'.$i.'</td>';

                for($j=0;$j<count($rarr1[$i]);$j++)
                {

                    if(in_array($j,$r[$i]))
                    {
                        $header.= '<td style="color:orange">'.$rarr1[$i][$j].'</td>';
                        $header.= '<td style="color:orange">'.$rarr2[$i][$j].'</td>';
                    }
                    else
                    {
                        $header.= '<td >'.$rarr1[$i][$j].'</td>';
                        $header.= '<td >'.$rarr2[$i][$j].'</td>';
                    }
                }
                $header.= '</tr>';
            }
        }   

    }   
    $header.= '</table>';



//print_r($r);
    echo '<br>';
    // if(!isset($r))
        // $r[0]=0;

    echo 'Differences :'.count($r)  ;

    $header.= '<br>';
    $header.= 'Differences :'.count($r) ;




    $time_end = microtime(true);
    $execution_time = ($time_end - $time_start)/60; //dividing with 60 will give the execution time in minutes other wise seconds
    echo '<br><b>Total Execution Time:</b> '.$execution_time.' Mins'; //execution time of the script

Initially this worked on most files, but later I found that for some files, for an unknown reason, array_multisort sorts arr1 and arr2 differently even though the contents seem to be the same. I am not sure whether this happens because of a data type mismatch; I tried type casting too, and it still sorts identical arrays in a different order.

Can someone please suggest what might be wrong in the above code? In addition, considering the requirements mentioned above, is there a more convenient way to achieve this in PHP? Maybe a PHP plugin that compares .csv files, or something similar?
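One possible explanation, offered as an assumption because the failing files are not shown: PHP's default comparison, which array_multisort uses unless told otherwise, compares two numeric strings as numbers and any other pair of strings character by character. When a column mixes both kinds of values, the resulting ordering can contain cycles, so the sorted result may depend on the order of the input. The sketch below shows such a cycle and a hypothetical helper, compareRows (not part of the code above), that compares rows as plain strings so that both files end up in the same order.

```php
<?php
// Each comparison below is true, so the default ordering contains a cycle:
// "2" < "10" (compared as numbers), "10" < "1a" and "1a" < "2" (compared as strings).
$cycle = ('2' < '10') && ('10' < '1a') && ('1a' < '2');
var_dump($cycle);

// Hypothetical helper: compare two rows column by column as plain strings.
function compareRows(array $a, array $b): int
{
    foreach ($a as $i => $value) {
        $cmp = strcmp((string) $value, (string) ($b[$i] ?? ''));
        if ($cmp !== 0) {
            return $cmp;
        }
    }
    // All shared columns are equal: the shorter row sorts first.
    return count($a) <=> count($b);
}

// The same three rows, supplied in a different order.
$rows1 = [['2', 'INR'], ['10', 'INR'], ['1a', 'INR']];
$rows2 = [['1a', 'INR'], ['2', 'INR'], ['10', 'INR']];

usort($rows1, 'compareRows');
usort($rows2, 'compareRows');

// Both arrays now hold the rows in the same order.
var_dump($rows1 === $rows2);
```

In the code above this would mean calling usort($rarr1, 'compareRows') and usort($rarr2, 'compareRows') instead of array_multisort. String ordering is not numeric ordering, but for a comparison only consistency between the two files matters.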

EDIT: Sample data as requested. This is just a snapshot; the actual files have many more columns and rows. As stated above, the .csv file sizes go well beyond 10 MB! File 1 and File 2:

        236|INPQR|31-AUG-12|200     |INR|       664|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |38972944.8                              |0                                       |0                                       |38972944.8
        236|INPQR|31-AUG-12|200     |INR|       6653|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        236|INPQR|31-AUG-12|200     |USD|       6655|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        236|INPQR|31-AUG-12|200     |USD|       664|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |63919609.97                             |0                                       |0                                       |63919609.97
        225|INPZQ|31-AUG-12|200     |USD|       6653|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        225|INPZQ|31-AUG-12|200     |USD|       6655|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        225|INPZQ|31-AUG-12|200     |USD|       6652|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |38972944.8                              |0                                       |0                                       |38972944.8
        225|INPZQ|31-AUG-12|200     |INR|       6652|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |63919609.97                             |0                                       |0                                       |63919609.97
        225|INPZQ|31-AUG-12|200     |INR|       6654|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        225|INPZQ|31-AUG-12|200     |INR|       6654|AAAAAA,PPPPP  



        236|INPQR|31-AUG-12|200     |USD|       664|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |63919609.97                             |0                                       |0                                       |63919609.97
        225|INPZQ|31-AUG-12|200     |USD|       6653|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        225|INPZQ|31-AUG-12|200     |USD|       6655|AAAAAA,PPPPP 
        236|INPQR|31-AUG-12|200     |INR|       664|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |38972944.8                              |0                                       |0                                       |38972944.8
        236|INPQT|31-AUG-12|200     |INR|       6653|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        236|INPQR|31-AUG-12|200     |USD|       6655|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        225|INPZQ|31-AUG-12|200     |USD|       6652|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |38972944.8                              |0                                       |0                                       |38972944.8
        225|INPZQ|31-AUG-12|200     |INR|       6652|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |63919609.97                             |0                                       |0                                       |63919609.97
        225|INPZQ|31-AUG-12|200     |USD|       6654|AAAAAA,PPPPP                                                                                                                                                                                           |0                                       |0                                       |0                                       |0                                       |0
        225|INPZQ|31-AUG-12|200     |INR|       6654|AAAAAA,PPPPP 

UPDATE: The two CSV files could also contain different date formats, and each of them might represent numbers in a different format. For example, 1.csv could have 12-jan-2013 and 0.01 in its first row, while 2.csv would have 01/12/2013 and .01.
Hence I don't think a hash would work.

1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 6:17 pm

    There are many different ways to compare two CSV files. My approach checks for rows that differ between the two files, and it takes into account that you want to exclude certain columns from the rows.

    I did not use sorting, because I check whether a row exists in the other file, not whether it is at the same position. The reason is simple: if one row doesn't match and sorts near the beginning of the file, all rows after it will be reported as different.

    Example:

    file1:  file2:
    1|a     1|a
    2|b     2|b
    3|c     3|c
    4|d     4|d
    5|e     1|e
    
    After sorting
    
    file1:  file2:
    1|a     1|a
    2|b     1|e
    3|c     2|b
    4|d     3|c
    5|e     4|d
    
    Now rows 2, 3, 4, and 5 are all marked as different, because they do not match when you compare line by line. But in fact only one row is different.
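The difference between the two strategies can be reproduced with the five rows from the example. This is only a minimal sketch: the variable names are made up for illustration, and array_count_values stands in for the counting that the full code further down does by hand.

```php
<?php
// Rows from the example above, already sorted.
$file1 = ['1|a', '2|b', '3|c', '4|d', '5|e'];
$file2 = ['1|a', '1|e', '2|b', '3|c', '4|d'];

// Positional comparison: count the positions where the rows differ.
$positional = 0;
foreach ($file1 as $i => $row) {
    if ($row !== $file2[$i]) {
        $positional++;
    }
}

// Count-based comparison: how often does each row occur in each file?
$count1 = array_count_values($file1);
$count2 = array_count_values($file2);
$onlyIn1 = array_diff_key($count1, $count2); // rows missing from file2
$onlyIn2 = array_diff_key($count2, $count1); // rows missing from file1

echo "positional mismatches: $positional\n";
echo "rows only in file1: " . implode(', ', array_keys($onlyIn1)) . "\n";
echo "rows only in file2: " . implode(', ', array_keys($onlyIn2)) . "\n";
```

The positional check reports four mismatches, while the count-based check isolates the single row that differs on each side.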
    

    In the code below, the comments explain why I did something. I also tested the code on several large CSV files (~45 MB and 100,000 rows) and got the number of different rows in less than 10 seconds per check.

    <?php
    set_time_limit(0);
    
    //create a function to create the CSV arrays.
    //If you create the code twice like you did, you are bound to make a mistake or change something in one place and not the other.
    //Obviously that could lead to sorting two equal files differently.
    function CsvToArray($file) 
    {
      $exclude_cols = array(2); //you didn't provide it, so for testing I blank the date column because it is always the same
    
      //load the file contents into a variable and trim it
      $data = trim(implode('', file($file)));
    
      //strip \r so that only \n is used as the line ending
      $data = str_replace("\r", "", $data);
      //strip spaces and tabs around each | (newlines are left alone so rows cannot merge)
      $data = preg_replace('/[ \t]*\|[ \t]*/', '|', $data);
      //strip leading and trailing spaces and tabs from each line
      $data = preg_replace('/^[ \t]+|[ \t]+$/m', '', $data);
    
      //each line to a separate row
      $data = explode("\n", $data);
    
      //each column to a separate field
      //This is only needed for the comparison if you want to remove certain columns;
      //if that's not needed, you can skip this part
      foreach($data as $k=>$v)
        $data[$k] = explode('|', $v);
    
      //get the header. It's always the first row
      //array_shift will return the first element and remove it from the dataset
      $header = array_shift($data);
    
      //turn the array around: the row becomes the key and the value counts how many times it occurs
      $ar = array();
      foreach ($data as $row) {
        //blank out unwanted columns
        //if you don't need to exclude columns, skip the explode step above and use the raw line as the key
        foreach($exclude_cols as $c)
          $row[$c] = '';
        //join the remaining fields, keeping the delimiter so "1|23" and "12|3" stay distinct
        $key = implode('|', $row);
    
        //you can use strtolower($key) for case-insensitive matching
        //start the counter at 0 the first time a key is seen
        $ar[$key] = ($ar[$key] ?? 0) + 1;
      }
    
      return $ar;
    }
    
    function CompareTwoCsv($file1, $file2)
    {
      $start = microtime(true);
    
      $ar1 = CsvToArray($file1);
      $ar2 = CsvToArray($file2);
    
      //check for differences.
      $diff = 0;
      foreach($ar1 as $k=>$v) {
        //the second array doesn't contain the key (the row), so there is a difference
        if (!array_key_exists($k, $ar2)) {
          $diff+=$v; //every occurrence of this row in the first file is different
          continue;
        }
        $c2 = $ar2[$k];
    
        if ($v == $c2) //the row occurs in both files an equal number of times
          continue;
    
        $diff += abs($v - $c2); //add the number of unmatched occurrences
      }
    
      //rows that occur only in the second file are different as well
      foreach($ar2 as $k=>$v) {
        if (!array_key_exists($k, $ar1))
          $diff += $v;
      }
    
      $end = microtime(true);
      $difftime = $end - $start;
    
      //debug output
      echo "We found ".$diff." differences in the files. it took ".$difftime." seconds<hr>";
    }
    
    //test and test2 are two files with ~100,000 rows based on the data you supplied.
    //They have many equal rows, so the array returned from CsvToArray is small
    CompareTwoCsv("test.txt", "test.txt");
    //We found 0 differences in the files. it took 5.6848769187927 seconds
    
    CompareTwoCsv("test.txt", "test2.txt");
    //We found 17855 differences in the files. it took 6.6002569198608 seconds
    
    CompareTwoCsv("test2.txt", "test.txt");
    //We found 17855 differences in the files. it took 7.5223989486694 seconds
    
    
    //randomly generated files with 100,000 rows and very little duplicate data
    
    CompareTwoCsv("largescv1.txt", "largescv2.txt");
    //We found 98250 differences in the files. it took 5.4302139282227 seconds
    
    ?>
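If the two files write dates and numbers differently, as the update to the question describes, each field could be brought into a canonical form before the row key is built. The following is only a sketch under stated assumptions: normalizeField is a hypothetical helper that is not part of the code above, the two date formats are the ones quoted in the question, and 01/12/2013 is assumed to mean month/day/year.

```php
<?php
// Hypothetical helper: bring a single field into a canonical form so that
// equal values produce equal row keys.
function normalizeField(string $value): string
{
    $value = trim($value);

    // Numbers: "0.01", ".01" and "0.010" all become "0.01".
    if (is_numeric($value)) {
        return (string) (float) $value;
    }

    // Dates: try the two formats mentioned in the question.
    // Assumption: "01/12/2013" is month/day/year.
    foreach (['d-M-Y', 'm/d/Y'] as $format) {
        // The leading "!" resets all fields that the format does not set.
        $date = DateTime::createFromFormat('!' . $format, $value);
        if ($date !== false) {
            return $date->format('Y-m-d');
        }
    }

    // Anything else: compare case-insensitively.
    return strtolower($value);
}

echo normalizeField('12-jan-2013'), "\n";
echo normalizeField('01/12/2013'), "\n";
echo normalizeField('.01'), "\n";
```

One way to use it would be to call array_map('normalizeField', $row) in CsvToArray just before the row is joined into a key. Very large or high-precision numbers lose digits when converted to float, so such columns may need a string-based rule instead.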
    


© 2021 The Archive Base. All Rights Reserved