I have a similar problem to the one that is answered in this post

Editorial Team

Asked: June 11, 2026

I have a similar problem to the one that is answered in this post.

When I tested the regex from that post’s answer, it worked as expected:

  $str = 'Days - £9.20 to £11.20 Sat - £11.80 Sun - £13.30';            

  preg_match_all("/£\s*\d+(?:\.\d+)?/", $str, $matches);                    

  print_r($matches);

  // Produces
  Array
  (
      [0] => Array
          (
              [0] => £9.20
              [1] => £11.20
              [2] => £11.80
              [3] => £13.30
          )

  )

The problem arises when I use the same regex in a foreach loop to process data from a CSV file that I’ve converted to an array:

foreach($arrJobs as $job)
{
    $str = $job['payDetails1'] . ' ' . $job['payDetails2'];                     

    // Try to find salary from string           

    preg_match_all("/£\s*\d+(?:\.\d+)?/", $str, $matches);

    print_r($matches);                  
}

In this case, the output for every item is an empty array:

 Array
 (
   [0] => Array
       (
       )

  )

I obtained the string used in the first example by echoing $str inside the loop in the second example, then copying and pasting the output.

Why does the same string give different results? And why does it work when I paste the string into a variable, but find no matches when the string is read from the CSV?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T20:55:17+00:00

[Answer derived from comments and feedback above]

The problem

The problem here is that your source file and your CSV input are not saved with the same character encoding.

PHP’s core string functions (and the PCRE functions, when the /u modifier is not used) operate on raw sequences of bytes and have no notion of characters. As a result, if a script contains characters outside the ASCII range, its runtime behavior depends on the encoding the script is saved in, because different encodings represent those characters with different bytes. Your script contains one such character: the pound sign.
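To make this concrete, here is a minimal sketch, assuming (as the next section does) that the script is saved as UTF-8 and the CSV is ISO 8859-1. It uses byte escapes instead of a literal £ so it behaves the same whatever encoding the example file itself is saved in; the variable names are illustrative only.

```php
<?php
// Byte escapes stand in for a literal £ so this example behaves the same
// whatever encoding the file itself is saved in.
$poundUtf8   = "\xc2\xa3"; // £ as it appears in a script saved as UTF-8
$poundLatin1 = "\xa3";     // £ as it appears in an ISO 8859-1 CSV (assumed)

// strlen() counts bytes, so the "same" character has two different lengths.
echo strlen($poundUtf8), ' ', strlen($poundLatin1), PHP_EOL; // 2 1

// What PCRE actually sees when the question's pattern is saved as UTF-8:
// the literal £ becomes the two bytes C2 A3.
$pattern = '/\xc2\xa3\s*\d+(?:\.\d+)?/';

$fromScript = preg_match_all($pattern, "Sat - {$poundUtf8}11.80", $m1);
$fromCsv    = preg_match_all($pattern, "Sat - {$poundLatin1}11.80", $m2);

echo $fromScript, ' ', $fromCsv, PHP_EOL; // 1 0
```

The pattern matches the UTF-8 subject but not the ISO 8859-1 one, which mirrors the empty arrays in the question.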

A quick solution

Assuming that the two encodings in play here are ISO 8859-1 (Western European) and UTF-8, every other character your regular expression matches (whitespace, digits and the decimal point) is ASCII and has the same byte representation in both encodings, so those present no problem. That leaves the pound sign.

The usual fix is to replace the literal £ with an alternation group that matches each of the character’s possible byte sequences:

        POUND SIGN (£)
Encoding           Byte representation
--------------------------------------
ISO 8859-1         0xA3
UTF-8              0xC2 0xA3
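If you are not sure which of these forms your CSV actually contains, you can look for the byte sequences directly. The sketch below uses a hypothetical helper, poundEncoding(), which is not part of PHP or of the original answer; it also assumes the iconv extension is available.

```php
<?php
// Hypothetical diagnostic helper: report which encoding of the pound sign,
// if any, a string contains. It is only a heuristic: an ISO 8859-1 string
// that happens to contain "Â£" (bytes C2 A3) would be reported as UTF-8.
function poundEncoding(string $s): string
{
    if (strpos($s, "\xc2\xa3") !== false) {
        return 'UTF-8';      // two bytes: C2 A3
    }
    if (strpos($s, "\xa3") !== false) {
        return 'ISO 8859-1'; // one byte: A3
    }
    return 'none';
}

// Build an ISO 8859-1 sample by converting the UTF-8 bytes with iconv.
$latin1 = iconv('UTF-8', 'ISO-8859-1', "\xc2\xa39.20");

echo bin2hex($latin1), ' ', poundEncoding($latin1), PHP_EOL; // a3392e3230 ISO 8859-1
```

Calling poundEncoding($str) or echoing bin2hex($str) inside the question’s loop shows the bytes the CSV really contains, whereas echoing the string and copying it from a browser or terminal may silently re-encode it.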

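So the group would be (?:\xa3|\xc2\xa3): the first alternative covers ISO 8859-1 and the second covers UTF-8 (the ?: keeps the group non-capturing, so it does not add an extra entry to $matches). However, since both alternatives end in \xa3, the same result can also be achieved with \xc2?\xa3, which makes the \xc2 prefix optional.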

Therefore, a somewhat quick-and-dirty fix is to change the call to:

preg_match_all('/\xc2?\xa3\s*\d+(?:\.\d+)?/', $str, $matches);
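As a quick check that the two forms discussed above behave the same, the sketch below runs both patterns against the same text encoded as UTF-8 and as ISO 8859-1. The sample strings are made up for illustration, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// The two equivalent patterns: non-capturing alternation vs optional prefix.
$patterns = [
    'alternation'     => '/(?:\xa3|\xc2\xa3)\s*\d+(?:\.\d+)?/',
    'optional prefix' => '/\xc2?\xa3\s*\d+(?:\.\d+)?/',
];

// The same text as UTF-8 bytes and as ISO 8859-1 bytes.
$subjects = [
    'UTF-8'      => "Sat - \xc2\xa311.80 Sun - \xc2\xa313.30",
    'ISO 8859-1' => "Sat - \xa311.80 Sun - \xa313.30",
];

$amounts = [];
foreach ($patterns as $name => $pattern) {
    foreach ($subjects as $encoding => $str) {
        preg_match_all($pattern, $str, $matches);
        // Strip the pound-sign bytes so results can be compared across encodings.
        $amounts[$name][$encoding] = array_map(
            fn ($m) => ltrim($m, "\xc2\xa3"),
            $matches[0]
        );
    }
}

print_r($amounts); // every entry holds '11.80' and '13.30'
```

Note that the matches themselves still carry whichever pound-sign bytes the input used, so anything displayed or stored afterwards remains in mixed encodings; that is part of why this fix is quick and dirty.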

A better solution

The best solution, however, is to work in UTF-8 throughout. To do this, you need to:

- Save your script as UTF-8.
- Make sure the input CSV is either already in UTF-8 or converted to UTF-8 before processing (you can use iconv for this).

This way you can go back to using a literal pound sign in your script and be confident that it will match correctly whatever encoding your CSV data arrives in, provided the conversion starts from the CSV’s actual encoding.
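The UTF-8 approach might look roughly like the sketch below. It assumes the CSV is ISO 8859-1 (adjust $fromEncoding if it is, say, Windows-1252), that the iconv extension is enabled, and PHP 7.4 or later. rowToUtf8() is a hypothetical helper, and $arrJobs is simulated here rather than read from a real file.

```php
<?php
// Hypothetical helper (not from the original answer): convert every field of
// a CSV row from $fromEncoding to UTF-8. ISO-8859-1 is an assumed default.
function rowToUtf8(array $row, string $fromEncoding = 'ISO-8859-1'): array
{
    // array_map with a single array preserves the string keys.
    return array_map(
        fn ($field) => iconv($fromEncoding, 'UTF-8', $field),
        $row
    );
}

// Simulated rows as they might come out of an ISO 8859-1 CSV;
// "\xa3" is the single ISO 8859-1 byte for the pound sign.
$arrJobs = [
    [
        'payDetails1' => "Days - \xa39.20 to \xa311.20",
        'payDetails2' => "Sat - \xa311.80 Sun - \xa313.30",
    ],
];

$found = [];
foreach ($arrJobs as $job) {
    $job = rowToUtf8($job);
    $str = $job['payDetails1'] . ' ' . $job['payDetails2'];

    // /u tells PCRE that pattern and subject are UTF-8. \x{00A3} is the pound
    // sign by code point; in a script saved as UTF-8 a literal £ works the same.
    preg_match_all('/\x{00A3}\s*\d+(?:\.\d+)?/u', $str, $matches);
    $found = array_merge($found, $matches[0]);
}

print_r($found);
```

With the /u modifier, PCRE rejects subjects that are not valid UTF-8 (preg_match_all returns false), so converting before matching is what makes this approach safe.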
