Let's say I have a file called foo.txt encoded in UTF-8:
aoeu
qjkx
ñpyf
I want to get an array containing every line of that file (one line per index) that consists only of the letters aoeuñpyf.
I wrote the following code (also encoded as utf8):
$allowed_letters=array("a","o","e","u","ñ","p","y","f");
$lines=array();
$f=fopen("foo.txt","r");
while(!feof($f)){
    $line=fgets($f);
    foreach(preg_split("//",$line,-1,PREG_SPLIT_NO_EMPTY) as $letter){
        if(!in_array($letter,$allowed_letters)){
            $line="";
        }
    }
    if($line!=""){
        $lines[]=$line;
    }
}
fclose($f);
However, after that, the $lines array just has the aoeu line in it.
This seems to be because somehow, the “ñ” in $allowed_letters is not the same as the “ñ” in foo.txt.
Also, if I print a “ñ” read from the file, a question mark appears, but if I print it directly with print "ñ";, it works.
How can I make it work?
If you are running Windows, the OS does not save files in UTF-8 by default, but in cp1252 (or a similar legacy code page, depending on the locale). You need to either save the file as UTF-8 explicitly or run each line through utf8_encode() before performing your check.
If you are sure that the file is UTF-8 encoded, is your PHP file also UTF-8 encoded?
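For the Windows case, the per-line conversion might look like the sketch below. It assumes the file really is in Windows-1252 and that the mbstring extension is available; it uses mb_convert_encoding() rather than utf8_encode(), because utf8_encode() only converts from ISO-8859-1 and is deprecated as of PHP 8.2. The helper name to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: convert one line from Windows-1252 to UTF-8.
// Assumption: the input bytes are Windows-1252 and mbstring is loaded.
function to_utf8(string $line): string
{
    return mb_convert_encoding($line, 'UTF-8', 'Windows-1252');
}

// "\xF1" is the single byte for "ñ" in Windows-1252;
// in UTF-8 the same character is the two bytes "\xC3\xB1".
$converted = to_utf8("\xF1pyf");
var_dump($converted === "\xC3\xB1pyf");
```

Each line returned by fgets() would be passed through the helper before it is compared with the allowed letters.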
If everything is UTF-8, then what you need is the u modifier on the pattern passed to preg_split(), i.e. "//u" instead of "//" (append u for Unicode characters).
However, let me suggest a faster way to perform your check:
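One way such a check could look, as a sketch rather than a definitive version: it assumes a single anchored character-class match with the u modifier and a PHP file saved as UTF-8. The helper name line_is_allowed and the in-memory sample lines (standing in for the lines fgets() would return from foo.txt) are hypothetical.

```php
<?php
// Hypothetical helper: true when the line consists only of allowed letters.
function line_is_allowed(string $line): bool
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so "ñ" is matched as one character instead of two bytes.
    // rtrim() drops the trailing newline that fgets() leaves on the line.
    return preg_match('/^[aoeuñpyf]+$/u', rtrim($line)) === 1;
}

// Sample lines mirroring foo.txt, including trailing newlines.
$sample = ["aoeu\n", "qjkx\n", "ñpyf\n"];

$lines = [];
foreach ($sample as $line) {
    if (line_is_allowed($line)) {
        $lines[] = rtrim($line);
    }
}

print_r($lines);
```

Against the three sample lines this keeps aoeu and ñpyf and drops qjkx.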
(Add a space to the allowed characters to permit spaces as well, and remove the rtrim($line).)
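To see why the u modifier matters, this small sketch compares preg_split() with and without it on the UTF-8 bytes of the line "ñpyf". The variable names are illustrative only.

```php
<?php
// UTF-8 bytes of "ñpyf": "ñ" is the two bytes C3 B1.
$line = "\xC3\xB1pyf";

// Without u, the empty pattern splits between every byte,
// so "ñ" is torn into two single-byte strings.
$bytes = preg_split('//', $line, -1, PREG_SPLIT_NO_EMPTY);

// With u, the split happens between Unicode characters.
$chars = preg_split('//u', $line, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), "\n"; // 5
echo count($chars), "\n"; // 4

// Only the character-wise split contains a whole "ñ".
var_dump(in_array("\xC3\xB1", $bytes, true)); // bool(false)
var_dump(in_array("\xC3\xB1", $chars, true)); // bool(true)
```

This is why in_array() never finds "ñ" in the byte-wise split: neither half of the character equals the two-byte string in $allowed_letters.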