I’m trying to open a file with regular HTML and special Unicode characters such as “ÖÄÅ öäå” (Swedish), format it and then output it to a file.
So far everything works: I can open the file, find the parts I need and write them to a file.
But here is the problem:
I can’t write the Unicode data I read back out to a file without the encoding getting corrupted (e.g. an ‘ö’ comes out garbled instead of as ‘ö’).
If I type the characters directly into the code, though, I can both match them with a regex and output them in the correct encoding. It only fails when I read the data from a file, format it and then write it out.
Example of a working approach using octal escapes (i.e. this can output to the file without the encoding problem):
my $charsSWE = "öäåÅÄÖ";
# \344 = ä
# \345 = å
# \305 = Å
# \304 = Ä
# \326 = Ö
# \366 = ö
my $SwedishLetters = '\344 \345 \305 \304 \326 \366';
if($charsSWE =~ /([$SwedishLetters]+)/){
    print "Output: $1\n";
}
The approach below does not work because the encoding is lost. It is a simplified illustration of that part of the code, but the concept is the same (i.e. open the file, fetch the matches and output them):
open(FH, 'swedish.htm') or die("File could not be opened");
while(<FH>)
{
    my @List = /([$SwedishLetters]+)/g;
    message($List[0]) if @List;
}
close(FH);
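You may need to declare the file's encoding explicitly. By default Perl reads and writes raw bytes, so if swedish.htm is saved as UTF-8, each Swedish letter arrives as two bytes and the regex does not see it as a single character. Opening both the input and the output handle with an I/O layer such as :encoding(UTF-8) (or :encoding(iso-8859-1), depending on how the file was saved) makes Perl decode on read and encode on write.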
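As a rough sketch of that idea, the loop from the question could be rewritten with encoding layers on both handles. The helper name extract_swedish, the output file name and the choice of UTF-8 are all assumptions made for illustration; substitute whatever encoding swedish.htm is actually saved in.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): copies every run of
# Swedish letters from one file to another, decoding the input and
# encoding the output. Both encoding names are assumptions.
sub extract_swedish {
    my ($in_path, $out_path, $in_enc, $out_enc) = @_;

    # Code points rather than bytes: E4 = ä, E5 = å, C5 = Å,
    # C4 = Ä, D6 = Ö, F6 = ö (the same values as the octal escapes).
    my $swedish = qr/([\x{E4}\x{E5}\x{C5}\x{C4}\x{D6}\x{F6}]+)/;

    # The :encoding layers decode bytes to characters on read
    # and encode characters back to bytes on write.
    open(my $in, "<:encoding($in_enc)", $in_path)
        or die "Cannot open $in_path: $!";
    open(my $out, ">:encoding($out_enc)", $out_path)
        or die "Cannot open $out_path: $!";

    my $count = 0;
    while (my $line = <$in>) {
        # In list context, /g returns every captured run on the line.
        for my $match ($line =~ /$swedish/g) {
            print {$out} "$match\n";
            $count++;
        }
    }

    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return $count;    # number of runs written
}

# Example call (file names are placeholders):
# extract_swedish('swedish.htm', 'output.txt', 'UTF-8', 'UTF-8');
```

If the input turns out to be Latin-1 rather than UTF-8, passing 'iso-8859-1' as the third argument should be the only change needed; the regex stays the same because it works on decoded characters either way.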