I have a small program that sorts email messages by date and writes them to a text file using $msg->decoded->string. The Perl program writes to stdout, and I redirect that to a .txt file. However, gedit can't open this text file because of a character set problem. How can I restore or set a character set with Perl?
The program now looks like this:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Mail::Box::Manager;

open(MYFILE, '>>', 'data.txt') or die "data.txt: Unable to open: $!\n";

my $file = shift || $ENV{MAIL};
my $mgr  = Mail::Box::Manager->new(
    access => 'r',
);

# Double quotes, so that $file and $! are interpolated in the message
my $folder = $mgr->open( folder => $file )
    or die "$file: Unable to open: $!\n";

for my $msg ( sort { $a->timestamp <=> $b->timestamp } $folder->messages ) {
    my $to      = join( ', ', map { $_->format } $msg->to );
    my $from    = join( ', ', map { $_->format } $msg->from );
    my $date    = localtime( $msg->timestamp );
    my $subject = $msg->subject;
    my $body    = $msg->decoded->string;

    # Strip all quoted text. No /s flag: with /s, .* would run past the
    # end of the first quoted line and delete the rest of the body.
    $body =~ s/^>.*$//mg;

    # Double-quoted heredoc, so the variables are interpolated
    print MYFILE <<"END";
From: $from
To: $to
Date: $date
$body
END
}

close(MYFILE);
```
However, I still can't open the file with gedit, even though it opens fine in vi and similar editors. If the file contains characters that aren't valid Unicode (UTF-8), would that break it?
Different messages are probably in different encodings. gedit most likely detects the file as UTF-8, then finds parts of it that aren't valid UTF-8. Mixed-encoding files like this are a major PITA.
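To confirm that diagnosis, a short script can report which lines of data.txt are not valid UTF-8. This is a sketch using the core Encode module; the helper name invalid_utf8_lines is hypothetical. Checking line by line is safe because a UTF-8 multibyte sequence never contains a newline byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the 1-based numbers of the lines read from
# $fh that are NOT valid UTF-8. The handle should deliver raw bytes.
sub invalid_utf8_lines {
    my ($fh) = @_;
    my @bad;
    while (my $line = <$fh>) {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC stops it from modifying $line.
        my $ok = eval {
            decode('UTF-8', $line, Encode::FB_CROAK | Encode::LEAVE_SRC);
            1;
        };
        push @bad, $. unless $ok;    # $. is the current line number
    }
    return @bad;
}

# Check every file named on the command line (does nothing if none given).
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "$file: cannot open: $!\n";
    my @bad = invalid_utf8_lines($fh);
    close $fh;
    if (@bad) {
        print "$file: not valid UTF-8 on line(s) @bad\n";
    }
    else {
        print "$file: valid UTF-8\n";
    }
}
```

Run it as, for example, perl check_utf8.pl data.txt (the script name is arbitrary); the reported lines are likely the ones gedit is choking on.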
The best (perhaps only) solution is to check each message's content type ($message->contentType), and in particular the charset declared in its Content-Type header, then convert everything to UTF-8.
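A minimal sketch of that conversion with the core Encode module. It deliberately does not depend on Mail::Box: body_to_text and write_utf8 are hypothetical helpers, the caller is assumed to supply the raw body bytes and the charset name from the Content-Type header, and falling back to Windows-1252 when the charset is missing or unknown is an assumption, not a rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: turn raw body bytes into a Perl character string
# using the message's declared charset.
# ASSUMPTION: a missing or unrecognised charset falls back to cp1252
# (Windows-1252), a common guess for unlabelled 8-bit mail.
sub body_to_text {
    my ($bytes, $charset) = @_;
    my $enc = (defined $charset && length $charset && find_encoding($charset))
        || find_encoding('cp1252');
    # Without a CHECK argument, malformed bytes become substitution
    # characters instead of making decode() die.
    return $enc->decode($bytes);
}

# Hypothetical helper: append character strings to $path through a
# UTF-8 output layer, so the whole file ends up uniformly UTF-8.
sub write_utf8 {
    my ($path, @texts) = @_;
    open my $out, '>>:encoding(UTF-8)', $path
        or die "$path: cannot open: $!\n";
    print {$out} $_ for @texts;
    close $out or die "$path: cannot close: $!\n";
}
```

The idea is to decode every message into Perl characters first and encode only once, on output. If your Mail::Box version already returns character strings from decoded, the body_to_text step is unnecessary and opening the output file with the :encoding(UTF-8) layer is enough.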