I want to implement my own tweet compressor. Basically this does the following.

Question

Asked: May 15, 2026

I want to implement my own tweet compressor. Basically this does the following. However, I’m stuck on some Unicode issues.

Here’s my script:

#!/usr/bin/env perl
use warnings;
use strict;

print tweet_compress('cc ms ns ps in ls fi fl ffl ffi iv ix vi oy ii xi nj/, "\. " ,", "'),"\n";

sub tweet_compress {
    my $tweet = shift;
    $tweet =~ s/\. ?$//;
    my @orig = ( qw/cc ms ns ps in ls fi fl ffl ffi iv ix vi oy ii xi nj/, ". " ,", ");
    my @new = qw/㏄ ㎳ ㎱ ㎰ ㏌ ʪ ﬁ fl ﬄ ﬃ ⅳ ⅸ ⅵ ѹ ⅱ ⅺ ǌ ． ，/;
    $tweet =~ s/$orig[$_]/$new[$_]/g for 0 .. $#orig;
    return $tweet;
}

But this prints junk out at the terminal:

?．?．?．?．?．?．?．f．?．f?．?．?．?．?．?．?．ǌ/．"\．．,"．"

What am I doing wrong?

1 Answer

Editorial Team · Answer 1 · 2026-05-15T23:33:55+00:00

Two issues.

First, your source code contains Unicode characters. Save the file as UTF-8 and add the use utf8 pragma, so that Perl reads the string literals as characters rather than raw bytes.

Also, if you intend to run this program from a console, make sure the console can display Unicode. With its default settings, the Windows command prompt often cannot, and shows ? regardless of whether your data is correct. I ran this on Mac OS with Terminal set to handle UTF-8.
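To make the first point concrete, here is a minimal sketch of the difference between characters and bytes. It assumes the file is saved as UTF-8; the variable names are illustrative and not part of the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                  # the source file is UTF-8, so literals are decoded
use Encode qw(encode);

my $chars = 'ﬁ';                        # one character, U+FB01
my $bytes = encode('UTF-8', $chars);    # the same character as UTF-8 bytes

# With "use utf8", length() counts characters; after encoding it counts bytes.
printf "%d character, %d bytes\n", length($chars), length($bytes);
```

Without use utf8, Perl sees each non-ASCII literal as several separate bytes, so a substitution can replace part of a character and leave the rest behind as junk.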

Second, the “. ” entry in your orig list is interpolated into the regular expression, where an unescaped “.” means “any single character”, so you get wrong results. You need to escape it before using it in the pattern. I’ve modified the program a little to make it work.

#!/usr/bin/env perl
use warnings;
use strict;
use utf8; # the source file is UTF-8; decode its string literals into characters

# make sure the data is encoded as UTF-8 when output to the terminal
binmode STDOUT, ':utf8';

print tweet_compress('cc ms ns ps in ls fi fl ffl ffi iv ix vi oy ii xi nj/, "\. " ,", "'),"\n";

sub tweet_compress {
    my $tweet = shift;
    $tweet =~ s/\. ?$//;
    my @orig = ( qw/cc ms ns ps in ls fi fl ffl ffi iv ix vi oy ii xi nj/, '\. ' ,", ");
    my @new = qw/㏄ ㎳ ㎱ ㎰ ㏌ ʪ ﬁ fl ﬄ ﬃ ⅳ ⅸ ⅵ ѹ ⅱ ⅺ ǌ ． ，/;
    $tweet =~ s/$orig[$_]/$new[$_]/g for 0 .. $#orig;
    return $tweet;
}
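As a possible refinement, the escaping can be left to quotemeta instead of being done by hand, and the replacements can be applied in a single pass. The sketch below is not the original author's method: it assumes a hash (%map, with only a hypothetical subset of the pairs) instead of the two parallel arrays, and it sorts the keys longest first so that “ffi” is tried before “fi”. In the junk output quoted in the question, “ffi” came out as “f?”, which suggests the shorter “fi” rule had already consumed part of it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                               # source contains UTF-8 literals

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical subset of the mapping, for illustration only.
my %map = (
    'ffi' => 'ﬃ',
    'ffl' => 'ﬄ',
    'fi'  => 'ﬁ',
    'cc'  => '㏄',
    '. '  => '．',
    ', '  => '，',
);

# Longest keys first, so "ffi" is tried before "fi".
# quotemeta escapes regex metacharacters such as ".".
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %map;
my $re = qr/($alternation)/;

sub tweet_compress {
    my ($tweet) = @_;
    $tweet =~ s/\. ?$//;                # drop a trailing full stop
    $tweet =~ s/$re/$map{$1}/g;         # one pass, looked up by matched text
    return $tweet;
}

print tweet_compress('office accent. fine, ok.'), "\n";
```

Because everything happens in one substitution, text that has already been replaced is never scanned again by a later rule.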
