I have some reasonable (not obfuscated) Perl source files, and I need a tokenizer,

Asked: May 16, 2026

I have some reasonable (not obfuscated) Perl source files, and I need a tokenizer that splits them into tokens and returns the type of each token. For example, for the script

print "Hello, World!\n";

it would return something like this:

keyword 5 bytes
whitespace 1 byte
double-quoted-string 17 bytes
semicolon 1 byte
whitespace 1 byte

Which is the best library (preferably written in Perl) for this? It has to be reasonably correct, i.e. it should be able to parse syntactic constructs like qq{{\}}}, but it doesn’t have to know about special parsers like Lingua::Romana::Perligata. I know that parsing Perl is Turing-complete, and only Perl itself can do it right, but I don’t need absolute correctness: the tokenizer can fail or be incompatible or assume some default in some very rare corner cases, but it should work correctly most of the time. It must be better than the syntax highlighting built into an average text editor.

FYI, I tried the PerlLexer in pygments, which works reasonably well for most constructs, except that it cannot find the second print keyword in this one:

print length(<<"END"); print "\n";
String
END

1 Answer

Editorial Team · Answer 1 · 2026-05-16T08:32:33+00:00

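Use PPI, a CPAN module written in Perl that parses Perl source as a document, without executing it. Its PPI::Tokenizer class returns the tokens one by one, each with its own token class.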
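
To make the suggestion concrete, here is a minimal sketch of how the token listing from the question could be produced with PPI's tokenizer. It assumes PPI is installed from CPAN; the helper name `tokenize` and the output layout are only illustrative, and the type names are PPI's own token classes (with the `PPI::Token::` prefix stripped), not the names used in the question.

```perl
use strict;
use warnings;
use PPI;    # loads PPI::Tokenizer and the PPI::Token classes

# Hypothetical helper: returns a list of [type, content] pairs.
sub tokenize {
    my ($source) = @_;
    my $tokenizer = PPI::Tokenizer->new(\$source)
        or die "PPI could not create a tokenizer";
    my @tokens;
    # get_token returns a false value at end of input (and undef on error),
    # so this loop simply stops in both cases.
    while (my $token = $tokenizer->get_token) {
        (my $type = ref $token) =~ s/^PPI::Token:://;
        push @tokens, [$type, $token->content];
    }
    return @tokens;
}

my $source = <<'PERL';
print "Hello, World!\n";
PERL

for my $t (tokenize($source)) {
    # length() counts characters; for ASCII-only source that equals bytes.
    printf "%-16s %d\n", $t->[0], length $t->[1];
}
```

For the here-doc example in the question, PPI should, as far as I know, treat `<<"END"` as a single here-doc token and keep tokenizing the rest of that line, so the second print is still reported as a word token. Treat that as something to verify on your own files rather than a guarantee.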