I’m trying to write a parser for the EDI data format, which is just

Asked: May 16, 2026

I’m trying to write a parser for the EDI data format, which is just delimited text but where the delimiters are defined at the top of the file.

Essentially it’s a series of split() calls using delimiter values I read from the top of the file.
The problem is that there’s also a custom ‘escape character’ which indicates that I need to ignore the following delimiter.

For example, assuming * is the delimiter and ? is the escape character, I’m doing something like this:

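use strict;
use warnings;
use Data::Dumper;

my $delim = "*";
my $escape = "?";
my $edi = "foo*bar*baz*aster?*isk";

# Backslash the delimiter so split() matches it literally.
# The escaped "?*" still splits, giving "aster?" and "isk" as separate elements.
my @split = split("\\" . $delim, $edi);
print Dumper(\@split);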

I need it to return “aster*isk” as the last element.

My original idea was to replace every instance of the escape character, together with the character following it, with a custom-mapped unprintable ASCII sequence before calling split(), and then use another regexp to switch them back to the right values.

That is doable but feels like a hack, and it will get pretty ugly once I do it for all five potential delimiters. Each delimiter may also be a regexp special character, which means a lot of escaping in my own regular expressions.

Is there any way to avoid this, possibly with a special regexp passed to my split() calls?
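For the simple case above, a single split() regex can skip escaped delimiters. This is only an illustrative sketch, assuming a single-character delimiter and escape: the negative lookbehind (?<!...) refuses to split on a delimiter that is preceded by the escape character, and \Q...\E stops * and ? from being treated as regex metacharacters. It breaks on an escaped escape character sitting right before a real delimiter (for example foo??*bar is not split), which is the case the answer below deals with.

```perl
use strict;
use warnings;

my $delim  = "*";
my $escape = "?";
my $edi    = "foo*bar*baz*aster?*isk";

# Split only on delimiters NOT preceded by the escape character.
# Caveat: "foo??*bar" is not split, because the lookbehind cannot tell
# an escaped escape character from an ordinary one.
my @fields = split /(?<!\Q$escape\E)\Q$delim\E/, $edi;

# Drop the escape character in front of each escaped character.
s/\Q$escape\E(.)/$1/gs for @fields;

print join("|", @fields), "\n";    # foo|bar|baz|aster*isk
```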

1 Answer

Editorial Team · Answer 1 · 2026-05-16T12:27:19+00:00

This is a bit tricky if you want to handle the case where the escape character is the last character of a field correctly. Here’s one way:

# Process escapes to hide the following character:
$edi =~ s/\Q$escape\E(.)/sprintf '%s%d%s', $escape, ord $1, $escape/esg;

my @split = split( /\Q$delim\E/, $edi);

# Convert escape sequences into the escaped character:
s/\Q$escape\E(\d+)\Q$escape\E/chr $1/eg for @split;

Note that this assumes that neither the escape char nor the delimiter will be a digit, but it does support the full range of Unicode characters.
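With several delimiter levels, as in the question, the same hide / split / restore steps can be applied to the whole string: hide the escaped characters once, split on every delimiter level, and restore only the innermost fields. Restoring any earlier would turn a hidden delimiter back into a real one before the next split. Here is a minimal two-level sketch; parse_edi and its argument order are hypothetical, and as above neither delimiter nor the escape may be a digit.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an array ref of segments, each an array ref
# of elements. Assumes no delimiter or escape character is a digit.
sub parse_edi {
    my ($str, $seg_term, $elem_sep, $escape) = @_;

    # Hide each escaped character as ESCAPE + code point + ESCAPE, once.
    $str =~ s/\Q$escape\E(.)/sprintf '%s%d%s', $escape, ord $1, $escape/esg;

    my @segments;
    for my $seg (split /\Q$seg_term\E/, $str) {
        # A limit of -1 keeps trailing empty elements within a segment.
        my @elems = split /\Q$elem_sep\E/, $seg, -1;

        # Restore hidden characters only after the last split.
        s/\Q$escape\E(\d+)\Q$escape\E/chr $1/eg for @elems;
        push @segments, \@elems;
    }
    return \@segments;
}

my $segments = parse_edi("A*1*x?*y~B*2?~3~", "~", "*", "?");
print join("|", @$_), "\n" for @$segments;    # A|1|x*y, then B|2~3
```

Further levels (component separators and so on) would nest the same way, with the restore step kept in the innermost loop.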
