My task is to extract some data from a given document using Perl-style (or

Question


Asked: May 27, 2026 (2026-05-27T15:47:05+00:00)

My task is to extract some data from a given document using a Perl-style (or at least extended) regular expression. I have:

a source document (as a file or as a variable; it doesn’t really matter):
- for example: Some text: 1234.55 value more text - 8863 value
a Perl-style / extended regular expression as a string:
- for example: ^.*: ([0-9.]+) value .* - (\d+) value$

What is the best approach to extract the data in a UNIX shell script?

Let me define what I’d like to see in the best approach, in order of importance:

Portability: ideally, it should work on most current OSes and environments, i.e. at least GNU/Linux, FreeBSD/OpenBSD and Mac OS X; Cygwin is probably the same as Linux, but not in all cases
Minimal system requirements: requiring exotic interpreters or programs is generally a bad thing to do
Fair use of resources: it shouldn’t take ages to process a simple regexp
Clean, small, easy-to-understand code

I understand that it’s impossible to reach all these goals at once, so I’ve considered my alternatives:

Using sed: this would probably be the best way to go, but, alas, POSIX sed supports only basic regexps, not extended and definitely not Perl-style. Various implementations add extensions, but they’re generally incompatible: GNU sed uses the -r or --regexp-extended option to switch to extended mode, and BSD sed (also on Mac OS X) uses -E.
Converting extended regular expressions to basic ones and using the original sed: this seems somewhat awkward to me, and I can’t find any decent algorithm proven to work properly for this task.
Using awk: generally the same as sed, but even worse: there are many awk implementations in the wild with slight incompatibilities, and support for extended regular expressions is even more obscure.
Using perl: probably the easiest and sanest alternative, but, alas, Perl is not available everywhere the way POSIX standard utilities are. As far as I remember, Perl is not in the core system in *BSD (and Mac OS X), it requires separate installation in the Cygwin world, and even some Linux distributions let you omit it.
Using php, python, ruby: the same situation as with perl, but as far as I can see they’re generally even less commonly installed.
Using grep: same as with sed; BSD uses GNU grep, but on BSD systems it doesn’t support -P AKA --perl-regexp, only -E AKA --extended-regexp. What’s even worse, it seems to be impossible to print out groups rather than the whole matched pattern: grep -o (show only the part of a matching line that matches) gives only the whole match, not the distinct values of the groups.
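For the specific example above, the "convert to basic regexps" alternative can be done by hand. The following is only a sketch for this one pattern, not a general conversion algorithm: + is written as a repeated bracket expression, the groups use \( \), and \d becomes [0-9]. The variable names line and result are made up for the illustration.

```shell
#!/bin/sh
# Example input taken from the question.
line='Some text: 1234.55 value more text - 8863 value'

# Hand-translated basic regular expression (BRE):
#   ([0-9.]+)  ->  \([0-9.][0-9.]*\)
#   (\d+)      ->  \([0-9][0-9]*\)
# -n plus the trailing p prints only lines where the substitution matched.
result=$(printf '%s\n' "$line" |
    sed -n 's/^.*: \([0-9.][0-9.]*\) value .* - \([0-9][0-9]*\) value$/\1 \2/p')

printf '%s\n' "$result"
```

This uses only features of POSIX sed, so no -r / -E switch is involved; the cost is that the pattern has to be rewritten manually.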

So, I’m somewhat lost as to what would be the most portable and easiest-to-maintain way. Right now I’m choosing between:

Make a wrapper over sed to check whether we’re using BSD or GNU sed and run relevant commands
Insist on having perl installed to be able to run my script
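The first option could look roughly like the sketch below. Instead of checking whether sed is "BSD" or "GNU", it probes which extended-regexp flag the local sed actually accepts. The names detect_ere_flag, ERE_FLAG, line and result are hypothetical, and \d is replaced by [0-9] because \d is not part of extended regular expressions.

```shell
#!/bin/sh
# Hypothetical helper: print the first flag (-E or -r) with which the local
# sed handles an extended regexp correctly; return non-zero if neither works.
detect_ere_flag() {
    for flag in -E -r; do
        probe=$(printf 'ab\n' | sed "$flag" 's/(a|x)b/\1/' 2>/dev/null)
        if [ "$probe" = "a" ]; then
            printf '%s\n' "$flag"
            return 0
        fi
    done
    return 1
}

ERE_FLAG=$(detect_ere_flag) || {
    echo "this sed does not seem to support extended regexps" >&2
    exit 1
}

# Example input and pattern from the question, with \d written as [0-9].
line='Some text: 1234.55 value more text - 8863 value'
result=$(printf '%s\n' "$line" |
    sed "$ERE_FLAG" -n 's/^.*: ([0-9.]+) value .* - ([0-9]+) value$/\1 \2/p')

printf '%s\n' "$result"
```

Probing behaviour rather than parsing version strings avoids having to know every sed variant in advance, at the cost of one extra sed invocation at startup.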

Is there something missing from this overview? What would be the best alternatives? Maybe there’s already a wrapper written for this task somewhere (e.g. in autotools or some other mysterious project that uses shell scripts)?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T15:47:06+00:00

Absolute portability is hard. How about doing it this way? I don’t know if it is a good idea.

In fact, the extraction part is easy no matter which tool we use. The interesting part is deciding whether a given tool is available and suitable on the current system.

You could create a list (array) of all the candidate tools; then, at the beginning of your script, check each tool’s availability and detailed version. I think a simple grep is enough for checking those. E.g.
using $? to check availability:

java -version >/dev/null 2>&1
# check $? (0 means the command was found and ran)

python -V >/dev/null 2>&1
# check $?

using a simple grep to check version details, like:

awk -V 2>/dev/null | grep GNU
sed --version 2>/dev/null | grep GNU
....

Once you have found a tool that can do the job, use it by calling the corresponding script.

However, you have to prepare N solutions to the same problem, one for each of the N tools.
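A minimal sketch of this idea, assuming only two candidate tools (perl and a POSIX sed) and using command -v for the availability check. The function names pick_tool and extract_values are invented for the example, and the sed branch uses a hand-written basic-regexp version of the pattern from the question.

```shell
#!/bin/sh
# Hypothetical helper: print the first tool from the argument list that is
# available on this system; return non-zero if none is.
pick_tool() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# One solution per tool: each branch reads stdin and prints the two captured
# values separated by a space.
extract_values() {
    case "$(pick_tool perl sed)" in
        perl)
            perl -ne 'print "$1 $2\n" if /^.*: ([0-9.]+) value .* - (\d+) value$/'
            ;;
        sed)
            sed -n 's/^.*: \([0-9.][0-9.]*\) value .* - \([0-9][0-9]*\) value$/\1 \2/p'
            ;;
        *)
            echo "no suitable tool found" >&2
            return 1
            ;;
    esac
}

result=$(printf '%s\n' 'Some text: 1234.55 value more text - 8863 value' |
    extract_values)
printf '%s\n' "$result"
```

The order of the arguments to pick_tool expresses the preference; adding another tool means adding one name there and one branch to the case statement.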
