I need to split a list between its first item and the rest of its items using regex substitution only.
The lists of items are input as strings using ‘##’ as a separator, e.g.:
''
'one'
'one##two'
'one##two##three'
'one##two words##three'
My Perl attempt doesn’t really work:
my $sampleText = 'one##two words##three';
my $first = $sampleText;
my $rest = $sampleText;
$first =~ s/(.+?)(##.*)?/$1/g;
$rest =~ s/(.?+)(##)?(.*)/$3/g;
print "sampleText = '$sampleText', first = '$first', rest = '$rest'\n";
sampleText = 'one##two words##three', first = 'one', rest = 'ne##two words##three'
Please note the constraints:
- the separator is a multi-character string
- only regex substitutions are allowed (1)
- I could “chain” regex substitutions if necessary
- The expected end result is two strings: the first element, and the initial string with the first element cut off (2)
- the list may have from 0 to n items, each being any string not containing the separator.
(1) I work with a rather large Perl system where, at some point, lists of items are processed using a fixed set of provided operations. One of them is a regex substitution. None of the others is applicable. Solving the problem using full Perl code is easy, but that would mean modifying the system, which is not an option at this time.
(2) the context is the Unimarc bibliographic format, where authors of a publication are to be split into the standard Unimarc fields 700$a for the first author, and 701$a for any remaining authors.
I assume point (1) means you cannot use the split builtin? It would be easy using split's optional third parameter, which lets you specify the maximum number of items. But if it has to be a regex replace, then your attempt is almost right, but using .+? won't work when there are no separators (because it will just take the first character). You can fix this by anchoring the end. Instead, something like:
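A minimal sketch of the anchored approach, not a definitive solution. It assumes the remainder should not keep its leading '##', and the helper names first_item and rest_items are made up for illustration; only the two s/// lines would go into the system's substitution operation.

```perl
use strict;
use warnings;

# Hypothetical helpers; each one performs exactly one substitution.
sub first_item {
    my ($list) = @_;
    # Anchor both ends: keep the shortest prefix that is followed either
    # by '##' plus everything after it, or directly by the end of the string.
    $list =~ s/\A(.*?)(?:##.*)?\z/$1/s;
    return $list;
}

sub rest_items {
    my ($list) = @_;
    # Delete the first item together with the separator after it;
    # if there is no separator, delete up to the end of the string.
    $list =~ s/\A.*?(?:##|\z)//s;
    return $list;
}

for my $sample ('', 'one', 'one##two', 'one##two##three', 'one##two words##three') {
    printf "sampleText = '%s', first = '%s', rest = '%s'\n",
        $sample, first_item($sample), rest_items($sample);
}
```

Neither substitution uses /g, so each pattern is applied once from the start of the string. If the remainder should keep its leading separator instead, the second pattern would need a lookahead such as (?=##|\z) in place of (?:##|\z).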