I need a rule (a regex or something similar) that can parse any input string of space-delimited words and produce a usually longer output string in which certain parts are expanded according to certain conditions. I could write code to do this from scratch, but I was wondering whether I might not need to, as it would not be as trivial as it may seem.
In the following examples I will use ‘a b c etc…’ to represent words; these could just as easily be ‘cake 14 h etc…’, but ‘a b c etc…’ make it easier to describe how the rule should work. I also use the special characters {, }, [, | and ]. In doing so I am not referring to any regex meanings these characters may have.
I have also added line breaks to the examples that would not exist in the real output, to make them more readable.
The rule would specify that the contents of each {} enclosure in the input string do not appear as-is in the output string. Instead, the contents appear in the same place but repeated a number of times defined by the [] enclosures inside them: each [] holds alternatives separated by |, each repetition picks one alternative from every [], and every combination appears once.
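One reading of this rule, consistent with the examples below, is that a {} is repeated as many times as the product of the number of alternatives in each of its []. A minimal Python sketch of just that count, assuming a flat body (no [] inside another []) given as the text between one pair of {}; the name `count_repetitions` is hypothetical and `math.prod` needs Python 3.8 or later:

```python
import math
import re

# A [] group with no [] inside it; the captured text holds the alternatives.
SQUARE = re.compile(r"\[([^\[\]]*)\]")


def count_repetitions(body):
    """How many times a flat {} body is repeated: the product of the number of
    '|'-separated alternatives in each [] group (1 if there are no groups)."""
    return math.prod(group.count("|") + 1 for group in SQUARE.findall(body))


print(count_repetitions("a [b | c]"))                    # example 1 → 2
print(count_repetitions("[a b | c][d | e | f]"))         # example 2 → 6
print(count_repetitions("a [b c | d] e f [g | h | i]"))  # example 3 → 6
```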
1.
Note that ‘b’ and ‘c’ are separated with ‘|’.
{a [b | c]}
should become:
a b
a c
2.
Note that ‘a’ and ‘b’ stay together as one alternative, separate from ‘c’. The {} enclosure contains two []s, the first with two alternatives and the second with three, making 2 × 3 = 6 in total.
{[a b | c][d | e | f]}
should become:
a b d
a b e
a b f
c d
c e
c f
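Within one {}, the [] groups combine like a Cartesian product: every alternative of the first group is paired with every alternative of the second, which is where the 2 × 3 = 6 lines come from. A minimal Python sketch of just that step, assuming the {} body has already been split into a list of pieces where each piece is a list of alternatives (the name `combine` is hypothetical):

```python
from itertools import product


def combine(pieces):
    """Return one line per combination, taking one alternative from each piece.

    `pieces` is a list of lists of strings; plain text is simply a piece with a
    single alternative. The first piece varies slowest and the last fastest,
    matching the order of the examples.
    """
    return [" ".join(choice) for choice in product(*pieces)]


# {[a b | c][d | e | f]} as two [] groups
lines = combine([["a b", "c"], ["d", "e", "f"]])
for line in lines:
    print(line)
```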
3.
And now for a more involved example.
{a [b c | d] e f [g | h | i]} j
should become:
a b c e f g
a b c e f h
a b c e f i
a d e f g
a d e f h
a d e f i
j
Without the line breaks, it should read:
a b c e f g a b c e f h a b c e f i a d e f g a d e f h a d e f i j
Here are two more concrete examples from Dr. Seuss, with line breaks added to make them easier to read; the second example is edited significantly from the original text:
input:
{I do not like [them in a box | them with a fox | them in a house
| them with a mouse | them here or there | them anywhere | green
eggs and ham | them, Sam-I-am].}
output:
I do not like them in a box.
I do not like them with a fox.
I do not like them in a house.
I do not like them with a mouse.
I do not like them here or there.
I do not like them anywhere.
I do not like green eggs and ham.
I do not like them, Sam-I-am.
input:
{[Would | could] you} ? {Would you [like | eat] them
[in a house | with a mouse]?}
output:
Would you, could you?
Would you like them in a house?
Would you like them with a mouse?
Would you eat them in a house?
Would you eat them with a mouse?
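All of these examples are flat: no [] sits inside another []. For that case Python's `re` module can locate the groups while `itertools.product` does the combining. A minimal sketch under these assumptions: alternatives are trimmed of surrounding whitespace, two [] written back to back (`][`) are separated by a space, text glued to a bracket (such as the `.` in `Sam-I-am].`) stays glued, and the repetitions of one {} are joined with single spaces. Expanding the innermost {} first and looping also copes with stacked {}, but [] inside [] would need a real parser. The names `expand_body` and `expand_flat` are hypothetical.

```python
import re
from itertools import product

INNER_BRACES = re.compile(r"\{([^{}]*)\}")  # a {} with no {} inside it
SQUARE = re.compile(r"\[([^\[\]]*)\]")      # a [] with no [] inside it


def expand_body(body):
    """Expand the text between one pair of {} into its repeated form."""
    # re.split with a capture group alternates: literal, group, literal, ...
    parts = SQUARE.split(body)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Inside []: '|' separates the alternatives.
            pieces.append([alt.strip() for alt in part.split("|")])
        elif part == "" and 0 < i < len(parts) - 1:
            # Nothing between "][": keep the two groups apart with a space.
            pieces.append([" "])
        else:
            pieces.append([part])
    lines = ["".join(choice) for choice in product(*pieces)]
    # Collapse runs of whitespace, including line breaks in the input.
    return " ".join(" ".join(line.split()) for line in lines)


def expand_flat(text):
    """Expand every {} in text, innermost first, until none remain."""
    while True:
        text, count = INNER_BRACES.subn(lambda m: expand_body(m.group(1)), text)
        if count == 0:
            return " ".join(text.split())
```

On the second Dr. Seuss input this sketch gives `Would you could you ?` for the first part, since nothing in the rules as written inserts the comma.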
Ideally the {} enclosures should be able to stack (nest inside one another). None of these examples show stacked {} enclosures.
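Handling stacking in general, including [] inside [] or a {} inside a [] alternative, needs a parser that tracks nesting. The following is a minimal recursive-descent sketch in Python, not a definitive implementation. It assumes that a stacked {} is expanded first and its result treated as fixed text in the enclosing {}, that text outside any {} behaves as if wrapped in one, and that a word written directly after `]` or `}` (as in `Sam-I-am].`) stays attached to the preceding word. The names `tokenize`, `Parser` and `expand` are hypothetical.

```python
from itertools import product

SPECIAL = "{}[]|"


def tokenize(text):
    """Split text into (kind, value, glued) tuples; kind is a special char or "word".
    glued is True for a word written directly after ']' or '}' (e.g. '.' in "].")."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in SPECIAL:
            tokens.append((ch, ch, False))
            i += 1
        else:
            start = i
            while i < len(text) and not text[i].isspace() and text[i] not in SPECIAL:
                i += 1
            glued = start > 0 and text[start - 1] in "]}"
            tokens.append(("word", text[start:i], glued))
    return tokens


class Parser:
    """Each method returns a list of alternatives; an alternative is a list of
    (word, glued) pairs."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise ValueError(f"expected {kind!r} at token {self.pos}")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def sequence(self):
        # Items written one after another: Cartesian product of their alternatives.
        alts = [[]]
        while self.peek() not in (None, "|", "]", "}"):
            item = self.item()
            alts = [left + right for left, right in product(alts, item)]
        return alts

    def item(self):
        kind = self.peek()
        if kind == "word":
            _, value, glued = self.take("word")
            return [[(value, glued)]]
        if kind == "[":
            # Alternatives separated by '|'; each may itself contain {} or [].
            self.take("[")
            alts = self.sequence()
            while self.peek() == "|":
                self.take("|")
                alts += self.sequence()
            self.take("]")
            return alts
        if kind == "{":
            # Repeat the contents once per combination, as one fixed piece.
            self.take("{")
            alts = self.sequence()
            self.take("}")
            return [[pair for alt in alts for pair in alt]]
        raise ValueError(f"unexpected {kind!r} at token {self.pos}")


def expand(text):
    parser = Parser(text)
    alts = parser.sequence()
    if parser.peek() is not None:
        raise ValueError(f"unbalanced {parser.peek()!r} at token {parser.pos}")
    out = []
    for alt in alts:  # the top level behaves like an implicit {}
        for word, glued in alt:
            if out and not glued:
                out.append(" ")
            out.append(word)
    return "".join(out)
```

With this reading, `{x {a [b | c]} [y | z]}` becomes `x a b a c y x a b a c z`. If stacking should mean something else, only the `{` branch of `item` needs to change.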
I can already reference individual words by their number (1st, 2nd, etc.) or by another label; because of how I am storing the text, this is easier than, for example, looking up individual letters at some offset into the whole input.
Regex is probably not helpful; other tools may be, but most of the work would still need to be done by hand.
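Part of why a plain regex struggles here is that Python's built-in `re` module has no recursive patterns, so it cannot match arbitrarily deep {} or [] nesting; tracking which brackets are still open takes a stack. A minimal Python sketch of such a check, useful for validating input before expanding it. The name `check_brackets` is hypothetical, and it assumes `|` is only meaningful directly inside a [].

```python
PAIRS = {"}": "{", "]": "["}


def check_brackets(text):
    """Return None if every {} and [] is properly nested, otherwise the index
    of the first character that breaks the nesting."""
    stack = []  # (opening char, index) of brackets still open
    for i, ch in enumerate(text):
        if ch in "{[":
            stack.append((ch, i))
        elif ch in PAIRS:
            if not stack or stack[-1][0] != PAIRS[ch]:
                return i
            stack.pop()
        elif ch == "|" and (not stack or stack[-1][0] != "["):
            # Assumption: '|' only separates alternatives inside a [].
            return i
    # Anything still open is unmatched; report the innermost one.
    return stack[-1][1] if stack else None
```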