I have a huge string (22000+ characters) of encoded text. The code consists of digits [0-9] and lowercase letters [a-z]. I need a regular expression to insert a space after every 4 characters, and one to insert a line break [\n] after every forty characters. Any ideas?
Well, a regexp in itself doesn’t insert a space, so I’ll assume you have some command in whatever language you’re using that inserts based on finding a regexp.
So, finding 4 characters and finding 40 characters: that’s not pretty in general regular expressions (unless your particular implementation has nice ways to express numbers). For finding 4 characters, use four periods in a row: ....
Because typical regexp matchers use maximal munch and then resume searching from the end of the previous match, that’ll chunk your string into 4-character pieces. The ugly part is that in standard regular expressions, you’ll have to write out forty periods in a row
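to find chunks of 40 characters, although I’ll note that if you run your 4-character one first, you’ll have to match 50 characters instead (the 40 originals plus the 10 spaces, or equivalently ten groups of four characters and a space) to account for the spaces you’ve already put in.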
The period matches any character, but given that you’re only using [0-9a-z], you could use that character class in place of each period if you need to ensure nothing else slipped in. I was just avoiding making it even more gross.
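For what it's worth, here is a minimal sketch of the two passes in Python, assuming your regexp flavor supports counted repetition like {4} and lookahead (Python's re module does). The function names add_spaces and add_line_breaks are made up for illustration, and this variant swaps in a lookahead so no stray space is left after the final group.

```python
import re


def add_spaces(text: str) -> str:
    """Insert a space after every 4 characters, except after the last group."""
    # The lookahead (?=[0-9a-z]) only allows a match when more text follows,
    # so the string never ends with a trailing space.
    return re.sub(r"([0-9a-z]{4})(?=[0-9a-z])", r"\1 ", text)


def add_line_breaks(spaced: str) -> str:
    """Turn every 10th separating space into a newline (40 characters per line)."""
    # Ten groups of four characters, separated by nine spaces, then one more
    # space that gets replaced by the line break.
    return re.sub(r"((?:[0-9a-z]{4} ){9}[0-9a-z]{4}) ", r"\1\n", spaced)


def format_encoded(text: str) -> str:
    """Run both passes: spaces first, then line breaks."""
    return add_line_breaks(add_spaces(text))


if __name__ == "__main__":
    sample = "abcd1234" * 11  # 88 characters of made-up input
    print(format_encoded(sample))
```

Running the space pass first means the second pattern has to step over the inserted spaces, which is why it matches groups of "four characters and a space" rather than a flat run of 40.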
As you may have noticed, regexps have some limitations. Take a look at the Chomsky hierarchy to really get into their theoretical limitations.