I have a big continuous text with characters like {, }, //, ; and whitespace

Asked: May 23, 2026

I have a big continuous text with characters like {, }, //, ; and whitespace in between. I want to read this text and break it onto a new line wherever it finds these characters.

The input text looks like this:

apple{{mango } guava ; banana; // pear      berry;}

The expected formatted output is:

apple
{
{
mango
}
guava ;
banana;
// pear
berry;
}

I want to do this in Perl. Thanks in advance.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T04:44:58+00:00

Of course, you will have to adapt this to your needs (most notably by looping over the input lines), but here is a way to do it that doesn’t really rely on regexes. As others have said, treat this as a starting point.

#!/usr/bin/perl
use strict;
use warnings;

my $string = 'apple{{mango } guava ; banana; // pear      berry;}';
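# Split on non-word characters, keeping each delimiter (capturing parens),
# drop empty and whitespace-only pieces, and put each piece on its own line.
my $new_string = join("\n", grep {/\S/} split(/(\W)/, $string));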

print $new_string . "\n";

This splits the line into a list on non-word characters; because the pattern has capturing parentheses, each delimiter is kept as its own element. The grep then keeps only the elements that contain a non-whitespace character (dropping the empty and whitespace-only ones), and the join puts the remaining elements on separate lines. Note that this prints the two slashes of // on separate lines, whereas your specification keeps // together; I leave that as an exercise for the reader.
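As one possible take on that exercise, here is a sketch that reproduces the expected output from the question. It assumes, as Edit 3 below also does, that a // comment runs until the next run of two or more spaces, and it assumes a semicolon stays on the same line as the text before it (as in "guava ;"). Instead of split, it uses a single global match: each match becomes one output line, and the spaces between matches are simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'apple{{mango } guava ; banana; // pear      berry;}';

# Every match of this pattern becomes one output line. In list context, m//g
# returns each capture and skips anything no alternative matches (the spaces).
my @lines = $string =~ m{
    (   //.*?(?=\s{2,}|\z)       # a "//" comment, up to 2+ spaces or end of text
      | [{}]                     # an opening or closing brace on its own
      | [^{};/\s] [^{};/]*? ;    # text ending in a semicolon, e.g. "guava ;"
      | [^{};/\s]+               # any other run of text, e.g. "apple"
      | [;/]                     # a stray ";" or "/", so nothing is dropped
    )
}xg;

print "$_\n" for @lines;
```

Because the alternatives are tried in order at each position, "guava ;" is caught by the semicolon rule before the plain-text rule can split it, and a comment is recognized before its slashes could be treated as stray characters.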

Edit:
After looking at your request again, it looks like you have a specific but complicated structure that you are trying to parse. To do it correctly, you may need something more powerful, such as the Regexp::Grammars module. It will take some learning, but it lets you define a very complicated set of parsing rules that do exactly what you need.

Edit 2:
Since I have been looking for a reason to learn more about Regexp::Grammars, I took this opportunity. This is a basic example that I came up with. It prints the parsed data structure to a file named “log.txt”. I know it doesn’t look like the structure you asked for, but it contains all of that information and can be reconstituted however you like. I did so with a recursive function, elements, that basically does the opposite of the parser.

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;
use Regexp::Grammars;

my $grammar = qr{
  <nocontext:>
  <Line>
  <rule: Line>      <[Element]>*
  <rule: Element>   <Words> | <Block> | <Command> | <Comment>
  <rule: Command>   <[Words]> ;
  <rule: Block>     \{ <[Element]>* \}
  <rule: Comment>   // .*? \s{2,}        # from // up to two or more spaces
  <rule: Words>     (?:\b\w+\b) ** \s
}x;

my $string = 'apple{{mango kiwi } guava ; banana; // pear      berry;}';

if ($string =~ $grammar) {
  open my $log, ">", "log.txt" or die "Cannot open log.txt: $!";
  print $log Dumper \%/;    # dump the full parse tree for inspection

  print elements($/{Line}{Element});

} else {
  die "Did not match";
}

sub elements {
  my @elements = @{ shift() };
  my $indent = shift || 0;
  my $output;

  foreach my $element (@elements) {
    $output .= "\t" x $indent;

    foreach my $key (keys %$element) {
      if ($key eq 'Words') {
        $output .= $element->{$key} . "\n";
      } elsif ($key eq 'Block') {
        $output .= "{\n" . elements($element->{$key}->{Element}, $indent + 1) . ("\t" x $indent) . "}\n";
      } elsif ($key eq 'Comment') {
        $output .= $element->{$key} . "\n";
      } elsif ($key eq 'Command') {
        $output .= join(" ", @{ $element->{$key}->{Words} }) . ";\n";
      } elsif ($key eq 'Element') {
        $output .= elements($element->{$key}, $indent + 1);
      }
    }
  }

  return $output;
}

Edit 3: In light of the comments from the OP, I have adapted the above example to allow multiple words on the same line; for now, those words can only be separated by a single space. I also made comments match anything that starts with // and ends in two or more spaces. Since I was making changes anyway, and since I believe this to be a code pretty-printer, I also added tab indentation to the block formatter; if that isn’t wanted, it should be easy to strip out. Go now and learn Regexp::Grammars and make it fit your specific case. (I know I should have made the OP make even this change, but I am enjoying learning it as well.)
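For comparison, the tab-per-block indentation can also be sketched without Regexp::Grammars by tracking brace depth over the token list produced by the first split-based snippet. This is a simplified stand-in, not the grammar-based formatter above, and it inherits that snippet's limitations (for example, // still comes out as two separate / tokens).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'apple{{mango } guava ; banana; // pear      berry;}';

my @out;
my $depth = 0;    # current brace nesting level

# Tokens come from the same split as the first snippet: delimiters are kept,
# empty and whitespace-only pieces are dropped.
for my $token (grep { /\S/ } split /(\W)/, $string) {
    $depth-- if $token eq '}' && $depth > 0;    # a closing brace is dedented
    push @out, ("\t" x $depth) . $token;        # one tab per nesting level
    $depth++ if $token eq '{';                  # indent whatever follows "{"
}

print "$_\n" for @out;
```

Decrementing before printing a closing brace and incrementing after printing an opening one keeps each brace aligned with the code outside its block, which matches the layout the grammar-based elements function produces.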

Edit 4: One more thing: if you are in fact trying to recover usable code that has been serialized onto a single line, your only real problem is extracting the line comments and separating them from the rest of the code (assuming a whitespace-insensitive language, which it looks like you are using). If that is the case, then perhaps try this variation on my original code:

#!/usr/bin/perl
use strict;
use warnings;

my $string = 'apple{{mango } guava ; banana; // pear      berry;}';
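# Split around each "//" comment that ends in two or more spaces, keeping
# the comment itself (the capture group) as a separate piece.
my $new_string = join("\n", split(/((?:\/\/).*?\s{2,})/, $string));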

print $new_string . "\n";

Its output is:

apple{{mango } guava ; banana; 
// pear      
berry;}
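One caveat with that pattern: a comment at the very end of the text is not followed by two spaces, so it would not be split off. A minimal variation, assuming a comment may also end at the end of the string, accepts \z as an alternative terminator. The name format_line is just a hypothetical helper, and the loop over sample strings stands in for reading your real input line by line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: put every "//" comment on its own line. A comment ends
# at two or more whitespace characters, or at the end of the text.
sub format_line {
    my ($text) = @_;
    return join "\n", split m{(//.*?(?:\s{2,}|\z))}, $text;
}

# Sample inputs; with real data you would read lines from a file instead.
for my $line ('apple{{mango } guava ; banana; // pear      berry;}',
              'kiwi; // a trailing comment') {
    print format_line($line), "\n";
}
```

For the original example the result is unchanged; for the second string the trailing comment now lands on its own line. This is still a heuristic: a // inside a string literal would also be treated as a comment.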
