I have a large block of continuous text containing characters such as {, }, // and ;, with whitespace in between. I want to read this text and start a new line wherever one of these characters occurs.
The input text looks like this:
apple{{mango } guava ; banana; // pear berry;}
The expected formatted output is:
apple
{
{
mango
}
guava ;
banana;
// pear
berry;
}
I want to do this in Perl. Thanks in advance.
Of course you will have to adapt this to your needs (most notably by looping while reading lines), but here is a way to do it that doesn’t (really) rely on regexes. As others have said, this is a starting point that you can adapt to what you need.
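A minimal sketch of this approach, assuming the whole input is held in one string. The sub name split_tokens is a hypothetical choice for illustration, and the sample string is the one from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: put every word and every punctuation character
# of $text on its own line.
sub split_tokens {
    my ($text) = @_;

    # The capture group makes split return the separators (each single
    # non-word character) along with the words found between them.
    my @pieces = split /(\W)/, $text;

    # Keep only pieces with a non-whitespace character; this drops the
    # empty strings and the lone spaces produced by the split.
    my @tokens = grep { /\S/ } @pieces;

    return join "\n", @tokens;
}

print split_tokens('apple{{mango } guava ; banana; // pear berry;}'), "\n";
```

Every punctuation character ends up on its own line, including each of the two slashes of //.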
This splits the line into an array, splitting on non-word characters but keeping each of those characters as an element. It then greps for non-whitespace characters (removing array elements that contain only whitespace) and joins the remaining elements with newline characters into one string. Your specification suggests you need to keep the two characters of // together; I leave that as an exercise to the reader.
Edit: After looking at your request again, it looks like you have a specific but complicated structure that you are trying to parse. To do it correctly you may have to use something more powerful, like the Regexp::Grammars module. It will take some learning, but you can define a very complicated set of parsing instructions to do exactly what you need.
Edit 2: Since I have been looking for a reason to learn more about Regexp::Grammars, I took this opportunity. This is a basic example that I came up with. It prints the parsed data structure to a file named “log.txt”. I know it doesn’t look like the structure you asked for, but it contains all of that information and may be reconstituted however you like. I did so with a recursive function that is basically the opposite of the parser.
Edit 3: In light of the comments from the OP, I have adapted the above example to allow for multiple words on the same line; as of right now those words can only be separated by one space. I also made comments match anything that starts with // and ends in two or more spaces. Also, since I was making changes, and since I believe this to be a code pretty-printer, I added tabbing to the block formatter. If this isn’t desired it should be easy enough to strip away. Go now and learn Regexp::Grammars and make it fit your specific case. (I know I should have made the OP do even this change, but I am enjoying learning it as well.)
Edit 4: One more thing. If you are in fact trying to recover useful code from code that has been serialized onto a single line, your only real problem is extracting the line comments and separating them from the useful code (assuming you are using a whitespace-insensitive language, which it looks as though you are). If that is the case, then perhaps try this variation on my original code:
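A minimal sketch of such a variation, not necessarily the exact code originally posted. It assumes, as in Edit 3, that a line comment starts at // and ends at a run of two or more spaces. The sub name split_keeping_comments and the doubled space after pear in the sample string are hypothetical choices for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: like a plain split on \W, but a "// ..." comment
# is kept together as a single token.
sub split_keeping_comments {
    my ($text) = @_;

    # Alternation order matters: the comment pattern is tried before the
    # single non-word character, so "//" is not split into two slashes.
    # Assumption: a comment ends at the first run of two or more
    # whitespace characters; a comment with no such run is not recognised.
    my @pieces = split m!(//.*?\s{2,}|\W)!, $text;

    my @tokens;
    for my $piece (@pieces) {
        next unless $piece =~ /\S/;   # skip empty and whitespace-only pieces
        $piece =~ s/\s+\z//;          # trim the spaces that ended a comment
        push @tokens, $piece;
    }
    return join "\n", @tokens;
}

print split_keeping_comments('apple{{mango } guava ; banana; // pear  berry;}'), "\n";
```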
whose output is