I have a text file which contains content scraped from webpages. The text file is structured like this:
|NEWTAB|lkfalskdjlskjdflsj|NEWTAB|lkjsldkjslkdjf|NEWTAB|sdlfkjsldkjf|NEWLINE|lksjlkjsdl|NEWTAB|lkjlkjlkj|NEWTAB|sdkjlkjsld
|NEWLINE| indicates the start of a new line (i.e., a new row in the data).
|NEWTAB| indicates the start of a new field within a line (i.e., a new column in the data).
I need to split the text file into lines and fields and store them in an array or some other data structure. Content between |NEWLINE| strings may contain actual newline characters (i.e., \n), but these don’t indicate a new row in the data.
I started by reading each character in one by one and looking at sets of 8 consecutive characters to see if they contained |NEWTAB|. My method proved to be unreliable and ugly. I am looking for the best practice on this. Would the best method be to read the whole text file in as a single string, and then use a string split on “|NEWLINE|” and then string splits on the resulting strings using “|NEWTAB|”?
Many thanks!
I think that the other answers will work too, but my solution is as follows:
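A minimal sketch of the two-level split described in the question, assuming Python (the question does not name a language) and a file small enough to read into memory at once. The names `parse_scraped` and `parse_scraped_file` and the UTF-8 default are illustrative assumptions, not something given in the question.

```python
ROW_SEP = "|NEWLINE|"
FIELD_SEP = "|NEWTAB|"


def parse_scraped(text):
    """Split text into rows on ROW_SEP, then each row into fields on FIELD_SEP.

    Real newline characters inside a field are left untouched, because only
    the literal marker strings are treated as separators.
    """
    return [row.split(FIELD_SEP) for row in text.split(ROW_SEP)]


def parse_scraped_file(path, encoding="utf-8"):
    """Read the whole file as one string and parse it (encoding is assumed)."""
    # newline="" keeps embedded line endings exactly as they appear in the file.
    with open(path, encoding=encoding, newline="") as f:
        return parse_scraped(f.read())


if __name__ == "__main__":
    sample = "|NEWTAB|a|NEWTAB|b|NEWLINE|c\nd|NEWTAB|e"
    for row in parse_scraped(sample):
        print(row)
```

Note that a row starting with |NEWTAB|, like the first row in the sample from the question, yields an empty first field; whether to keep or drop it depends on how the scraper wrote the data.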