I have a text file which contains content scraped from webpages. The text file

Asked: May 27, 2026

I have a text file which contains content scraped from webpages. The text file is structured like this:

|NEWLINE| indicates the start of a new line (i.e., a new row in the data)
|NEWTAB| indicates the start of a new field within a line (i.e., a new column in the data)

I need to split the text file into lines and fields and store the result in an array or some other data structure. Content between |NEWLINE| strings may contain actual newlines (i.e., \n), but these don’t indicate a new row in the data.

I started by reading the file one character at a time and checking each run of 8 consecutive characters to see whether it matched |NEWTAB|. That method proved unreliable and ugly, so I am looking for the best practice here. Would the best method be to read the whole text file in as a single string, split it on “|NEWLINE|”, and then split each of the resulting strings on “|NEWTAB|”?

Many thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-27T05:20:13+00:00

I think that the other answers will work too, but my solution is as follows:

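import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SplitScrapedFile {
    public static void main(String[] args) throws IOException {
        StringBuilder builder = new StringBuilder();

        // try-with-resources closes the reader even if read() throws.
        // FileReader decodes the file with the default charset.
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            int c;
            while ((c = reader.read()) != -1) {
                builder.append((char) c);
            }
        }

        String myString = builder.toString();

        // split() takes a regex, and '|' is a regex metacharacter, so it is escaped.
        String[] rows = myString.split("\\|NEWLINE\\|");

        for (String row : rows) {
            String[] cols = row.split("\\|NEWTAB\\|");

            /* do something with cols - e.g., store */
        }
    }
}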
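
If you want the fields stored rather than just looped over, here is a minimal sketch of the same split-twice idea that collects everything into a list of rows. The class name DelimitedTextParser and the methods parse and parseFile are my own hypothetical names, and reading the file as UTF-8 is an assumption about your data. It differs from the code above in two ways: Pattern.quote escapes the delimiters for you, and a split limit of -1 keeps trailing empty fields, which the one-argument split silently drops.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimitedTextParser {
    // Pattern.quote treats the delimiter as a literal, so '|' needs no manual escaping.
    private static final Pattern ROW_DELIMITER = Pattern.compile(Pattern.quote("|NEWLINE|"));
    private static final Pattern FIELD_DELIMITER = Pattern.compile(Pattern.quote("|NEWTAB|"));

    /** Splits the raw text into rows, and each row into fields. */
    public static List<List<String>> parse(String content) {
        List<List<String>> table = new ArrayList<>();
        // A limit of -1 keeps trailing empty strings, which split() drops by default.
        for (String row : ROW_DELIMITER.split(content, -1)) {
            table.add(Arrays.asList(FIELD_DELIMITER.split(row, -1)));
        }
        return table;
    }

    /** Reads the whole file as UTF-8 (an assumption about the encoding) and parses it. */
    public static List<List<String>> parseFile(String fileName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return parse(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a small in-memory sample.
        List<List<String>> table = args.length > 0
                ? parseFile(args[0])
                : parse("a|NEWTAB|b\nstill row one|NEWLINE|c|NEWTAB|");
        for (List<String> row : table) {
            System.out.println(row.size() + " fields: " + row);
        }
    }
}
```

Real \n characters inside a field are left untouched, because only the two marker strings are treated as delimiters. One thing to watch: if the file begins with |NEWLINE| (it marks the start of a row, after all), the first entry in the result is a row holding a single empty field, which you will probably want to skip.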
