I am expecting a String from an application that looks like:
john|COL-DELIM|doe|COL-DELIM|55|ROW-DELIM|george|COL-DELIM|jetson|COL-DELIM|90|ROW-DELIM|
I want to do two things:
1) Verify the string ‘looks’ correct (i.e. does it match a regex)
2) Pull out each ‘row’, then be able to parse each row
The values between the delimiters (|COL-DELIM| and |ROW-DELIM|) can be anything: strings, numbers, whatever.
((.*)(\|COL-DELIM\|)(.*)(\|COL-DELIM\|)(.*)(\|ROW-DELIM\|))+
Naturally that doesn’t work because of the (.*) groups… any suggestions?
People don’t seem to get the fact that they don’t have to use REs (or SQL, but that’s another issue 🙂) for every task, especially those where procedural code is cleaner.
If you’re limiting yourself to using REs, I think that’s a lack of vision.
I would simply process the string, token by token, where a token is one of: a column delimiter, a row delimiter, or the text between two delimiters.
Start with an empty column list, then extract (using indexOf/substring stuff) the text up to the next row or column delimiter, whichever comes first, and add that text to the column list.
If that delimiter is a column delimiter, keep going.
If it is a row delimiter, check the number of columns, process the list as required, and start again with an empty list.
If there’s no final row delimiter and the column list is non-empty, then the format was invalid.
Sorry if you were really after an RE method but I don’t believe it’s required (or even desirable) here.
Pseudo-code (only a first cut, may be slightly buggy) follows:
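As a rough Java sketch of the steps above (Java is an assumption based on the indexOf/substring mention; the class name DelimitedParser and the way processColumns() simply collects each row are illustrative choices, not a definitive implementation):

```java
import java.util.ArrayList;
import java.util.List;

class DelimitedParser {
    static final String COL_DELIM = "|COL-DELIM|";
    static final String ROW_DELIM = "|ROW-DELIM|";

    // Hypothetical hook: here it only stores a copy of the finished row.
    static void processColumns(List<String> columns, List<List<String>> rows) {
        rows.add(new ArrayList<>(columns));
    }

    // Walks the input delimiter by delimiter; throws if text is left over
    // without a terminating row delimiter.
    static List<List<String>> parse(String input) {
        List<List<String>> rows = new ArrayList<>();
        List<String> columns = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int nextCol = input.indexOf(COL_DELIM, pos);
            int nextRow = input.indexOf(ROW_DELIM, pos);
            if (nextRow < 0) {
                // Remaining text is never closed by a row delimiter.
                throw new IllegalArgumentException(
                        "Missing final row delimiter after offset " + pos);
            }
            if (nextCol >= 0 && nextCol < nextRow) {
                // Column delimiter comes first: store the value, keep going.
                columns.add(input.substring(pos, nextCol));
                pos = nextCol + COL_DELIM.length();
            } else {
                // Row delimiter comes first: store the last value, emit the row.
                columns.add(input.substring(pos, nextRow));
                processColumns(columns, rows);
                columns.clear();
                pos = nextRow + ROW_DELIM.length();
            }
        }
        return rows;
    }
}
```

Calling DelimitedParser.parse() on the sample string from the question should give two rows of three values each, and input that ends without a row delimiter is rejected with an IllegalArgumentException. Empty values between adjacent delimiters are kept as empty strings.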
You could easily add code to check the correct number of columns in this code, or in processColumns(), if that was your desire.