I am trying to parse a line in an mmCIF protein file into separate tokens using Excel 2000/2003. In the worst case it could look something like this:
token1 token2 "token's 1a',1b'" 'token4"5"' 12 23.2 ? . 'token' tok'en to"ken
Which should become the following tokens:
token1
token2
token's 1a',1b' (note: the double quotes have disappeared)
token4"5" (note: the single quotes have disappeared)
12
23.2
?
.
token (note: the single quotes have disappeared)
tok'en
to"ken
Is it even possible to split this kind of line into tokens with a RegEx?
Nice puzzle. Thanks.
This pattern (aPatt below) separates the tokens, but I can’t figure out how to remove the outer quotes.
tallpaul() produces:
If you can figure out how to lose the outer quotes, please let us know.
This needs a reference to “Microsoft VBScript Regular Expressions” to work.
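One possible way to lose the outer quotes is to put a capture group inside each pair of quotes and read the group instead of the whole match. The sketch below is in Python only because it is easy to run and check; it is not the pattern from the post above. The names TOKEN_RE, tokenize and SAMPLE are made up for illustration. It assumes that a quoted token starts with a quote and ends at the first matching quote that is followed by whitespace or the end of the line, which is what the example in the question seems to require. The pattern uses only alternation, capture groups and lookahead, so it should carry over to the VBScript RegExp object, with SubMatches playing the role of the groups, but that port is untested here.

```python
import re

# Three alternatives, tried in order at each position:
#   1. '...'  closing quote must be followed by whitespace or end of line
#   2. "..."  same rule for double quotes
#   3. a bare run of non-whitespace characters
# The capture groups sit inside the quotes, so the outer quotes are dropped.
TOKEN_RE = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(\S+)""")


def tokenize(line):
    """Split one line into tokens, stripping the outer quotes of quoted tokens."""
    # Exactly one group takes part in each match; lastindex says which one.
    return [m.group(m.lastindex) for m in TOKEN_RE.finditer(line)]


# The worst-case line from the question.
SAMPLE = '''token1 token2 "token's 1a',1b'" 'token4"5"' 12 23.2 ? . 'token' tok'en to"ken'''

if __name__ == "__main__":
    for token in tokenize(SAMPLE):
        print(token)
```

A quote in the middle of a bare token (tok'en, to"ken) is left alone because the third alternative only fires when the token does not open with a matched quote, and a token that opens with a quote but never closes it falls through to the bare-token case unchanged.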