I’m trying to use PARSE to turn a CSV line into a Rebol block. Easy enough to write in open code, but as with other questions I am trying to learn what the dialect can do without that.
So if a line says:
"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com
Then I want the block:
[{Look, that's "MR. Fork" to you!} {Hostile Fork} none {http://hostilefork.com}]
Issues to notice:
- Embedded quotes in CSV strings are indicated with ""
- Commas can be inside quotes and hence part of the literal, not a column separator
- Adjacent column-separating commas indicate an empty field
- Strings that don’t contain quotes or commas can appear without quotes
- For the moment we can keep things like http://rebol.com as STRING! instead of LOADing them into types such as URL!
To make it more uniform, the first thing I do is append a comma to the input line. Then I have a column-rule which captures a single comma-terminated column, whether or not that column is in quotes.
I know how many columns there should be from the header line, so the code then says:
; column-count comes from the header line
unless parse line compose [(column-count) column-rule] [
    print rejoin [{Expected } column-count { columns.}]
]
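It can help to see what COMPOSE actually hands to PARSE here. In this small sketch column-count is set by hand (an assumption for illustration; in the real code it comes from the header line). Only the parenthesized expression is evaluated:

```rebol
REBOL [Title: "What the composed rule looks like (sketch)"]

column-count: 4    ; hypothetical value, normally taken from the header line

; COMPOSE evaluates only the paren, leaving the rule word untouched
probe compose [(column-count) column-rule]    ; prints [4 column-rule]
```

In PARSE an integer in front of a rule means "match that rule exactly this many times," and PARSE only returns true when it reaches the end of the input, so a line with too many or too few columns fails the check.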
But I’m a bit stuck on writing column-rule. I need a way in the dialect to express “Once you find a quote, keep skipping quote pairs until you find a quote standing all on its own.” What’s a good way to do that?
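One way to say that in the dialect, sketched under some assumptions (Rebol 2, hence parse/all; the names non-quote, plain-char, columns and field are made up for illustration), is to let the quoted branch match an opening quote, then any mix of "" pairs and non-quote characters, then a closing quote. A non-quote character can never match a quote, so the loop can only get past a quote by consuming it as half of a "" pair. The first quote that is not followed by another one therefore ends the loop. The raw text is captured with COPY and unescaped afterwards:

```rebol
REBOL [Title: "Comma-terminated columns with quote-pair skipping (sketch)"]

non-quote: complement charset {"}      ; hypothetical helper names throughout
plain-char: complement charset {",}
columns: copy []
field: none

column-rule: [
    [
        ; Quoted column: opening quote, any "" pairs or non-quote chars,
        ; then the quote that stands on its own
        copy field [#"^"" any [{""} | non-quote] #"^""] (
            remove field                  ; drop the opening quote
            remove back tail field        ; drop the closing quote
            replace/all field {""} {"}    ; collapse each pair to one quote
            append columns field
        )
        | copy field any plain-char (
            ; nothing between two commas means an empty field
            append columns either any [none? field empty? field] [none] [field]
        )
    ]
    #","    ; every column ends in a comma once one is appended to the line
]

line: {"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com}
column-count: 4
unless parse/all append copy line "," compose [(column-count) column-rule] [
    print rejoin [{Expected } column-count { columns.}]
]
probe columns
```

Capturing first and unescaping afterwards keeps the rule short. The alternative is to build the string up piece by piece while parsing. In Rebol 3, PARSE already matches every character, so /all is only needed for Rebol 2.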
As with most parse problems, I try to build a grammar that best describes the elements of the input format.
In this case, we have nouns:
Some verbs:
And the operative nouns:
I suppose I could break it down a little more, but this is enough to work with. First, the foundation:
Now the value structure. Quoted values are built up from chunks of valid chars or quotes as we find them:
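A minimal sketch of such a rule, assuming Rebol 2's parse/all and using made-up names (dq, quotable, value, chunk) rather than whatever the original grammar used. Each run of ordinary characters is appended as one chunk, and each "" pair contributes a single quote:

```rebol
REBOL [Title: "Quoted value built from chunks (sketch)"]

dq: #"^""                          ; hypothetical name for the quote character
quotable: complement charset {"}   ; hypothetical: any character except a quote
value: chunk: none

quoted-value: [
    dq (value: copy "")
    any [
        copy chunk some quotable (append value chunk)   ; a run of ordinary chars
        | dq dq (append value dq)                       ; an escaped "" pair
    ]
    dq                                                  ; the lone closing quote
]

either parse/all {"Look, that's ""MR. Fork"" to you!"} quoted-value [
    probe value
][
    print "not a single quoted value"
]
```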
Note that delimiter is set to ending at the beginning of each row, then changed to comma as soon as we pass a value. Thus, an input row is defined as [ending value any [comma value]].
All that remains is to define the document structure:
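As an illustration only, here is one way values, rows and the document could be tied together. It is a simplified sketch, not the grammar described above. It assumes rows are separated by a single newline character with no trailing newline. It handles empty fields with a catch-all alternative instead of the delimiter bookkeeping. It uses hypothetical names throughout and targets Rebol 2, hence parse/all:

```rebol
REBOL [Title: "Rows and document structure (sketch)"]

; Hypothetical character classes
dq: #"^""
quotable: complement charset {"}
unquoted: complement charset {",^/}

rows: copy []              ; the finished document: a block of row blocks
row: value: chunk: none    ; row is the block currently being filled

quoted-value: [
    dq (value: copy "")
    any [copy chunk some quotable (append value chunk) | dq dq (append value dq)]
    dq
]

field: [
    quoted-value (append row value)
    | copy value some unquoted (append row value)
    | (append row none)    ; nothing before the next comma: an empty field
]

line-rule: [(row: copy []) field any [#"," field] (append/only rows row)]

; Rows are separated by newline characters; no trailing newline is expected
document: [line-rule any [newline line-rule] end]

data: rejoin [
    {"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com}
    newline
    "a,b,c,d"
]

either parse/all data document [probe rows] [print "malformed CSV"]
```

With a trailing newline this sketch would record one extra row containing a single none, so a real version would need to strip or special-case it.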
Wrap it up to shield all those words, and you have:
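A sketch of the wrapping step, again with made-up rule names and Rebol 2 syntax. CONTEXT (shorthand for make object!) collects the top-level set-words in its body as fields. Words inside the nested rule blocks and the function body are bound to those fields. That is why row, value and chunk are declared up front even though they start as none. parse-line is a hypothetical entry point, not part of any library:

```rebol
REBOL [Title: "CSV words wrapped in a context (sketch)"]

csv: context [
    ; Declared at the top level so the set-words inside the rules bind here
    dq: #"^""
    quotable: complement charset {"}
    unquoted: complement charset {",^/}
    row: value: chunk: none

    quoted-value: [
        dq (value: copy "")
        any [copy chunk some quotable (append value chunk) | dq dq (append value dq)]
        dq
    ]

    field: [
        quoted-value (append row value)
        | copy value some unquoted (append row value)
        | (append row none)    ; empty field
    ]

    ; Hypothetical entry point: a block of fields, or none if the line is malformed
    parse-line: func [line [string!]] [
        row: copy []
        either parse/all line [field any [#"," field] end] [row] [none]
    ]
]

probe csv/parse-line {"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com}
print value? 'quotable    ; prints false: the helper words stay inside csv
```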