For an iOS application, I want to parse an HTML file that may contain UNIX-style variables to be replaced. For example, the HTML may look like:
<html>
<head></head>
<body>
<h1>${title}</h1>
<p>${paragraph1}</p>
<img src="${image}" />
</body>
</html>
I’m trying to create a simple ParseKit grammar that gives me two callbacks: one for passthrough HTML and another for each variable it detects. To that end, I created the following grammar:
@start = Empty | content*;
content = variable | passThrough;
passThrough = /[^$]+/;
variable = '$' '{' Word closeChar;
openChar = '${';
closeChar = '}';
I’m running into at least two issues with this. First, I originally declared variable as openChar Word closeChar, but that did not work (I still don’t know why). Second, and more importantly, the parser stops when it reaches <img src="${image}" /> (i.e. a variable inside a quoted string).
My questions are:
- How can I modify the grammar to make it work as expected?
- Is it better to use a tokenizer? If that’s the case, how should I configure it?
Developer of ParseKit here. I’ll answer both of your questions:
1) You are taking the correct approach, but this is a tricky case. There are several small gotchas, and your grammar needs to change a bit.
I’ve developed a grammar which is working for me:
Then implement these two callbacks in your Assembler:
And then your client/driver code will look something like this:
This will be printed:
2) Yes, I think it would definitely be much better to use the Tokenizer directly for this relatively simple case. Performance will be massively better. Here’s how you might approach the task with the Tokenizer:
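To make the two-callback idea concrete without depending on any particular library, here is a minimal sketch of a hand-rolled scanner in plain C, which compiles unchanged inside an Objective-C file. This is not ParseKit's tokenizer and not necessarily how the Tokenizer-based solution is written. The names `Segment`, `next_segment`, `SegmentCallback`, and `scan_template` are hypothetical, and the sketch assumes a variable is exactly `${`, any characters, then the first `}`. Quotes get no special treatment, so a variable inside an attribute value is found like any other.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; none of this is ParseKit API. */
typedef enum { SEG_TEXT, SEG_VARIABLE, SEG_END } SegmentKind;

typedef struct {
    SegmentKind kind;
    const char *start;   /* points into the input; NOT NUL-terminated */
    size_t length;       /* for variables: the name only, without ${ and } */
} Segment;

/* Returns the segment that begins at *cursor and advances *cursor past it. */
Segment next_segment(const char **cursor) {
    const char *p = *cursor;
    Segment seg = { SEG_END, p, 0 };

    if (*p == '\0') {
        return seg;                          /* end of input */
    }

    if (p[0] == '$' && p[1] == '{') {
        const char *close = strchr(p + 2, '}');
        if (close != NULL) {                 /* well-formed ${name} */
            seg.kind = SEG_VARIABLE;
            seg.start = p + 2;
            seg.length = (size_t)(close - (p + 2));
            *cursor = close + 1;
            return seg;
        }
        /* Unterminated "${": fall through and treat it as plain text. */
    }

    /* Passthrough text runs up to the next "${" or to the end of input.
       Searching from p + 1 guarantees progress in the unterminated case. */
    const char *next = strstr(p + 1, "${");
    if (next == NULL) {
        next = p + strlen(p);
    }
    seg.kind = SEG_TEXT;
    seg.start = p;
    seg.length = (size_t)(next - p);
    *cursor = next;
    return seg;
}

/* Callback signature shared by both kinds of segment. */
typedef void (*SegmentCallback)(const char *start, size_t length, void *context);

/* Walks the whole input, calling on_text for passthrough HTML
   and on_variable for each variable name. */
void scan_template(const char *input,
                   SegmentCallback on_text,
                   SegmentCallback on_variable,
                   void *context) {
    const char *cursor = input;
    for (;;) {
        Segment seg = next_segment(&cursor);
        if (seg.kind == SEG_END) {
            break;
        }
        if (seg.kind == SEG_VARIABLE) {
            on_variable(seg.start, seg.length, context);
        } else {
            on_text(seg.start, seg.length, context);
        }
    }
}
```

To use it, pass `scan_template` the UTF-8 bytes of the HTML along with two functions matching `SegmentCallback`. Each callback receives a pointer and a length rather than a terminated string, so copy the bytes if you need to keep them. A lone `$` that is not followed by `{` stays part of the surrounding passthrough text.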